Large-scale hierarchical k-means for heterogeneous many-core supercomputers

Lideng Li, Teng Yu, Wenlai Zhao, Haohuan Fu, Chenyu Wang, Li Tan, Guangwen Yang, John Thomson

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution


Abstract

This paper presents a novel design and implementation of the k-means clustering algorithm targeting the Sunway TaihuLight supercomputer. We introduce a multi-level parallel partition approach that partitions not only by dataflow and centroid, but also by dimension. Our multi-level (nkd) approach unlocks the potential of the hierarchical parallelism in the SW26010 heterogeneous many-core processor and the system architecture of the supercomputer.
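
As a rough illustration of what partitioning along all three axes means, the following is a minimal single-process sketch of the k-means assignment step with the work split by samples (n), centroids (k), and dimensions (d). It is not the authors' implementation: the helper names `split` and `assign_nkd`, the contiguous block layout, and the sequential loops standing in for parallel workers are all assumptions made for illustration.

```python
from math import inf


def split(total, parts):
    """Return `parts` contiguous (start, stop) ranges covering range(total).

    Hypothetical helper: an even block split is assumed here.
    """
    base, extra = divmod(total, parts)
    ranges, start = [], 0
    for p in range(parts):
        stop = start + base + (1 if p < extra else 0)
        ranges.append((start, stop))
        start = stop
    return ranges


def assign_nkd(samples, centroids, n_parts, k_parts, d_parts):
    """Nearest-centroid assignment with work split over n, k and d.

    Each loop over a partition stands in for a group of parallel workers;
    here the partitions are simply visited one after another.
    """
    n, k, d = len(samples), len(centroids), len(samples[0])
    labels = [0] * n
    for s_lo, s_hi in split(n, n_parts):              # level n: sample blocks
        for i in range(s_lo, s_hi):
            best_dist, best_j = inf, -1
            for c_lo, c_hi in split(k, k_parts):      # level k: centroid groups
                for j in range(c_lo, c_hi):
                    dist = 0.0
                    for d_lo, d_hi in split(d, d_parts):  # level d: dimension slices
                        # Each slice contributes a partial squared distance;
                        # summing the partials plays the role of a reduction.
                        dist += sum(
                            (samples[i][t] - centroids[j][t]) ** 2
                            for t in range(d_lo, d_hi)
                        )
                    # Keeping the minimum across centroid groups plays the
                    # role of a min-reduction over the k partitions.
                    if dist < best_dist:
                        best_dist, best_j = dist, j
            labels[i] = best_j
    return labels
```

The point of the sketch is that the result does not depend on how the three axes are split: a squared Euclidean distance is a sum over dimensions, and the nearest centroid is a minimum over centroids, so both can be computed from per-partition partial results and then combined.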

Our design is able to process large-scale clustering problems with up to 196,608 dimensions and over 160,000 target centroids, while maintaining high performance and high scalability, significantly improving the capability of k-means over previous approaches. The evaluation shows that our implementation takes less than 18 seconds per iteration for a large-scale clustering case with 196,608 data dimensions and 2,000 centroids using 4,096 nodes (1,064,496 cores) in parallel, making k-means a more feasible solution for complex scenarios.

Original language: English
Title of host publication: Proceedings of the International Conference for High Performance Computing, Networking, Storage, and Analysis (SC '18)
Place of Publication: Piscataway
Publisher: IEEE Press
Chapter: 13
Number of pages: 11
ISBN (Electronic): 9781538683842
Publication status: Published - 11 Nov 2018
Event: The International Conference for High Performance Computing, Networking, Storage, and Analysis - Dallas, United States
Duration: 11 Nov 2018 to 16 Nov 2018
https://sc18.supercomputing.org/

Conference

Conference: The International Conference for High Performance Computing, Networking, Storage, and Analysis
Abbreviated title: SC18
Country/Territory: United States
City: Dallas
Period: 11/11/18 to 16/11/18

Keywords

  • Supercomputer
  • Multi/many-core Processors
  • Clustering
  • Parallel computing
