Abstract
This paper presents a novel design and implementation of the k-means clustering algorithm targeting the Sunway TaihuLight supercomputer. We introduce a multi-level parallel partition approach that partitions not only by dataflow and centroid, but also by dimension. Our multi-level (nkd) approach unlocks the potential of the hierarchical parallelism in the SW26010 heterogeneous many-core processor and the system architecture of the supercomputer.
Our design can process large-scale clustering problems with up to 196,608 dimensions and over 160,000 target centroids while maintaining high performance and high scalability, significantly improving the capability of k-means over previous approaches. The evaluation shows that our implementation takes less than 18 seconds per iteration for a large-scale clustering case with 196,608 data dimensions and 2,000 centroids when using 4,096 nodes (1,064,496 cores) in parallel, making k-means a more feasible solution for complex scenarios.
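To make the idea of partitioning along data (n), centroids (k), and dimensions (d) concrete, the following is a minimal sequential sketch of the k-means assignment step. It is an illustration under stated assumptions, not the authors' implementation: the names `chunks` and `assign_nkd` and the parameters `n_parts`, `k_parts`, and `d_parts` are hypothetical, and the nested loops merely stand in for work that a real system would distribute across nodes and cores. The abstract does not describe how the levels map onto the SW26010 hardware, so no such mapping is assumed here.

```python
from typing import List, Sequence


def chunks(length: int, parts: int) -> List[range]:
    """Split range(length) into `parts` contiguous, nearly equal ranges."""
    base, extra = divmod(length, parts)
    out, start = [], 0
    for p in range(parts):
        size = base + (1 if p < extra else 0)
        out.append(range(start, start + size))
        start += size
    return out


def assign_nkd(
    points: Sequence[Sequence[float]],
    centroids: Sequence[Sequence[float]],
    n_parts: int,
    k_parts: int,
    d_parts: int,
) -> List[int]:
    """Assign each point to its nearest centroid (squared Euclidean distance).

    The three loop levels mimic three partitions (hypothetical layout):
      n: blocks of data points,
      k: groups of centroids, each producing a local best candidate,
      d: slices of the dimensions, each producing a partial distance.
    """
    dims = len(centroids[0])
    labels = [0] * len(points)
    for n_block in chunks(len(points), n_parts):             # partition by data
        for i in n_block:
            best_dist, best_idx = float("inf"), -1
            for k_block in chunks(len(centroids), k_parts):  # partition by centroid
                local_dist, local_idx = float("inf"), -1
                for j in k_block:
                    # Partition by dimension: one partial sum per slice,
                    # then a reduction to the full squared distance.
                    partials = [
                        sum((points[i][t] - centroids[j][t]) ** 2 for t in d_block)
                        for d_block in chunks(dims, d_parts)
                    ]
                    dist = sum(partials)
                    if dist < local_dist:
                        local_dist, local_idx = dist, j
                # Reduce the local winners of the centroid groups.
                if local_dist < best_dist:
                    best_dist, best_idx = local_dist, local_idx
            labels[i] = best_idx
    return labels


if __name__ == "__main__":
    pts = [[0.0, 0.0, 0.0, 0.0], [10.0, 10.0, 10.0, 10.0], [1.0, 0.0, 1.0, 0.0]]
    cents = [[0.0, 0.0, 0.0, 0.0], [9.0, 9.0, 9.0, 9.0]]
    print(assign_nkd(pts, cents, n_parts=2, k_parts=2, d_parts=3))
```

The point of the sketch is that a squared Euclidean distance decomposes into a sum over dimension slices, and a nearest-centroid search decomposes into a minimum over centroid groups, so both can in principle be split and then combined with a reduction. This is presumably what allows very high-dimensional problems to be divided among many workers, though the sketch makes no claim about the performance or communication pattern of the published design.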
Original language | English |
---|---|
Title of host publication | Proceedings of the International Conference for High Performance Computing, Networking, Storage, and Analysis (SC '18) |
Place of Publication | Piscataway |
Publisher | IEEE Press |
Chapter | 13 |
Number of pages | 11 |
ISBN (Electronic) | 9781538683842 |
Publication status | Published - 11 Nov 2018 |
Event | The International Conference for High Performance Computing, Networking, Storage, and Analysis - Dallas, United States. Duration: 11 Nov 2018 → 16 Nov 2018. https://sc18.supercomputing.org/ |
Conference
Conference | The International Conference for High Performance Computing, Networking, Storage, and Analysis |
---|---|
Abbreviated title | SC18 |
Country/Territory | United States |
City | Dallas |
Period | 11/11/18 → 16/11/18 |
Keywords
- Supercomputer
- Multi/many-core Processors
- Clustering
- Parallel computing
Fingerprint
Dive into the research topics of 'Large-scale hierarchical k-means for heterogeneous many-core supercomputers'. Together they form a unique fingerprint.
Projects
- 3 Finished
- ABC: Adaptive Brokerage for the Cloud
Barker, A. D. (PI) & Thomson, J. D. (CoI)
1/04/18 → 30/09/22
Project: Standard
- Discovery: Pattern Discovery and Program Shaping for Manycore Systems
Thomson, J. D. (PI), Hammond, K. (CoI) & Sarkar, S. (CoI)
1/07/17 → 31/12/20
Project: Standard