Abstract
This article presents an automatic k-means clustering solution targeting the Sunway TaihuLight supercomputer. We first introduce a multilevel parallel partition approach that partitions not only by dataflow and centroid, but also by dimension, which unlocks the potential of the hierarchical parallelism in the heterogeneous many-core processor and the system architecture of the supercomputer. The parallel design is able to process large-scale clustering problems with up to 196,608 dimensions and over 160,000 target centroids, while maintaining high performance and high scalability. Furthermore, we propose an automatic hyper-parameter determination process for k-means clustering, which automatically generates and executes the clustering tasks with a set of candidate hyper-parameters and then determines the optimal hyper-parameter using a proposed evaluation method. The proposed auto-clustering solution not only achieves high performance and scalability for problems with massive high-dimensional data, but also supports clustering without sufficient prior knowledge of the number of target clusters, which can potentially extend the scope of the k-means algorithm to new application areas.
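The abstract gives no implementation details, so the following is only a minimal single-process sketch of the idea behind partitioning by centroid and by dimension, not the authors' implementation. It assumes squared Euclidean distance and contiguous, near-equal partitions; the names `split_ranges`, `partial_sq_dist`, `assign_partitioned`, and `assign_direct` are hypothetical. Each dimension slice contributes a partial squared distance, and the partial results are summed before the nearest centroid is chosen, which is what would let separate workers each hold one slice.

```python
from typing import List, Sequence


def split_ranges(n: int, parts: int) -> List[range]:
    """Split indices 0..n-1 into `parts` contiguous, near-equal ranges."""
    base, extra = divmod(n, parts)
    ranges, start = [], 0
    for p in range(parts):
        size = base + (1 if p < extra else 0)
        ranges.append(range(start, start + size))
        start += size
    return ranges


def partial_sq_dist(point: Sequence[float], centroid: Sequence[float],
                    dims: range) -> float:
    """Squared distance restricted to one slice of the dimensions."""
    return sum((point[i] - centroid[i]) ** 2 for i in dims)


def assign_partitioned(point: Sequence[float],
                       centroids: Sequence[Sequence[float]],
                       dim_parts: int, centroid_parts: int) -> int:
    """Nearest-centroid index, computed over centroid and dimension partitions.

    In a parallel setting each (centroid range, dimension range) pair could be
    handled by a different worker; here the loops simply run in sequence.
    """
    dim_ranges = split_ranges(len(point), dim_parts)
    best_idx, best_dist = -1, float("inf")
    for c_range in split_ranges(len(centroids), centroid_parts):
        for c in c_range:
            # Sum the per-slice partial distances (the reduction step).
            d = sum(partial_sq_dist(point, centroids[c], dims)
                    for dims in dim_ranges)
            if d < best_dist:
                best_idx, best_dist = c, d
    return best_idx


def assign_direct(point: Sequence[float],
                  centroids: Sequence[Sequence[float]]) -> int:
    """Reference nearest-centroid index without any partitioning."""
    dists = [sum((x - y) ** 2 for x, y in zip(point, c)) for c in centroids]
    return dists.index(min(dists))
```

For inputs where the arithmetic is exact, such as small integer-valued coordinates, the partitioned and direct assignments agree, because the partitioning only changes how the sum over dimensions is grouped.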
Original language | English |
---|---|
Pages (from-to) | 997-1008 |
Number of pages | 12 |
Journal | IEEE Transactions on Parallel and Distributed Systems |
Volume | 31 |
Issue number | 5 |
Early online date | 27 Nov 2019 |
DOIs | |
Publication status | Published - May 2020 |
Keywords
- Supercomputer
- Heterogeneous many-core processor
- Data partitioning
- Clustering
- Scheduling
- AutoML
Fingerprint
Research topics of 'Large-scale automatic k-means clustering for heterogeneous many-core supercomputer'.
Projects
2 Finished
- ABC: Adaptive Brokerage for the Cloud
  Barker, A. D. (PI) & Thomson, J. D. (CoI)
  1/04/18 → 30/09/22
  Project: Standard
- Discovery: Pattern Discovery and Program Shaping for Manycore Systems
  Thomson, J. D. (PI), Hammond, K. (CoI) & Sarkar, S. (CoI)
  1/07/17 → 31/12/20
  Project: Standard