Abstract
Collaborative Machine Learning (CML) enables multiple user devices and a central server to train models together without moving raw data off-device. Training is distributed across devices and the server, which perform their respective computations and exchange model parameters or intermediate results. This thesis optimises system efficiency while preserving the standard CML privacy boundary. In practice, however, CML systems often suffer from low resource utilisation due to several bottlenecks. First, the limited computational capability of devices makes it difficult to support model training. Second, devices have heterogeneous computational capabilities, and slower ones delay overall training progress. Third, frequent data exchange between devices and the server introduces substantial communication overhead. Finally, reducing computational and communication costs often lowers model accuracy, especially when data distributions vary across devices.
Existing CML systems typically address only a subset of the aforementioned bottlenecks and lack a unified solution that improves overall resource efficiency. This thesis proposes three systems that collectively address these bottlenecks by reconstructing the training pipeline. The first system leverages pipeline parallelism to reduce the idle time caused by devices and the server waiting on each other during training. Specifically, the model is split between the device and the server, and both sides process different inputs to enable concurrent execution. Second, an asynchronous training system allows devices to train models independently without waiting for the server or other devices, mitigating the impact of slow devices and further reducing idle time. Finally, a third system replaces the frequent exchange of intermediate results between server and devices with a one-shot transfer, thereby reducing communication overhead. The server model is trained on the consolidated intermediate results, which alleviates the impact of data heterogeneity and improves model accuracy.
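The idle-time saving from the first system's pipeline parallelism can be illustrated with a back-of-the-envelope timing model. The sketch below is not from the thesis: the function names and the per-batch timings are hypothetical, and it assumes a simple two-stage split (device computes the first part of the model, server the rest) with negligible transfer time.

```python
# Hypothetical timing sketch of split-model pipeline parallelism.
# Without pipelining, the server idles while the device computes a
# batch's first layers, and the device idles while the server finishes;
# with pipelining, the device starts batch i+1 while the server is
# still processing batch i.

def sequential_time(n_batches, device_t, server_t):
    # Lock-step execution: each batch pays the full device + server cost.
    return n_batches * (device_t + server_t)

def pipelined_time(n_batches, device_t, server_t):
    # Two-stage pipeline: total time is one pass through both stages to
    # fill the pipeline, then the bottleneck stage per remaining batch.
    bottleneck = max(device_t, server_t)
    return (device_t + server_t) + (n_batches - 1) * bottleneck

if __name__ == "__main__":
    n, dev, srv = 8, 3.0, 2.0  # illustrative per-batch seconds
    print(sequential_time(n, dev, srv))  # 40.0
    print(pipelined_time(n, dev, srv))   # 26.0
```

Under these toy numbers the pipeline finishes 8 batches in 26 s instead of 40 s; the gap grows with the number of batches, since only the bottleneck stage's cost is paid per batch once the pipeline is full.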
The proposed systems demonstrate consistent improvements over state-of-the-art baselines on multiple models and datasets: lower device-side computation and idle time, reduced total communication, and higher test accuracy under heterogeneous data. Overall, the proposed pipeline reconstruction yields faster accuracy-over-time progress and provides a foundation for building more efficient and scalable systems in heterogeneous and resource-constrained environments.
| Date of Award | 3 Jul 2026 |
|---|---|
| Original language | English |
| Awarding Institution | |
| Supervisor | Blesson Varghese (Supervisor) |
Keywords
- Collaborative machine learning
- Federated learning
- Split federated learning
- Communication cost
- On-device computation
- Data heterogeneity
- Pipeline reconstruction