Composite neural network pruning for edge computing

  • Bailey Jack Eccles

Student thesis: Doctoral Thesis (PhD)

Abstract

Deep neural networks (DNNs) are the foundation of modern machine learning applications, supporting technologies ranging from data analytics and chatbots to generative AI systems. However, DNNs require significantly more compute, memory, and energy than traditional non-AI workloads. As model architectures scale in size and complexity, most notably with the development of large language models (LLMs), the gap between computational demands and the capabilities of available deployment hardware continues to widen.

Simultaneously, there is a growing demand to move inference closer to data sources through edge computing, where models must run on devices with limited compute power, memory, and energy capacity. This creates a fundamental challenge: How can increasingly complex DNNs be deployed efficiently on resource-constrained hardware?

Model compression aims to reduce the size and computational cost of DNNs by removing redundant components while retaining those critical to model accuracy. Compressed models are faster, smaller, and more energy-efficient. However, existing compression methods are often slow, incur significant overheads, degrade accuracy, or fail to scale to modern models like LLMs. Furthermore, many techniques are optimised for cloud environments and are less suited for edge deployment.

This thesis identifies three key challenges in compressing DNNs for the edge: (1) maintaining high model accuracy after compression, (2) generating compressed models before training to save compute and memory resources, and (3) scaling compression techniques to large and modern architectures such as LLMs.

To address these challenges, this thesis introduces composite pruning, a new method that combines the advantages of unstructured pruning and structured pruning in a unified framework. Composite pruning is used to build three pruning systems, each targeting one of the challenges above and demonstrating practical improvements across a range of DNN architectures, including convolutional and transformer-based models. The result is a new pruning paradigm that enables the deployment of high-quality, compressed models in edge environments.
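
For readers unfamiliar with the two pruning families mentioned above, the sketch below shows how unstructured and structured pruning can be applied side by side using PyTorch's torch.nn.utils.prune utilities. It is a minimal, generic illustration on a hypothetical two-layer convolutional model; it does not reproduce the composite pruning method developed in the thesis.

  # Minimal sketch: unstructured and structured pruning applied together
  # with PyTorch's built-in utilities. Illustrates the two pruning
  # families only; NOT the thesis's composite pruning method.
  import torch
  import torch.nn as nn
  import torch.nn.utils.prune as prune

  # Hypothetical two-layer convolutional model for illustration.
  model = nn.Sequential(
      nn.Conv2d(3, 16, kernel_size=3, padding=1),
      nn.ReLU(),
      nn.Conv2d(16, 32, kernel_size=3, padding=1),
  )
  conv1, _, conv2 = model

  # Structured pruning: zero 25% of conv1's output channels (dim=0),
  # ranked by L2 norm. Whole filters are zeroed, so the layer can later
  # be physically shrunk -- the hardware-friendly form of sparsity.
  prune.ln_structured(conv1, name="weight", amount=0.25, n=2, dim=0)

  # Unstructured pruning: zero the 50% of conv2's individual weights
  # with the smallest magnitude. Fine-grained sparsity tends to retain
  # accuracy better but needs sparse kernels to realise speedups.
  prune.l1_unstructured(conv2, name="weight", amount=0.5)

  # Fold the pruning masks into the weight tensors permanently.
  prune.remove(conv1, "weight")
  prune.remove(conv2, "weight")

  print(f"conv2 sparsity: {(conv2.weight == 0).float().mean().item():.0%}")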
Date of Award: 2 Dec 2025
Original language: English
Awarding Institution
  • University of St Andrews
Supervisor: Blesson Varghese (Supervisor)

Keywords

  • DNN pruning
  • Composite pruning
  • Edge computing
  • Large language models
  • Machine learning
  • Model compression
  • Projection pruning

Access Status

  • Full text open
