Abstract
Extensive compute and memory requirements limit the deployment of large language models (LLMs) on any hardware. Compression methods, such as pruning, can reduce model size, which in turn reduces resource requirements. State-of-the-art pruning is based on coarse-grained methods. They are time-consuming and inherently remove critical model parameters, adversely impacting the quality of the pruned model. This paper introduces projection pruning, a novel fine-grained method for pruning LLMs. In addition, LLM projection pruning is enhanced by a new approach we refer to as composite projection pruning: the synergistic combination of unstructured pruning, which retains accuracy, and structured pruning, which reduces model size. We develop Mosaic, a novel system to create and deploy pruned LLMs using composite projection pruning. Mosaic is evaluated using a range of performance and quality metrics on multiple hardware platforms, LLMs, and datasets. Mosaic is 7.19× faster in producing models than existing approaches. Mosaic models achieve up to 84.2% lower perplexity and 31.4% higher accuracy than models obtained from coarse-grained pruning. Mosaic models also show up to 67% faster inference and 68% lower GPU memory use.
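The idea of combining the two pruning styles on a single projection matrix can be illustrated with a minimal sketch. This is not Mosaic's algorithm. The function names, the magnitude and L1-norm scoring, the choice of rows as the structural unit, and the order of the two steps are all assumptions made for illustration only.

```python
# Illustrative sketch only. The scoring criteria (weight magnitude, row L1 norm)
# and the order of the two steps are assumptions, not the method used by Mosaic.

def structured_prune(weights, fraction):
    """Drop the rows with the smallest L1 norm, so the matrix becomes smaller."""
    n_drop = int(len(weights) * fraction)
    ranked = sorted(range(len(weights)),
                    key=lambda r: sum(abs(w) for w in weights[r]))
    dropped = set(ranked[:n_drop])
    return [row[:] for r, row in enumerate(weights) if r not in dropped]


def unstructured_prune(weights, sparsity):
    """Zero the smallest-magnitude entries; the matrix shape is unchanged."""
    cols = len(weights[0])
    order = sorted(range(len(weights) * cols),
                   key=lambda i: abs(weights[i // cols][i % cols]))
    pruned = [row[:] for row in weights]
    for i in order[: int(len(order) * sparsity)]:
        pruned[i // cols][i % cols] = 0.0
    return pruned


def composite_prune(weights, unstructured_sparsity, structured_fraction):
    """Apply structured pruning first, then unstructured pruning (assumed order)."""
    smaller = structured_prune(weights, structured_fraction)
    return unstructured_prune(smaller, unstructured_sparsity)


# A toy 4x4 "projection" matrix (hypothetical values).
projection = [
    [0.9, -0.1, 0.4, 0.05],
    [0.02, 0.01, -0.03, 0.0],
    [-0.7, 0.3, 0.2, -0.6],
    [0.1, -0.8, 0.05, 0.5],
]

pruned = composite_prune(projection, unstructured_sparsity=0.5,
                         structured_fraction=0.25)
for row in pruned:
    print(row)
```

In this toy case the structured step removes one low-norm row, which shrinks the matrix, and the unstructured step zeroes half of the remaining entries without changing its shape.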
| Original language | English |
|---|---|
| Article number | 108056 |
| Pages (from-to) | 1-15 |
| Number of pages | 15 |
| Journal | Future Generation Computer Systems |
| Volume | 175 |
| Early online date | 11 Aug 2025 |
| DOIs | |
| Publication status | Published - Feb 2026 |
Keywords
- Composite projection pruning
- Edge computing
- Model compression
- Large language models
- Model pruning
- Resource-efficient LLM
Fingerprint
Dive into the research topics of 'Mosaic: composite projection pruning for resource-efficient LLMs'. Together they form a unique fingerprint.
Datasets
- Mosaic (code)
  Varghese, B. (Creator) & Eccles, B. J. (Creator), GitHub, 2025
  https://github.com/blessonvar/Mosaic
  Dataset: Software
Student theses
- Composite neural network pruning for edge computing
  Eccles, B. J. (Author), Varghese, B. (Supervisor), 2 Dec 2025
  Student thesis: Doctoral Thesis (PhD)