DOI: 10.1111/exsy.70331 ISSN: 0266-4720

Small Language Models: A Systematic Review of Computational Trade‐Offs, Privacy Advantages and Deployment in Intelligent Systems

Sena Dikici, Turgay Tugay Bilgin

ABSTRACT

This systematic review synthesizes evidence from 68 studies published between 2022 and 2025 (peer‐reviewed journal articles, indexed conference/workshop proceedings and five arXiv preprints) on small language models (SLMs) as computationally efficient alternatives to large language models (LLMs). Following PRISMA 2020 guidelines, we analyse optimization techniques, performance metrics, privacy–security trade‐offs and deployment strategies across application domains. Parameter‐efficient fine‐tuning (PEFT) methods such as Low‐Rank Adaptation (LoRA) yield 2–10× speedups while retaining 95%–99% of baseline accuracy and reducing trainable parameters by about 90%. Knowledge distillation and domain‐specific fine‐tuning deliver 1.5–3.5× speedups with 85%–95% task accuracy, while quantization reduces model size by 50%–75% and achieves 2–4× inference speedups with 85%–95% accuracy retention. Across benchmarks, fine‐tuned SLMs attain 85%–90% accuracy on domain‐specific tasks at only 10%–25% of LLM computational costs, though substantial gaps remain in complex reasoning (e.g., GPT‐4: 92.13% vs. CodeT5‐Large: 57.58%). Privacy analysis shows that locally deployed SLMs achieve higher privacy and security scores, with healthcare, legal and IoT implementations often exceeding 9.0/10. Dataset analysis indicates that well‐curated 1–50 GB domain corpora can offset smaller model capacity, with data quality outweighing volume and imbalance mitigation yielding 10%–25% gains in low‐resource settings. Mobile and edge studies show that combining LoRA with quantization reduces latency from 500 to 50 ms, enabling efficiency gains of 95% or more for privacy‐preserving, resource‐constrained AI.
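The parameter and size reductions quoted above can be illustrated with simple arithmetic. The following is a minimal sketch, not taken from the reviewed studies: the function names, the 1024 × 1024 layer, the rank of 64 and the 16‐bit baseline are all assumptions chosen for illustration, and real savings depend on which layers are adapted and on quantization overheads such as scales and zero points.

```python
def lora_trainable_fraction(d_in: int, d_out: int, rank: int) -> float:
    """Fraction of a dense layer's weights that a LoRA adapter trains.

    LoRA freezes the d_out x d_in weight matrix and trains two low-rank
    factors, A (rank x d_in) and B (d_out x rank).
    """
    full_params = d_in * d_out
    lora_params = rank * (d_in + d_out)
    return lora_params / full_params


def quantized_size_reduction(bits_before: int, bits_after: int) -> float:
    """Idealized weight-storage reduction when lowering bit width.

    Ignores per-group scales and zero points (an assumption).
    """
    return 1.0 - bits_after / bits_before


# Hypothetical layer: 1024 x 1024 weights adapted with rank 64.
fraction = lora_trainable_fraction(1024, 1024, 64)
print(f"LoRA trains {fraction:.1%} of the layer's weights")

# Assumed 16-bit baseline quantized to 8 and 4 bits.
for bits in (8, 4):
    reduction = quantized_size_reduction(16, bits)
    print(f"16-bit -> {bits}-bit: {reduction:.0%} smaller")
```

Under these assumptions the adapter trains 12.5% of the layer's weights, and 8‐bit and 4‐bit storage halve and quarter a 16‐bit model respectively, which is in line with the roughly 90% parameter reduction and the 50%–75% size reduction reported in the abstract.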
