A Comprehensive Survey of Compression Algorithms for Language Models
Seungcheol Park, Jaehyeon Choi, Sojin Lee, U Kang

How can we compress language models without sacrificing accuracy? Compression algorithms for language models are multiplying rapidly. They aim to leverage the remarkable advances of recent language models while avoiding the side effects of their gigantic size, including increased carbon emissions, high latency, and restricted usage on resource-constrained mobile devices. Numerous compression algorithms have made remarkable progress in compressing language models. Ironically, their sheer number makes it challenging to capture emerging trends and identify the fundamental concepts underlying them. In this paper, we survey and summarize diverse compression algorithms, including pruning, quantization, knowledge distillation, low-rank approximation, and dynamic inference. We not only summarize the overall trends across these algorithms but also select representative algorithms and provide in-depth analyses of them. Finally, we introduce and discuss promising directions for future research.