ST‐LoRA: SVD‐Guided Sparse Low‐Rank Adaptation With Trainable Masks for Large Language and Vision Models

DOI: 10.1049/cvi2.70072 ISSN: 1751-9632

Quan Gan, Haifeng Hu, Feifei Wang, Ming Zhao, Xiaoyan Ren

ABSTRACT

The rapid advancement of large‐scale pre‐trained models in computer vision and natural language processing has necessitated the development of parameter‐efficient fine‐tuning (PEFT) techniques. Low‐Rank Adaptation (LoRA) has emerged as a leading approach, enabling efficient adaptation by decomposing weight updates into low‐rank matrices. However, standard LoRA suffers from inefficiencies due to uniform rank allocation, redundant parameter updates, and random initialisation. This paper introduces Sparse Truncated Low‐Rank Adaptation (ST‐LoRA), a novel PEFT method that enhances LoRA through three key innovations: (1) SVD‐based initialisation, which leverages the singular value decomposition of pre‐trained weights to provide informed starting points for the adaptation matrices; (2) dynamic rank assignment, which allocates layer‐specific ranks based on singular value distributions to optimise parameter efficiency; and (3) learnable sparse masks, which selectively update only the most critical connections to reduce computational overhead. We evaluate ST‐LoRA across vision (ViT‐L/16 on ImageNet‐1K, CIFAR‐100, and fine‐grained datasets) and language (LLaMA‐2 7B on GLUE and instruction‐following tasks) domains. Our experiments include ablation studies, cross‐validation, and statistical significance testing. The results demonstrate that ST‐LoRA consistently outperforms existing PEFT methods while significantly reducing computational costs. On ImageNet‐1K, ST‐LoRA achieves 84.0% top‐1 accuracy (±0.12% std), surpassing LoRA (r = 16) by 0.4% while using only 26.5% of its parameters and reducing training time by 57%. For language tasks, ST‐LoRA approaches full fine‐tuning performance at a fraction of the computational cost, reducing training time by up to 45% versus standard LoRA and 87% versus full fine‐tuning.
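The three components named in the abstract can be illustrated with a small NumPy sketch. This is not the authors' implementation: the abstract gives no formulas, so the cumulative-energy rule in `select_rank`, the square-root split of singular values in `svd_init`, and the hard threshold on `mask_logits` in `masked_update` are all assumptions chosen for illustration, as are the function and parameter names.

```python
import numpy as np


def select_rank(singular_values, energy=0.9, max_rank=16):
    """Pick a layer-specific rank from the singular value spectrum.

    Assumed rule (not from the paper): the smallest rank whose squared
    singular values reach the fraction `energy` of the total.
    """
    s2 = np.asarray(singular_values, dtype=float) ** 2
    cumulative = np.cumsum(s2) / np.sum(s2)
    rank = int(np.searchsorted(cumulative, energy)) + 1
    return min(rank, max_rank, len(s2))


def svd_init(weight, rank):
    """Build low-rank factors B (out x r) and A (r x in) from the top-r
    singular triplets of a pre-trained weight matrix.

    Assumed split: each factor receives the square root of the singular values.
    """
    u, s, vt = np.linalg.svd(weight, full_matrices=False)
    root = np.sqrt(s[:rank])
    b = u[:, :rank] * root          # scale the columns of U
    a = root[:, None] * vt[:rank]   # scale the rows of V^T
    return b, a


def masked_update(b, a, mask_logits, threshold=0.0):
    """Apply a binary sparse mask to the low-rank update B @ A.

    `mask_logits` stands in for the trainable mask parameters; the hard
    threshold used here is an assumption.
    """
    mask = (mask_logits > threshold).astype(b.dtype)
    return mask * (b @ a)


# Toy usage on a random 8 x 6 "pre-trained" weight matrix.
rng = np.random.default_rng(0)
w = rng.standard_normal((8, 6))
spectrum = np.linalg.svd(w, compute_uv=False)
r = select_rank(spectrum, energy=0.9, max_rank=4)
b, a = svd_init(w, r)
delta = masked_update(b, a, rng.standard_normal(w.shape))
```

In this sketch the mask is applied to the dense product `B @ A` for clarity; a practical implementation would need a way to train the mask (a hard threshold has no useful gradient) and might mask the factors instead, neither of which the abstract specifies.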
