TransLiteUNet: A Lightweight CNN–Transformer Hybrid for Efficient 3D Brain Tumor Segmentation with Sub-0.5 M Parameters
Lixin Zhou, Yuanyuan Yang, Yunfeng Yang

The Transformer, through its self-attention mechanism, is well suited to modeling global features. Convolutional Neural Networks (CNNs), by contrast, rely on strong spatial inductive biases to capture local features effectively with fewer parameters. In 3D brain tumor segmentation, both local and global features are critical, and balancing high accuracy against computational cost remains a key challenge. To address this, we propose TransLiteUNet, a lightweight 3D model that combines CNN and Transformer architectures for accurate brain tumor segmentation without pretraining. To improve parameter efficiency, we introduce a 3D axial depthwise separable convolution residual structure (the 3DRes-ADS block) and a lightweight LiteViT module that improves global feature modeling at lower computational cost. TransLiteUNet (0.43 M parameters, 14.98 GFLOPs) and its simplified version, TransLiteUNet-S (0.31 M parameters, 7.68 GFLOPs), have significantly lower model complexity than current state-of-the-art models. Evaluated on two publicly available datasets, our models outperform leading models under identical conditions. Parameter counts and computational cost are reduced by orders of magnitude, with correspondingly lower training and inference costs.
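To give a sense of why an axial depthwise separable design saves parameters, the following minimal sketch compares weight counts for a dense 3D convolution and for one possible axial depthwise separable factorization. The factorization shown (three depthwise 1D convolutions, one per spatial axis, followed by a pointwise 1 x 1 x 1 convolution) is an assumption made for illustration; the abstract does not specify the internal layout of the 3DRes-ADS block, and the function names and channel sizes are hypothetical.

```python
def standard_conv3d_params(c_in: int, c_out: int, k: int) -> int:
    """Weights in a dense k x k x k 3D convolution (bias omitted)."""
    return k ** 3 * c_in * c_out


def axial_ds_conv3d_params(c_in: int, c_out: int, k: int) -> int:
    """Weights in an assumed axial depthwise separable 3D convolution.

    Assumed layout (not taken from the paper): three depthwise 1D
    convolutions, one per spatial axis, followed by a 1 x 1 x 1
    pointwise convolution that mixes channels. Bias omitted.
    """
    depthwise = 3 * k * c_in   # k weights per channel, for each of 3 axes
    pointwise = c_in * c_out   # 1 x 1 x 1 channel mixing
    return depthwise + pointwise


if __name__ == "__main__":
    c_in, c_out, k = 32, 32, 3  # illustrative values only
    dense = standard_conv3d_params(c_in, c_out, k)
    axial = axial_ds_conv3d_params(c_in, c_out, k)
    print(f"dense: {dense}, axial DS: {axial}, ratio: {dense / axial:.1f}x")
```

With these illustrative settings, the dense layer has 27,648 weights and the assumed factorized layer has 1,312, a reduction of roughly 21x for a single layer. The actual savings in TransLiteUNet depend on the real block design, which may differ from this sketch.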