A Re-Parameterized Lightweight Residual Attention Framework for Resource-Constrained Edge Computing
Yuze Gao, Jiamin Zhu, Xiaoxiao Liu, Wei Wu

Edge vision systems require convolutional neural networks (CNNs) that preserve recognition accuracy under strict storage, computation, and latency constraints. Although ResNet18 is a compact residual backbone, direct deployment on resource-constrained devices remains costly, whereas simple channel reduction weakens representation capacity. This study aims to build a deployable ResNet18-based classifier that reduces model complexity while recovering the accuracy lost during compression. We propose a lightweight framework that combines global channel scaling, a re-parameterized attention residual block, and teacher–student knowledge distillation. The proposed block uses multi-branch convolution and squeeze-and-excitation attention during training, then folds the linear branches into a single 3-by-3 convolution for inference. Experiments on CIFAR-100 show that the final model reduces parameters from 11.220 M to 2.841 M, retains comparable Top-1 accuracy (0.7579 vs. 0.7606), improves Top-5 accuracy (0.9340 vs. 0.9253), and reduces graphics processing unit (GPU) batch inference latency from 3.279 ms to 2.161 ms. Deployment on PYNQ-Z2 verifies the complete camera-based CPU-side inference workflow, with an average end-to-end latency of 421.467 ms/frame. The results indicate that residual topology preservation, re-parameterized feature enhancement, and distillation form a practical route for edge-oriented lightweight CNN deployment.
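
The folding of linear branches into a single 3-by-3 convolution can be illustrated with a minimal sketch. The abstract does not specify the exact branch layout, so this sketch assumes a hypothetical block with three parallel linear branches (a 3-by-3 convolution, a 1-by-1 convolution, and an identity shortcut), equal input and output channel counts, stride 1, and zero padding. Batch normalization, which is commonly folded in the same step, is omitted, and the squeeze-and-excitation branch is input-dependent and therefore not part of the folding. All function and variable names are illustrative and use only the Python standard library.

```python
import random

def conv3x3(x, w):
    """Zero-padded, stride-1 3x3 convolution.
    x: [C_in][H][W], w: [C_out][C_in][3][3] -> [C_out][H][W]."""
    c_in, h, wd = len(x), len(x[0]), len(x[0][0])
    out = [[[0.0] * wd for _ in range(h)] for _ in w]
    for o, w_o in enumerate(w):
        for i in range(c_in):
            for r in range(h):
                for c in range(wd):
                    for dr in range(3):
                        for dc in range(3):
                            rr, cc = r + dr - 1, c + dc - 1
                            if 0 <= rr < h and 0 <= cc < wd:
                                out[o][r][c] += w_o[i][dr][dc] * x[i][rr][cc]
    return out

def conv1x1(x, w1):
    """Pointwise convolution. w1: [C_out][C_in]."""
    h, wd = len(x[0]), len(x[0][0])
    return [[[sum(w1_o[i] * x[i][r][c] for i in range(len(x)))
              for c in range(wd)] for r in range(h)] for w1_o in w1]

def pad_1x1_to_3x3(w1):
    """Embed each 1x1 weight at the centre tap of an otherwise zero 3x3 kernel."""
    return [[[[0.0, 0.0, 0.0], [0.0, v, 0.0], [0.0, 0.0, 0.0]] for v in row]
            for row in w1]

def identity_as_3x3(channels):
    """Write the identity shortcut as a 3x3 kernel (centred Kronecker delta)."""
    eye = [[1.0 if o == i else 0.0 for i in range(channels)]
           for o in range(channels)]
    return pad_1x1_to_3x3(eye)

def add_nested(a, b):
    """Element-wise sum of two equally shaped nested lists."""
    if isinstance(a, list):
        return [add_nested(p, q) for p, q in zip(a, b)]
    return a + b

def flatten(t):
    """Yield the scalar leaves of a nested list."""
    if isinstance(t, list):
        for s in t:
            yield from flatten(s)
    else:
        yield t

random.seed(0)
C, H, W = 2, 5, 5  # assumed toy sizes, not the paper's configuration

def rnd():
    return random.uniform(-1.0, 1.0)

x = [[[rnd() for _ in range(W)] for _ in range(H)] for _ in range(C)]
w3 = [[[[rnd() for _ in range(3)] for _ in range(3)] for _ in range(C)]
      for _ in range(C)]
w1 = [[rnd() for _ in range(C)] for _ in range(C)]

# Training-time form: three parallel linear branches summed at the output.
y_train = add_nested(add_nested(conv3x3(x, w3), conv1x1(x, w1)), x)

# Inference-time form: sum the kernels once, then run a single 3x3 convolution.
w_fused = add_nested(add_nested(w3, pad_1x1_to_3x3(w1)), identity_as_3x3(C))
y_infer = conv3x3(x, w_fused)

max_diff = max(abs(a - b) for a, b in zip(flatten(y_train), flatten(y_infer)))
print(f"max |train - infer| = {max_diff:.2e}")
```

Because convolution is linear in its kernel, summing the branch kernels before the convolution gives the same output as summing the branch outputs after it, up to floating-point rounding. The inference graph then carries one kernel per block instead of three branches.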