Attention-Based Multimodal Framework for Athlete-Performance Analysis and Rehabilitation Monitoring Using Vision and Wearable Sensors
Mohammed Alonazi, Iqra Aijaz Abro, Maha Abdelhaq, Raed Alsaqour, Ahmad Jalal, Hui Liu
Background: Monitoring systems that combine wearable sensors, computer vision, and artificial intelligence (AI) are increasingly used in sports science and rehabilitation practice for movement pattern analysis, injury prevention, and training optimization. These technologies are becoming essential components of athlete-performance analysis and rehabilitation-monitoring systems that support biomechanical assessment, athlete development, and movement-quality evaluation. Such systems increasingly rely on intelligent multimodal sensing capable of continuously evaluating movement quality, biomechanical patterns, training execution, and recovery progress. Human activity recognition (HAR) is a key enabling technology for these applications because it provides automated assessment of human movement from wearable and vision-based sensing modalities. The purpose of this study was therefore to develop and evaluate an attention-based multimodal framework that integrates wearable inertial sensing and RGB video analysis for robust athlete-performance assessment and rehabilitation monitoring through accurate recognition of human movement patterns.
Methods: A framework for athlete-performance analysis and rehabilitation monitoring that combines inertial sensor data with RGB-based visual information was introduced. Inertial signals were segmented with adaptive windowing, whereas silhouette refinement was applied to the visual inputs to analyze motion structures. Temporal, spatial, and motion features, including trajectory, orientation, and skeleton-based space-time representations, were calculated from the multimodal inputs.
The proposed framework was designed to capture complex movement dynamics associated with rehabilitation exercises and sports-related motion patterns across heterogeneous sensing environments. The extracted features were then combined and optimized with a multimodal feature fusion approach, with the Ranger optimization algorithm used during this process. An attention-based deep learning classifier was implemented to classify movement activities.
Results: The proposed framework reached accuracies of 88.40% on the VIDIMU dataset and 87.96% on the UTD-MHAD dataset. Recognition using both inertial and vision-based modalities provided greater robustness than single-modality solutions. Integrating wearable sensing and computer vision further improved the ability of the framework to analyze complex movement behaviors under varying execution and environmental conditions.
Conclusion: The proposed multimodal framework provides a foundation for intelligent athlete-performance and rehabilitation-monitoring systems by integrating wearable sensing, computer vision, and attention-based artificial intelligence for robust movement analysis. The findings highlight its potential to support biomechanical assessment, movement-quality evaluation, training-performance monitoring, rehabilitation tracking, and injury-risk management in modern sports and healthcare environments.
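Illustrative sketch: The attention-based fusion of inertial and visual features summarized in the Methods can be illustrated with a minimal Python example. It is not the implementation evaluated in this study. The function names (`softmax`, `attention_fuse`), the four-dimensional toy feature vectors, and the fixed query vector are all assumptions chosen for illustration; in a trained model the features and the query would be learned. The sketch scores each modality by scaled dot-product attention against the query, normalizes the scores with a softmax, and returns the weighted sum of the modality features.

```python
import math
from typing import List, Sequence, Tuple


def softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_fuse(
    modality_features: Sequence[Sequence[float]],
    query: Sequence[float],
) -> Tuple[List[float], List[float]]:
    """Fuse per-modality feature vectors with scaled dot-product attention.

    Hypothetical helper: every feature vector and the query must share
    the same length. Returns the fused vector and the attention weights.
    """
    dim = len(query)
    if any(len(feat) != dim for feat in modality_features):
        raise ValueError("all feature vectors must match the query length")

    # One score per modality: dot(query, feature) / sqrt(dim).
    scores = [
        sum(q * f for q, f in zip(query, feat)) / math.sqrt(dim)
        for feat in modality_features
    ]
    weights = softmax(scores)

    # Weighted sum of the modality features, computed per dimension.
    fused = [
        sum(w * feat[i] for w, feat in zip(weights, modality_features))
        for i in range(dim)
    ]
    return fused, weights


if __name__ == "__main__":
    # Toy feature vectors (assumed values, not taken from the study).
    inertial = [0.2, 0.9, 0.1, 0.4]
    visual = [0.7, 0.3, 0.5, 0.6]
    query = [1.0, 0.0, 0.0, 0.0]  # stands in for a learned query

    fused, weights = attention_fuse([inertial, visual], query)
    print("attention weights:", weights)
    print("fused features:", fused)
```

Because the weights sum to one, the fused vector always lies between the two modality vectors, so a modality that matches the query poorly is down-weighted rather than discarded.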