Computer Vision Models for Human Activity Recognition: A Literature Review
Luís Henrique Travassos, Edite Ravella, Jorge Bernardino, Francisco B. Pereira

Human Activity Recognition (HAR) is the automated identification of human actions from sensor data or video, and it is widely used in healthcare, smart environments, and surveillance. Although HAR based on computer vision has advanced rapidly, existing reviews do not adequately address the recent shift toward hybrid deep-learning architectures or provide a structured comparison of the trade-offs relevant to real-world deployment. This literature review addresses that gap through a PRISMA-guided analysis of articles published between 2021 and 2025 and retrieved from four major databases. The review develops a reproducible taxonomy of nine architectural families and applies a multidimensional evaluation framework covering classification accuracy, computational efficiency for edge deployment, environmental generalization, and fine-grained activity recognition. The findings show that hybrid architectures are the dominant design strategy, while attention-based and graph-based models play important specialized roles depending on temporal complexity, privacy requirements, and deployment constraints. The literature is concentrated mainly in healthcare and security applications.