A deep learning framework for evaluating dynamic sports movements via video-based motion analysis for digital health

Rezaee, Khosro; Monshizadeh, Fatemeh

doi:10.30476/jhmi.2026.109616.1334

Articles in Press

Document Type : Original Article

Authors

Khosro Rezaee ¹
Fatemeh Monshizadeh ²

¹ Department of Biomedical Engineering, Meybod University, Meybod, Iran

² Meybod University

Abstract

Background: Video-based assessment of exercise technique can support coaching, injury prevention, and remote rehabilitation, yet many methods stop at action recognition or require laboratory motion capture.
Objective: To develop and evaluate a deep-learning framework that classifies execution quality as correct vs. incorrect form from instructional videos.
Methods: We compiled 270 YouTube coaching videos spanning ten exercises (~20,500 labeled frames after removing ~15% non-movement content). Clip-level technique labels (correct/incorrect) were assigned based on coaching/biomechanical criteria and propagated to sampled frames. A markerless pose-estimation model (name/version reported) produced 2D keypoints and kinematic descriptors (e.g., joint angles and velocities). These signals were encoded with a ResNet-50 backbone, and a multi-head attention Transformer modeled temporal dependencies. Training used a video-stratified 80/20 split, with 5-fold cross-validation on the training portion only for model selection; class-weighted loss mitigated imbalance, and evaluation reports per-class precision/recall/F1 and confusion matrices. We also define a biomechanical complexity index (0–1) combining joints engaged, angular sensitivity, range of motion, and balance demand to relate movement difficulty to performance.
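As an illustration of the kinematic descriptors mentioned above, the following minimal sketch computes a joint angle from three 2D keypoints and a finite-difference angular velocity. The function names, the keypoint layout, and the forward-difference scheme are assumptions made for this example; the abstract does not specify how the descriptors were actually derived.

```python
import math


def joint_angle(a, b, c):
    """Angle in degrees at keypoint b, between segments b->a and b->c.

    a, b, c are (x, y) tuples, e.g. hip, knee, ankle (hypothetical layout).
    """
    v1 = (a[0] - b[0], a[1] - b[1])
    v2 = (c[0] - b[0], c[1] - b[1])
    n1 = math.hypot(v1[0], v1[1])
    n2 = math.hypot(v2[0], v2[1])
    if n1 == 0 or n2 == 0:
        raise ValueError("degenerate segment: two keypoints coincide")
    cos_t = (v1[0] * v2[0] + v1[1] * v2[1]) / (n1 * n2)
    # Clamp to guard against floating-point drift outside [-1, 1].
    cos_t = max(-1.0, min(1.0, cos_t))
    return math.degrees(math.acos(cos_t))


def angular_velocity(angles, fps):
    """Forward-difference angular velocity in degrees per second.

    angles: per-frame joint angles in degrees; fps: frames per second.
    Returns a list one element shorter than the input.
    """
    return [(nxt - cur) * fps for cur, nxt in zip(angles, angles[1:])]


if __name__ == "__main__":
    # Hypothetical hip, knee, ankle pixel coordinates over three frames.
    frames = [
        ((100, 100), (100, 200), (100, 300)),
        ((100, 100), (110, 200), (100, 300)),
        ((100, 100), (130, 200), (100, 300)),
    ]
    knee_angles = [joint_angle(hip, knee, ankle) for hip, knee, ankle in frames]
    print(knee_angles)
    print(angular_velocity(knee_angles, fps=30))
```

Angles computed this way come from 2D image coordinates, so they depend on camera viewpoint; a real pipeline would likely need smoothing and some handling of low-confidence keypoints.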
Results: The full CNN–attention–Transformer model achieved 97.56% accuracy and F1=0.956 on the held-out test set. Per-exercise performance remained high for low-to-intermediate complexity movements, while the most challenging exercises showed reduced F1: deadlift and crunch had the lowest scores (≈0.88–0.90), and per-exercise F1 ranged from 0.875 to 0.945.
Conclusions: This framework enables multi-exercise, video-based technique assessment (correct vs. incorrect) from consumer footage. Attention-weight patterns and error-case analyses provide practical insight into model decisions for intelligent coaching and tele-rehabilitation.
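Because clip-level labels are propagated to frames (see Methods), splitting at the video level keeps frames from the same clip out of both training and test sets. The sketch below shows one way to build such an 80/20 split together with inverse-frequency class weights. The function names, the stratification by technique label alone, and the "balanced" weighting formula (total / (number of classes × class count)) are assumptions for illustration, not the authors' reported procedure.

```python
import random
from collections import Counter, defaultdict


def video_stratified_split(video_labels, test_frac=0.2, seed=0):
    """Split video IDs into train/test, stratified by label.

    video_labels: dict mapping video_id -> label ("correct"/"incorrect").
    Whole videos are assigned to one side, so no video contributes
    frames to both sets.
    """
    by_label = defaultdict(list)
    for vid, label in sorted(video_labels.items()):
        by_label[label].append(vid)

    rng = random.Random(seed)
    train, test = [], []
    for label in sorted(by_label):
        vids = by_label[label]
        rng.shuffle(vids)
        n_test = round(len(vids) * test_frac)
        test.extend(vids[:n_test])
        train.extend(vids[n_test:])
    return train, test


def class_weights(labels):
    """Inverse-frequency ("balanced") weights: total / (n_classes * count)."""
    counts = Counter(labels)
    total = len(labels)
    return {c: total / (len(counts) * n) for c, n in counts.items()}


if __name__ == "__main__":
    # Hypothetical label distribution; the real one is not given in the abstract.
    videos = {f"v{i:03d}": ("correct" if i < 70 else "incorrect") for i in range(100)}
    train_ids, test_ids = video_stratified_split(videos)
    print(len(train_ids), len(test_ids))
    print(class_weights([videos[v] for v in train_ids]))
```

The resulting weights could be passed to a weighted cross-entropy loss; 5-fold cross-validation for model selection would then be run over the training IDs only.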

Main Subjects

Artificial Intelligence in medicine

Articles in Press, Accepted Manuscript
Available Online from 01 April 2026
