Robust Curriculum-Based SAC for End-to-End Motion Control of a 7-DOF Manipulator Under Sparse Rewards

DOI: 10.3390/electronics15132784 ISSN: 2079-9292

Yuhan Zhang, Jijun Gu

End-to-end motion control of 7-degree-of-freedom (DOF) redundant manipulators under sparse reward signals presents a fundamental challenge in deep reinforcement learning (DRL) for robotics: the vast configuration space and absence of dense gradient information combine to produce severe cold-start failures and high cross-seed training variance. This paper proposes Curriculum-SAC-HER, a novel fusion framework integrating Soft Actor–Critic (SAC), Hindsight Experience Replay (HER), and a performance-driven three-stage Automatic Curriculum Learning (ACL) scheduler, designed to resolve the cold-start exploration bottleneck within a training budget of 300,000 environment interaction steps. The core methodology progressively expands the spatial target distribution across three stages of increasing difficulty, conditioning each stage transition on an 80% rolling success threshold to guarantee kinematic prior consolidation before advancing. A rigorous evaluation across 15 independent training runs (five seeds per group, all retained without filtering) demonstrates that the proposed framework achieves a final mean success rate of 84.8% (std: 11.0%), substantially surpassing the SAC + HER ablation (70.3%, Mann–Whitney U test, p = 0.028) and the DDPG baseline (22.3%, p = 0.008), while compressing cross-seed variance by 67% relative to the ablation. Zero-shot robustness evaluations under simulated domain perturbations further reveal that the learned policy maintains above 92% success across extreme friction variations and sustains 71.8% success under a 1.5× payload increase, demonstrating that the ACL module fosters generalized kinematic representations rather than over-fitting to specific contact mechanics.
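The performance-driven stage transition described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: only the three-stage structure and the 80% rolling success threshold come from the abstract. The class name `CurriculumScheduler`, the target-sampling radii, the window length of 100 episodes, and the choice to reset the window after each transition are all assumptions made for illustration.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class CurriculumScheduler:
    """Hypothetical three-stage curriculum gated by a rolling success rate."""

    # Assumed target-sampling radii in metres, one per stage (not from the paper).
    stage_radii: tuple = (0.05, 0.15, 0.30)
    # Rolling success rate required to advance (80%, as stated in the abstract).
    threshold: float = 0.80
    # Assumed rolling-window length in episodes (not from the paper).
    window: int = 100
    stage: int = 0
    _history: deque = field(default=None, repr=False)

    def __post_init__(self):
        # Fixed-length buffer holding the most recent episode outcomes.
        self._history = deque(maxlen=self.window)

    @property
    def radius(self) -> float:
        """Target-sampling radius for the current stage."""
        return self.stage_radii[self.stage]

    def rolling_success(self) -> float:
        """Success rate over the episodes currently in the window."""
        if not self._history:
            return 0.0
        return sum(self._history) / len(self._history)

    def record(self, success: bool) -> bool:
        """Log one episode outcome; return True if the stage advanced."""
        self._history.append(1 if success else 0)
        window_full = len(self._history) == self.window
        is_last_stage = self.stage == len(self.stage_radii) - 1
        if window_full and not is_last_stage and self.rolling_success() >= self.threshold:
            self.stage += 1
            # Assumption: start a fresh window so the new stage is judged
            # only on episodes sampled at its own difficulty.
            self._history.clear()
            return True
        return False
```

In a training loop, `record` would be called once per episode with the sparse success flag, and `radius` would be read when sampling the next goal. Requiring a full window before advancing prevents a short lucky streak from triggering a premature transition.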
