DOI: 10.1115/1.4072208 ISSN: 1942-4302

Energy aware trajectory optimization using DRL in humanoid robot through via-point learning

James Sorokhaibam, Adersh Maruavttu, Ashish Dutta

Abstract

Learning whole-body motion for a 19-degree-of-freedom (DoF) humanoid robot using deep reinforcement learning (DRL) is challenging due to high kinematic redundancy, large joint-level search spaces, and the need to maintain dynamic balance. This paper proposes a structured DRL framework that integrates task-space via-point trajectory learning with a pseudo-inverse of the Jacobian matrix for redundancy resolution (PJRR) method. Instead of learning full joint trajectories, a Soft Actor–Critic (SAC) agent predicts a compact set of task-space via-points, which are converted into C2-continuous motions using composite cubic splines. Kinematic redundancy is resolved analytically through the PJRR, which operates separately from the learned policy and enforces a three-level task hierarchy: an end-effector trajectory sub-task, a hip trajectory sub-task, and a posture regulation sub-task. The framework is validated on the KONDO KHR-3HV humanoid platform (1.3 kg, 19 DoFs) across four pick-and-place configurations, evaluated over N = 24 independent trials per motion. Energy-aware trajectories are generated with zero violations of the zero-moment point (ZMP) constraint in simulation. A time-incentive reward formulation characterizes the energy–time trade-off, and a time-conditioned policy enables speed generalization at inference without retraining. Preliminary transferability to a scaled 23-DoF platform is demonstrated in simulation.
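The three-level task hierarchy described in the abstract can be illustrated with a generic null-space projection scheme. The sketch below is not the authors' PJRR implementation; it is a minimal example of the standard recursive task-priority formulation, in which each lower-priority task is solved with a pseudo-inverse restricted to the null space of the tasks above it. The function name `prioritized_joint_velocities`, the tolerance `rcond`, the 7-joint toy dimensions, and the random Jacobians are all assumptions made for illustration.

```python
import numpy as np


def prioritized_joint_velocities(tasks, n_joints, rcond=1e-8):
    """Resolve redundancy for tasks ordered from highest to lowest priority.

    tasks: list of (J, xdot) pairs, where J is an (m x n_joints) task
    Jacobian and xdot is the desired m-dimensional task velocity.
    Hypothetical helper for illustration only.
    """
    qdot = np.zeros(n_joints)
    N = np.eye(n_joints)  # projector onto the null space of higher-priority tasks
    for J, xdot in tasks:
        JN = J @ N  # task Jacobian restricted to the remaining freedom
        JN_pinv = np.linalg.pinv(JN, rcond)
        # Correct only the residual left after higher-priority tasks are met.
        qdot = qdot + JN_pinv @ (xdot - J @ qdot)
        # Shrink the available null space for the next task.
        N = N @ (np.eye(n_joints) - JN_pinv @ JN)
    return qdot


# Toy example (assumed dimensions, not the KHR-3HV model).
rng = np.random.default_rng(0)
n = 7
J_ee = rng.standard_normal((3, n))   # stand-in for the end-effector Jacobian
J_hip = rng.standard_normal((2, n))  # stand-in for the hip Jacobian
xdot_ee = np.array([0.10, 0.00, -0.05])
xdot_hip = np.array([0.02, 0.00])
q = rng.standard_normal(n)
q_ref = np.zeros(n)
# Posture regulation: pull joints toward a reference posture (gain assumed).
posture_task = (np.eye(n), -0.5 * (q - q_ref))

qdot = prioritized_joint_velocities(
    [(J_ee, xdot_ee), (J_hip, xdot_hip), posture_task], n
)
```

In this toy setting the first two tasks are satisfied to numerical precision, while the posture term only uses whatever freedom remains, which is the behaviour a strict hierarchy is meant to provide.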
