Blueprint2Motion: Two-Stage Human–Object Manipulation Motion Synthesis via Limb–Object Blueprint
Lin Li, Zhen Liu, Tingting Liu, Xuyao Dai, YanJie Chai

ABSTRACT
Generating human–object interaction (HOI) from text remains challenging because the generated motion must both align with the text semantics and keep human and object motion mutually consistent. In practical animation synthesis and editing, partial motion observations are often available at the beginning and end of an interaction. In that setting, synthesizing the intermediate process under text guidance is more useful than generating the entire sequence from text alone. We propose Blueprint2Motion, a two-stage generative framework for HOI motion synthesis conditioned on text, object geometry, and spatiotemporal motion context. In the first stage, Text2Blueprint predicts an intermediate limb–object blueprint, including temporally coherent object motion and limb trajectories, from the text, the object geometry, and the past and future motion observations. In the second stage, Blueprint2Motion uses the generated blueprint as an explicit control signal to synthesize responsive full-body human motion under the same text prompt. We further introduce an interaction loss that improves spatial alignment and motion coherence between key body joints and the manipulated object. Experiments on the FullBodyManipulation and BEHAVE datasets show improved overall motion quality and human–object coordination on most metrics, particularly for manipulation-oriented interactions. These results suggest that structured limb–object blueprints are an effective intermediate representation for text-guided HOI synthesis under partial motion conditions.
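The abstract does not specify network architectures or the exact form of the interaction loss, so the following is only a minimal sketch of the data flow it describes: observed start and end frames, a stage that produces an object trajectory plus limb trajectories, and a stage that turns that blueprint into full-body motion. Both stages are stubs that linearly interpolate between observed frames in place of learned generators. The function names, array shapes, the wrist indices `LIMB_IDX`, the translation-only object pose, and the contact-weighted nearest-point loss are all hypothetical choices for illustration, not the authors' formulation.

```python
import numpy as np

# Hypothetical sizes, chosen only for illustration.
T, J, P = 8, 5, 16   # frames, body joints, object surface points
LIMB_IDX = [3, 4]    # hypothetical indices of two wrist joints (K = 2 limbs)


def fill_between_observed(seq, observed):
    """Placeholder for a learned generator: linearly interpolate every
    unobserved frame from the observed frames (any trailing shape)."""
    t_all = np.arange(seq.shape[0])
    t_obs = np.flatnonzero(observed)
    flat = seq.reshape(seq.shape[0], -1)
    out = np.stack([np.interp(t_all, t_obs, flat[t_obs, d])
                    for d in range(flat.shape[1])], axis=1)
    return out.reshape(seq.shape)


def text_to_blueprint(text, obj_points, obj_ctx, limb_ctx, observed):
    """Stage 1 stub: return an object trajectory (T, 3) and limb trajectories
    (T, K, 3). A real model would condition on `text` and `obj_points`;
    this stub ignores them and only uses the observed motion context."""
    return (fill_between_observed(obj_ctx, observed),
            fill_between_observed(limb_ctx, observed))


def blueprint_to_motion(text, blueprint, body_ctx, observed):
    """Stage 2 stub: return full-body motion (T, J, 3) that follows the
    blueprint. Here the limb joints are simply pinned to the blueprint."""
    _, limb_traj = blueprint
    body = fill_between_observed(body_ctx, observed)
    body[:, LIMB_IDX] = limb_traj
    return body


def interaction_loss(body, limb_idx, obj_traj, obj_points, contact):
    """Hypothetical interaction loss: contact-weighted mean distance from each
    key joint to the nearest object surface point (object rotation omitted)."""
    world = obj_traj[:, None, :] + obj_points[None, :, :]              # (T, P, 3)
    joints = body[:, limb_idx, :]                                      # (T, K, 3)
    dist = np.linalg.norm(joints[:, :, None, :] - world[:, None, :, :], axis=-1)
    nearest = dist.min(axis=-1)                                        # (T, K)
    w = contact.astype(float)                                          # (T, K)
    return float((w * nearest).sum() / max(w.sum(), 1.0))


# Usage: first and last two frames are observed; the object and both
# wrists move 1 unit along x between them.
rng = np.random.default_rng(0)
observed = np.zeros(T, dtype=bool)
observed[:2] = observed[-2:] = True
obj_points = rng.normal(scale=0.1, size=(P, 3))
obj_ctx = np.zeros((T, 3))
obj_ctx[-2:] = [1.0, 0.0, 0.0]
limb_ctx = np.zeros((T, 2, 3))
limb_ctx[-2:] = [1.0, 0.0, 0.0]
body_ctx = rng.normal(size=(T, J, 3))

bp = text_to_blueprint("push the box forward", obj_points, obj_ctx, limb_ctx, observed)
body = blueprint_to_motion("push the box forward", bp, body_ctx, observed)
loss = interaction_loss(body, LIMB_IDX, bp[0], obj_points, np.ones((T, 2), dtype=bool))
```

Keeping the blueprint as a separate, low-dimensional output means the second stage only has to satisfy explicit limb and object targets rather than infer the interaction from text alone, which is the motivation the abstract gives for the two-stage split.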