RPT-Mamba: A Range-Aware Physical Token Mamba Network for Far-Field mmWave Radar Gesture Recognition
Yitong Shi, Pei Peng, Zhiyuan Wang

Millimeter-wave (mmWave) radar provides a privacy-preserving and illumination-robust sensing modality for contactless gesture recognition. However, sparse radar point clouds degrade substantially as sensing distance increases: the number of valid detections decreases, echo intensity attenuates, and Doppler-related motion cues become less reliable. Such range-induced degradation leads to a distribution shift between near-range training samples and far-field test samples, making it difficult for models trained at short distances to generalize to unseen longer distances. Existing point-cloud gesture recognition methods usually treat radar detections as generic sparse point sequences and rarely model distance-related point loss, echo attenuation, and physical-attribute unreliability explicitly. This work introduces RPT-Mamba, a range-aware physical token Mamba network for sparse mmWave radar point cloud sequences. RPT-Mamba constructs physical point tokens from spatial coordinates, Doppler velocity, echo intensity, point-level range, and sample-level range information. During training, a range-aware stochastic degradation strategy adaptively removes points and masks dynamic attributes according to the estimated sensing distance, while a context-guided attribute reconstruction objective recovers masked Doppler and intensity attributes from spatial and frame-level context. A bidirectional Mamba temporal encoder then models long-range gesture dynamics over frame tokens. On the public mTransSee dataset, RPT-Mamba achieves 92.09% accuracy and 92.04% Macro-F1 under the random split protocol, and 85.34% accuracy and 84.77% Macro-F1 under a challenging near-to-far protocol, exceeding point-cloud, radar-gesture, Transformer, and Mamba baselines.
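
As an illustration of the physical point tokens and the range-aware stochastic degradation described above, the following is a minimal NumPy sketch under stated assumptions, not the RPT-Mamba implementation. The function names `build_tokens` and `range_aware_degrade`, the parameters `max_range`, `max_drop`, and `max_mask`, the linear schedule tying drop and mask probabilities to normalized range, and the use of zero as the mask value are all hypothetical; the abstract does not specify the schedule, its direction, or the token layout.

```python
import numpy as np

def build_tokens(points, sample_range):
    """Build per-point tokens for one frame.

    points: (N, 5) array with columns x, y, z, doppler, intensity
            (assumed column order, for illustration only).
    Returns an (N, 7) array that appends the point-level range and the
    sample-level range to each point.
    """
    points = np.asarray(points, dtype=np.float64)
    point_range = np.linalg.norm(points[:, :3], axis=1, keepdims=True)
    sample_col = np.full((points.shape[0], 1), float(sample_range))
    return np.concatenate([points, point_range, sample_col], axis=1)

def range_aware_degrade(tokens, sample_range, rng,
                        max_range=5.0, max_drop=0.5, max_mask=0.5):
    """Randomly drop points and mask Doppler/intensity.

    Hypothetical schedule: both probabilities grow linearly with the
    normalized sample range r in [0, 1].
    Returns (degraded tokens, boolean mask, reconstruction targets).
    """
    r = float(np.clip(sample_range / max_range, 0.0, 1.0))

    # Point removal: each point is dropped with probability max_drop * r.
    keep = rng.random(tokens.shape[0]) >= max_drop * r
    if not keep.any():
        # Always retain at least one point so the frame is not empty.
        keep[rng.integers(tokens.shape[0])] = True
    kept = tokens[keep].copy()

    # Attribute masking: hide Doppler and intensity (columns 3 and 4).
    mask = rng.random(kept.shape[0]) < max_mask * r
    targets = kept[:, 3:5].copy()  # originals, usable as reconstruction targets
    kept[mask, 3:5] = 0.0          # zero is an assumed mask value
    return kept, mask, targets
```

In this sketch the returned `mask` and `targets` would let a reconstruction loss be computed only on the masked Doppler and intensity entries, which mirrors the role of the context-guided attribute reconstruction objective; the loss itself and the network are not shown.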