Architectural Asymmetry and Orientation-Averaged Calibration for Joint Acoustic Echo Cancellation and Beamforming in Smart Glasses
Ariel Frank, Anat Tyomkin, Israel Cohen
Modern hands-free and wearable communication devices employ multiple microphones and loudspeakers, leading to the joint presence of acoustic echo, background noise, and desired speech signals. While acoustic echo cancellation (AEC) and beamforming are commonly combined to address this challenge, existing architectures face a trade-off between computational complexity, stability, and adaptability. In particular, adaptive beamforming approaches require repeated estimation and inversion of covariance matrices, incurring high computational cost and introducing potential sensitivity to time-varying conditions. Conversely, fixed beamformers reduce online complexity and improve stability, but their performance can degrade when the acoustic scene differs from the calibration condition. In this work, we investigate low-complexity AEC–beamforming architectures that combine fixed minimum-variance distortionless response (MVDR) beamforming with adaptive AEC. Since the ordering of these stages yields two inequivalent architectures, we evaluate two configurations: AEC followed by beamforming (AEC-BF) and beamforming followed by AEC (BF-AEC). To reduce dependence on a single head pose in wearable devices, we use an offline orientation-averaged calibration strategy in which the undesired-signal covariance matrix and, when required, the relative echo transfer functions (RETFs) are estimated from calibration measurements averaged across multiple head orientations. The proposed methods are evaluated using real-device recordings from a six-microphone wearable device.
The results show a clear architectural asymmetry: the fixed BF-AEC configuration achieves the highest average echo return loss enhancement (ERLE) and perceptual evaluation of speech quality (PESQ), with substantially lower online complexity than the fully adaptive baseline, whereas the fixed AEC-BF configuration provides a higher signal-to-distortion ratio (SDR) in the evaluated experiment. Additional calibration experiments show that orientation-averaged RETF calibration provides partial generalization across the measured head orientations, but also that the RETFs are not fully orientation-invariant. Overall, the results indicate that fixed BF-AEC provides a favorable trade-off between echo suppression, stability, and online complexity under the evaluated real-recording conditions.
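As an illustration of the orientation-averaged calibration idea, the following is a minimal sketch for a single frequency bin, not the authors' implementation. It assumes that the undesired-signal covariance is formed by averaging per-orientation sample covariances and that the fixed weights follow the standard MVDR expression w = R^{-1} d / (d^H R^{-1} d). The function names, the diagonal-loading term, the random calibration snapshots, and the placeholder steering vector are all hypothetical and are not taken from the paper.

```python
import numpy as np


def orientation_averaged_covariance(snapshots_per_orientation):
    """Average the per-orientation sample covariances of the undesired signal.

    Each list entry is a complex array of shape (num_mics, num_frames) holding
    STFT snapshots of one frequency bin recorded at one head orientation.
    """
    covs = [x @ x.conj().T / x.shape[1] for x in snapshots_per_orientation]
    return np.mean(covs, axis=0)


def fixed_mvdr_weights(cov, steering, loading=1e-3):
    """Standard MVDR weights w = R^{-1} d / (d^H R^{-1} d).

    Diagonal loading (an assumption of this sketch) keeps the solve well
    conditioned; it is scaled by the average diagonal power of cov.
    """
    num_mics = cov.shape[0]
    loaded = cov + loading * np.trace(cov).real / num_mics * np.eye(num_mics)
    r_inv_d = np.linalg.solve(loaded, steering)
    return r_inv_d / (steering.conj() @ r_inv_d)


# Synthetic stand-in data: six microphones, five head orientations.
rng = np.random.default_rng(0)
num_mics, num_frames, num_orientations = 6, 400, 5
snapshots = [
    rng.standard_normal((num_mics, num_frames))
    + 1j * rng.standard_normal((num_mics, num_frames))
    for _ in range(num_orientations)
]
# Placeholder steering vector toward the desired talker (hypothetical values).
steering = np.exp(1j * rng.uniform(0.0, 2.0 * np.pi, num_mics))

cov = orientation_averaged_covariance(snapshots)
weights = fixed_mvdr_weights(cov, steering)

# Distortionless constraint: w^H d should equal 1 up to numerical precision.
distortionless_response = weights.conj() @ steering
```

Because the covariance and the weights are computed once offline, the online cost of such a fixed beamformer reduces to one inner product per frequency bin and frame, which is the source of the complexity advantage over an adaptive beamformer that re-estimates and inverts the covariance at run time.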