DOI: 10.1121/10.0044192 ISSN: 1520-8524

Vocal-tract length estimation from vowel formants benchmarked against acoustic pharyngometry

Daniel Friedrichs, Urs Guerrini, Axel Ekström, Volker Dellwo, Steven Moran

Estimating vocal-tract length (VTL) from vowel formants can aid speaker normalization, but few methods have been benchmarked against an anatomical reference in the same speakers. We combined acoustic pharyngometry (APh) and speech data from 42 adults to benchmark eight widely used formant-based VTL estimators against incisors-to-glottis length and to test an interpretable two-stage bias-corrected linear estimator. Across more than 400 000 central frames with valid F1–F4, traditional quarter-wave, odd-harmonic, and dispersion-type estimators correlated with APh-measured VTL (VTL_APh) but showed poor out-of-sample anatomical recovery and strong calibration compression. Re-estimated one-stage linear models reduced mean absolute error (MAE; median ≈ 1.0 cm) but still overestimated shorter tracts and underestimated longer tracts. A two-stage model markedly improved calibration and agreement, outperforming one-stage linear and nonlinear alternatives (median per-vowel MAE 0.39 cm, median out-of-sample R² = 0.83). Front and front-rounded vowels were the most informative. Speaker-level 95% limits of agreement were about ±0.9 cm, indicating that the method is better suited to aggregated tract-scale estimation than to direct anatomical measurement. These results identify calibration bias as a central limitation of standard formant-based VTL estimators and provide a practical, interpretable route to tract-scale estimation from similarly processed labeled-vowel data under matched conditions.
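To make the quantities in the abstract concrete, the sketch below shows (a) the textbook quarter-wave estimator, which models the tract as a uniform tube closed at the glottis and open at the lips, so that F_n = (2n − 1)c / (4L) and therefore L = (2n − 1)c / (4F_n); (b) a generic two-stage calibration in which a one-stage linear prediction is followed by a linear correction for compression toward the mean; and (c) MAE and 95% limits of agreement. This is a minimal illustration under stated assumptions, not the authors' method: the abstract does not specify their features or stage-2 form, so the stage-2 step here is an assumed slope/intercept correction (fit predicted = α + β·reference on training data, then apply (predicted − α)/β). The speed of sound c = 35 000 cm/s and all function names are hypothetical choices made for illustration.

```python
import numpy as np

# Assumed speed of sound in warm, humid vocal-tract air (cm/s); illustrative value.
SPEED_OF_SOUND_CM_S = 35000.0


def quarter_wave_vtl(formants_hz):
    """Uniform closed-open tube: L = (2n - 1) c / (4 F_n), averaged over F1..Fk (cm)."""
    f = np.asarray(formants_hz, dtype=float)
    n = np.arange(1, f.size + 1)
    lengths = (2 * n - 1) * SPEED_OF_SOUND_CM_S / (4 * f)
    return lengths.mean()


def fit_stage1(X, y):
    """Hypothetical stage 1: ordinary least squares of reference VTL on features."""
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef


def predict_stage1(coef, X):
    """Apply the stage-1 intercept and slopes to new feature rows."""
    A = np.column_stack([np.ones(len(X)), X])
    return A @ coef


def fit_stage2(pred, y):
    """Hypothetical stage 2: estimate compression via pred ~ alpha + beta * y (training data only)."""
    beta, alpha = np.polyfit(y, pred, 1)
    return alpha, beta


def apply_stage2(pred, alpha, beta):
    """Undo the fitted compression; needs no reference values at application time."""
    return (pred - alpha) / beta


def mean_absolute_error(est, ref):
    """MAE in the units of the inputs (cm here)."""
    return float(np.mean(np.abs(np.asarray(est) - np.asarray(ref))))


def limits_of_agreement(est, ref):
    """Bland-Altman style 95% limits: mean difference +/- 1.96 SD of differences."""
    d = np.asarray(est, dtype=float) - np.asarray(ref, dtype=float)
    bias = d.mean()
    half_width = 1.96 * d.std(ddof=1)
    return bias - half_width, bias + half_width


if __name__ == "__main__":
    # A 17.5 cm uniform tube has formants at 500, 1500, 2500, 3500 Hz with c = 35 000 cm/s.
    print(quarter_wave_vtl([500, 1500, 2500, 3500]))
```

In real use, the stage-1 and stage-2 coefficients would be fit on training speakers and applied to held-out speakers; fitting and evaluating on the same data would overstate agreement.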
