DOI: 10.3390/bioengineering13070748 ISSN: 2306-5354

Multidimensional Prosodic and Semantic Coherence Modeling for Mandarin Mild Cognitive Impairment Detection

Rongyu Li, Meihong Wu

Early detection of Alzheimer’s disease (AD) and mild cognitive impairment (MCI) remains critically important, yet conventional neuroimaging and biomarker-based approaches are costly, invasive, and poorly scalable for population screening. Speech offers a non-invasive, cost-effective cognitive biomarker, but existing systems rarely integrate its multiple linguistic dimensions. We present Multi-Spec MCI-Net, a multimodal framework for classifying healthy controls (HC) versus MCI that jointly models three complementary speech representations: token-level semantics via dVAE and BERT operating on Mel spectrograms; temporal prosodic dynamics via a 1D-CNN with attention; and discourse-level semantic coherence via a graph convolutional network. A gated fusion mechanism adaptively weights these modalities, yielding clinically interpretable predictions tailored to individual phenotypic profiles. Evaluated on the Chinese NCMMSC2021_AD challenge dataset and the DementiaBank Mandarin subset, the model achieves 89.29% accuracy and 0.9584 ROC AUC on NCMMSC2021_AD, with 92.31% MCI recall, which is critical for minimizing false negatives in screening contexts. On the combined NCMMSC2021_AD and DementiaBank Mandarin dataset, it attains 77.46% accuracy and 0.8280 AUC, demonstrating robustness across spontaneous dialog and picture description tasks. Ablation studies confirm that multimodal fusion outperforms the semantic-only baseline by 5.16 percentage points, with each branch contributing non-redundant diagnostic information. These results establish an effective, interpretable approach for scalable, speech-based early MCI screening.
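The gated fusion step can be illustrated with a minimal sketch. The abstract does not specify the gate's exact form, so everything here is an assumption for illustration only: a softmax gate computed from the concatenated branch embeddings, the function name `gated_fusion`, the embedding size, and the random stand-in vectors for the semantic, prosodic, and coherence branches. It is not the authors' implementation.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def gated_fusion(branches, gate_weights, gate_bias):
    """Fuse equal-length branch embeddings with per-sample gate weights.

    Assumed form (not taken from the paper): one linear score per branch,
    computed from the concatenation of all branch embeddings, then a
    softmax so that the gates are positive and sum to 1.
    """
    concat = [x for branch in branches for x in branch]
    scores = [
        sum(w * x for w, x in zip(row, concat)) + b
        for row, b in zip(gate_weights, gate_bias)
    ]
    gates = softmax(scores)
    dim = len(branches[0])
    fused = [
        sum(g * branch[i] for g, branch in zip(gates, branches))
        for i in range(dim)
    ]
    return fused, gates


# Hypothetical stand-ins for the three branch outputs of one speaker.
rng = random.Random(0)
dim = 4
semantic = [rng.uniform(-1, 1) for _ in range(dim)]
prosodic = [rng.uniform(-1, 1) for _ in range(dim)]
coherence = [rng.uniform(-1, 1) for _ in range(dim)]
branches = [semantic, prosodic, coherence]

# Untrained gate parameters; in a real model these would be learned.
gate_weights = [
    [rng.uniform(-0.5, 0.5) for _ in range(3 * dim)] for _ in range(3)
]
gate_bias = [0.0, 0.0, 0.0]

fused, gates = gated_fusion(branches, gate_weights, gate_bias)
print("gates (semantic, prosodic, coherence):", gates)
print("fused embedding:", fused)
```

Because the gates are recomputed for each input, they can be read as a per-speaker indication of how much each branch contributed, which is one plausible route to the interpretability the abstract describes.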
