DOI: 10.1044/2026_jslhr-25-00295 ISSN: 1092-4388

Perception of Synthesized Mandarin Speech Based on a Large-Scale Language Model Among Deaf Adults With Cochlear Implants

Ju Zhang, Zhao Zhang, Yu Bai, Yu Chen

Purpose:

This study investigated the intelligibility and processing characteristics of Mandarin sentences synthesized using a large-scale language model (LLM), relative to natural speech, in deaf adults with cochlear implants (CIs).

Method:

Fifty participants were enrolled in the study: 25 Mandarin-speaking deaf adults with CIs and 25 Mandarin-speaking adults with normal hearing (NH). Sentence recognition rates and reaction times were measured using sentences from the Mandarin speech perception database presented in six speech types: original natural speech by a single female speaker at slow and normal speaking rates, and synthetic speech generated by the LLM-based text-to-speech (TTS) model in two voices (male and female) at two speaking rates (slow and normal). Linear mixed-effects models were used to analyze how sentence recognition accuracy and reaction time varied with listener group and speech type.
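
As an illustration only, the six speech types described above can be enumerated as follows. This is a minimal sketch: the labels, tuple layout, and variable names are assumptions made for readability and are not taken from the study materials.

```python
from itertools import product

# Speaking rates named in the Method section.
RATES = ("slow", "normal")

# Natural speech: a single female speaker at each rate (2 conditions).
natural = [("natural", "female", rate) for rate in RATES]

# Synthetic (LLM-based TTS) speech: male and female voices at each rate (4 conditions).
synthetic = [
    ("synthetic", voice, rate)
    for voice, rate in product(("male", "female"), RATES)
]

# Hypothetical combined list of (source, voice, rate) conditions.
speech_types = natural + synthetic

for source, voice, rate in speech_types:
    print(f"{source:<9} {voice:<6} {rate}")
```

Laid out this way, the design is visibly unbalanced across voices: natural speech has no male-voice condition, so any natural versus synthetic comparison involving the male voice also differs in speaker sex.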

Results:

Participants with CIs demonstrated significantly lower sentence recognition rates and longer reaction times than NH participants for both natural and synthetic speech. They also exhibited greater individual variability than NH listeners in sentence recognition performance and reaction times. Changes in speaking rate significantly affected the recognition rates of participants with CIs. Crucially, when speaking rate was controlled for, sentence recognition rates and reaction times did not differ significantly between natural and LLM-based synthetic speech for participants with CIs.

Conclusions:

The results suggest that LLM-based synthetic speech could achieve intelligibility levels indistinguishable from those of natural speech for CI recipients under comparable speaking rate conditions, similar to its performance for NH listeners. This study offers insights for the future application of advanced TTS models in speech perception assessment and auditory rehabilitation for patients with CIs, including the potential development of personalized training materials.

Supplemental Material:

https://doi.org/10.23641/asha.32736135
