Automated Speech-Based Modeling of Item-Level Symptom Severity in Schizophrenia
Silvia Ciampelli, Janna N. de Boer, Sanne Koops, Evan Troelstra, Almut Jebens, Jan-Bernard C. Marsman, Arnout C. Smit, Amir Hossein Nikzad, Ryan Partlan, Philipp Homan, Wolfram Hinzen, Sunny X. Tang, Iris E. C. Sommer
Importance
Speech carries subtle indicators of current mental state, a phenomenon routinely used in psychiatric assessment. Despite its potential, quantitative speech analysis is not yet integrated into clinical care.
Objective
To identify which features of naturalistic speech are associated with concurrent variation in symptom severity in patients with psychotic disorders.
Design, Setting, and Participants
This longitudinal, multicenter cohort study was conducted in the Netherlands (June 7, 2017, to July 31, 2025) in adult patients (aged ≥18 years) with schizophrenia spectrum disorders who completed repeated clinical and speech assessments for up to 8 years. Findings were replicated in a longitudinal cohort from the US (March 1, 2021, to December 1, 2022).
Main Outcomes and Measures
Individual symptom severity was measured using the Positive and Negative Syndrome Scale (PANSS) (Dutch cohort) or the Brief Psychiatric Rating Scale (US cohort). Speech samples were converted into interpretable artificial intelligence–derived voice and text features, whose dimensionality was reduced using principal component analysis. Associations were estimated using linear mixed models adjusted for demographic characteristics and time point. Estimation accuracy was quantified using mean absolute error (MAE) and the coefficient of determination (R²).
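To make the two accuracy metrics concrete, here is a minimal Python sketch of how MAE and variance explained could be computed for a single symptom item. It assumes the second metric is the coefficient of determination (R²), computed as 1 − SS_res/SS_tot, which is consistent with the variance-explained figures reported in the Results. The function names and example ratings are hypothetical and are not taken from the study's pipeline.

```python
"""Minimal sketch: MAE and R^2 for speech-based symptom-severity estimates.

Assumption: R^2 is computed as 1 - SS_res / SS_tot on observed vs estimated
ratings; the exact formulation used in the study is not stated here.
"""
from statistics import fmean


def mean_absolute_error(observed: list[float], predicted: list[float]) -> float:
    """Average absolute gap between clinician ratings and model estimates."""
    if not observed or len(observed) != len(predicted):
        raise ValueError("inputs must be non-empty and of equal length")
    return fmean(abs(o - p) for o, p in zip(observed, predicted))


def r_squared(observed: list[float], predicted: list[float]) -> float:
    """Proportion of rating variance explained: 1 - SS_res / SS_tot."""
    if len(observed) < 2 or len(observed) != len(predicted):
        raise ValueError("need at least two paired ratings")
    mean_obs = fmean(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    if ss_tot == 0:
        raise ValueError("observed ratings have zero variance")
    return 1.0 - ss_res / ss_tot


if __name__ == "__main__":
    # Hypothetical clinician ratings for one item (scale 1-7) and model estimates.
    observed = [3, 5, 2, 6, 4]
    predicted = [3.5, 4.2, 2.9, 5.1, 4.4]
    print(f"MAE = {mean_absolute_error(observed, predicted):.2f}")  # MAE = 0.70
    print(f"R^2 = {r_squared(observed, predicted):.2f}")  # R^2 = 0.73
```

With these hypothetical ratings, an MAE of 0.70 would fall within a 1-point agreement margin on the 1 to 7 item scale. Defining R² as 1 − SS_res/SS_tot, rather than as a squared correlation, also penalizes systematic bias in the estimates; the study may nonetheless have used a different formulation.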
Results
In the Dutch cohort, 773 speech recordings from 356 participants (mean [SD] age, 30.4 [10.3] years; 257 male [72.2%]) were analyzed, and in the US cohort, 165 speech recordings from 72 participants (mean [SD] age, 26.4 [5.2] years; 46 male [64.8%]) were analyzed. In the Dutch cohort, speech-based models estimated individual psychotic symptoms with clinically meaningful accuracy: the item-level MAE was less than 1 point (scale, 1-7), comparable to the 1-point agreement margin used in PANSS rater training. For PANSS positive and negative subscale scores, models achieved MAEs of 2.85 and 3.22, respectively, well below the 20% threshold used to flag unreliable PANSS ratings, and explained 13.4% and 17.8% of the variance. In the US replication cohort, models of Brief Psychiatric Rating Scale thought disturbance and withdrawal scores yielded similar MAEs (3.0 and 2.4, respectively) and explained 31.0% and 21.5% of the variance. In the Dutch cohort, negative symptoms were associated with reduced speech output (estimate, 2.46 [95% CI, 1.05-3.88]) and flatter acoustic profiles (estimate, 0.29 [95% CI, 0.10-0.48]), whereas positive symptoms were marked by longer utterances (estimate, 1.18 [95% CI, 0.08-2.29]) and altered discourse organization (estimate, −0.25 [95% CI, −0.48 to −0.02]).
Conclusions and Relevance
This cohort study of individuals with schizophrenia found that psychotic symptoms left specific, interpretable signatures in naturalistic speech that could be quantified, tracked longitudinally, and replicated across cohorts. Speech-based modeling achieved clinically meaningful symptom-level estimates, providing a solid basis for scalable, low-burden tools for real-time monitoring in psychosis.