An integrative NLP framework identifies multilevel linguistic phenotypes of schizophrenia across tasks
Hironobu Nakamura, Yoshinobu Kano, Genichi Sugihara, Ryo Takemura, Yusei Yamaguchi, Masaaki Shimizu, Shunsuke Takagi, Mari Iizuka, Saaya Tashiro, Momoko Kitazawa, Ayako Sento, Hidehiko Takahashi, Kishimoto Taishiro
Abstract
Background
Linguistic abnormalities in schizophrenia (SCZ) span morphological, syntactic, semantic, and discourse levels. Converging cross-linguistic evidence suggests that SCZ may involve semantic narrowing alongside reduced syntactic differentiation, yet how these changes co-occur across linguistic domains and whether they represent core, task-general disturbances remain unclear. We applied a multilevel NLP framework to a large Japanese dataset to identify structurally related linguistic markers of SCZ across elicitation contexts.
Methods
Speech from 104 patients with SCZ and 101 healthy controls was collected through semi-structured interviews. Transcripts from free conversation, storytelling, and picture description were analyzed using GiNZA, Word2Vec, TF-IDF, and SentenceBERT to extract 76 morphosyntactic, semantic, and discourse features. Factor analysis identified representative features independent of diagnosis, which were tested using generalized estimating equations and validated with bootstrap and permutation procedures. Cross-task stability was examined to determine core linguistic markers.
Results
In free conversation, reduced Case-particle (Kakujoshi) and Adverb use and increased Mean Pairwise Word Similarity were strongly associated with SCZ (AUC = 0.87, 95% CI: 0.74–0.97). Adverbial, case-particle, and semantic-network measures functioned as cross-task markers.
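As an illustration of the semantic-network measure named above, the following minimal sketch computes a mean pairwise similarity over word vectors. The abstract does not define the metric, so the sketch assumes it is the average cosine similarity over all unordered pairs of word embeddings (for example, Word2Vec vectors of the words in a transcript). The function names and the toy two-dimensional vectors are hypothetical and are not taken from the study.

```python
import math
from itertools import combinations


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def mean_pairwise_similarity(vectors):
    """Average cosine similarity over all unordered pairs of vectors.

    Assumed definition for illustration only; returns NaN when fewer
    than two vectors are supplied, since no pair exists.
    """
    pairs = list(combinations(vectors, 2))
    if not pairs:
        return float("nan")
    return sum(cosine(u, v) for u, v in pairs) / len(pairs)


# Toy embeddings (made up): two identical vectors and one orthogonal vector.
toy_embeddings = {
    "dog": [1.0, 0.0],
    "cat": [1.0, 0.0],
    "car": [0.0, 1.0],
}
score = mean_pairwise_similarity(list(toy_embeddings.values()))
print(score)
```

Under this assumed definition, a higher score means the words used lie closer together in embedding space, which is one way to operationalize the semantic narrowing described in the abstract.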
Conclusions
SCZ involves multidimensional language disturbances characterized by a tripartite linguistic phenotype of diminished morphosyntactic explicitness, semantic narrowing, and reduced modification-based contextual modulation in spontaneous discourse. Extending cross-linguistic evidence, our results indicate that lexical-semantic contraction co-occurs with reduced overt marking of argument relations in Japanese, alongside weakened adverbial elaboration and framing – suggesting convergent, largely task-general dimensions of SCZ language pathology, most evident in free conversation.