DOI: 10.1177/10731911261455483 ISSN: 1073-1911

Modeling Individual Language Patterns and Psychological Constructs to Generate AI-Augmented Data for Scalable Psychological Assessment

Pengda Wang, Hanjie Chen, Frederick L. Oswald, Tianjun Sun

Scientific and systematic data collection and analysis have long been a crucial foundation of psychological assessment systems. It is only through this process that psychology professionals can effectively measure and interpret individuals’ mental states, behavioral patterns, and standing on underlying latent constructs. However, obtaining high-quality task-specific data remains challenging due to issues of cost, time, and scalability, all further complicated by ethical and privacy concerns associated with sensitive psychological information. To address these challenges, we apply alignment training of large language models (LLMs) to generate artificial intelligence (AI)-augmented data. This method uses existing participant responses to create personalized, plausible answers to new or unanswered questions. The augmented data are intended to match each individual’s linguistic style and psychological characteristics. We evaluated this method using an archival dataset of life-narrative interviews originally collected for personality trait prediction. We compared the augmented data with the real data at both the linguistic level (i.e., via the perplexity metric and the multidimensional tagger) and the utility level (i.e., performance on similar functions, such as personality trait prediction). We found that AI-generated data closely resemble human data and can therefore support pilot testing or modeling missing responses. Overall, the augmented data approach offers a scalable, effective solution for enriching datasets in AI-based psychological assessments.
