DOI: 10.1002/alz.079800 ISSN: 1552-5260

Automated lexical analysis of story recall in healthy controls

Martin Ho Kwan Ip, Katheryn A Q Cousins, Naomi Nevler, Mark Y Liberman, Murray Grossman, Sunghye Cho
  • Psychiatry and Mental health
  • Cellular and Molecular Neuroscience
  • Geriatrics and Gerontology
  • Neurology (clinical)
  • Developmental Neuroscience
  • Health Policy
  • Epidemiology



Impaired episodic memory is one of the earliest and most prominent symptoms of Alzheimer’s disease (AD). Examining how patients produce words and phrases in story recall tasks is useful for tracking and understanding the progression of their disease. Traditional approaches rely on manual assessment of recalled stories, which can be limited in scope, time‐consuming, and impractical for large‐scale studies. Here, we use natural language processing models to implement an automated, computational approach to measuring verbal episodic memory in a digitally recorded story recall task performed by young healthy speakers.


We analyzed digitized speech samples of pre‐recorded immediate and delayed Craft Story recall tasks performed by healthy speakers (n = 67, mean age = 20.31 years, SD = 2.17, 45 (67%) female). All recordings were transcribed by trained annotators, and the transcripts were automatically scored for the number of verbatim and paraphrase recalls, total recall score (verbatim + paraphrase), and semantic distance from the original story (the degree of semantic similarity between the original Craft Story and the recalled story, based on the Word2vec module in Python). We also calculated total word count and number of unique words, and rated all recalled words for word familiarity, concreteness, and semantic ambiguity based on published norms. The number of pauses and the total durations of speech and silent pauses were also measured.
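As a rough illustration of two of the automated measures described above (word counts and semantic distance), the following minimal sketch computes total and unique word counts and a cosine distance between averaged word vectors. It is not the authors' pipeline: the tiny `TOY_VECTORS` table, the example sentences, and all function names are hypothetical stand-ins, and averaging word vectors before taking cosine distance is one common convention assumed here, not a detail given in the abstract. In practice the vectors would come from a trained Word2vec model.

```python
import math
import re

# Hypothetical 3-dimensional toy embeddings, invented for this sketch.
# Real Word2vec vectors typically have hundreds of dimensions.
TOY_VECTORS = {
    "dog": (0.9, 0.1, 0.0),
    "puppy": (0.8, 0.2, 0.1),
    "ran": (0.1, 0.9, 0.2),
    "sprinted": (0.2, 0.8, 0.3),
    "park": (0.1, 0.3, 0.9),
    "garden": (0.2, 0.2, 0.8),
}


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def lexical_counts(text):
    """Return total word count and number of unique words."""
    tokens = tokenize(text)
    return {"total_words": len(tokens), "unique_words": len(set(tokens))}


def mean_vector(tokens, vectors):
    """Average the vectors of all tokens that have an embedding."""
    known = [vectors[t] for t in tokens if t in vectors]
    if not known:
        raise ValueError("no token has a known vector")
    dim = len(known[0])
    return [sum(v[i] for v in known) / len(known) for i in range(dim)]


def cosine_distance(a, b):
    """Return 1 minus the cosine similarity of two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def semantic_distance(original, recall, vectors=TOY_VECTORS):
    """Distance between the mean vectors of the story and the recall."""
    return cosine_distance(
        mean_vector(tokenize(original), vectors),
        mean_vector(tokenize(recall), vectors),
    )


if __name__ == "__main__":
    story = "The dog ran in the park"
    recall = "The puppy sprinted in the garden"
    print(lexical_counts(recall))
    print(semantic_distance(story, recall))
```

Under these assumptions, a recall that paraphrases the whole story stays close to it in vector space even when no words match, whereas a recall that drops content drifts further away, which is the property that makes semantic distance complementary to verbatim scoring.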


Within speakers, semantic distance between successive words was larger in the delayed recall task than in immediate recall (p < .001). Delayed recall also elicited lower total recall scores than immediate recall (p = .03). In a model comparison of the two measures, semantic distance showed a smaller prediction error (AIC = 95.45) than total recall score (AIC = 736.18). In addition, in the delayed recall task speakers produced more paraphrase recalls, higher total word counts, more unique words, more ambiguous words, and more speech (all p‐values < .001).
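For readers unfamiliar with the Akaike information criterion used in this comparison, the sketch below shows the standard definition, AIC = 2k - 2 ln(L), where k is the number of estimated parameters and L is the maximized likelihood; lower values indicate a better trade-off between fit and complexity. The abstract does not state which models or likelihoods were fitted, so the Gaussian residual likelihood and the function names here are assumptions made purely for illustration.

```python
import math


def gaussian_log_likelihood(residuals):
    """Maximized log-likelihood of residuals under a Gaussian error model.

    Assumption for this sketch: errors are independent and normally
    distributed, with variance set to its maximum-likelihood estimate RSS / n.
    """
    n = len(residuals)
    rss = sum(r * r for r in residuals)
    if n == 0 or rss == 0:
        raise ValueError("need at least one non-zero residual")
    sigma2 = rss / n
    return -0.5 * n * (math.log(2 * math.pi * sigma2) + 1)


def aic(log_likelihood, n_params):
    """Akaike information criterion: 2k - 2 ln(L)."""
    return 2 * n_params - 2 * log_likelihood


if __name__ == "__main__":
    # Hypothetical residuals from two models with the same parameter count.
    tight_fit = [0.1, -0.1, 0.1, -0.1]
    loose_fit = [1.0, -1.0, 1.0, -1.0]
    print(aic(gaussian_log_likelihood(tight_fit), n_params=3))
    print(aic(gaussian_log_likelihood(loose_fit), n_params=3))
```

With the parameter count held fixed, the model with smaller residuals has the higher likelihood and therefore the lower AIC.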


Our findings suggest that automated speech analysis can provide informative and cost‐effective measures from a recorded story recall task. By including semantic distance, the study offers an automated, quantitative way to measure memory abilities. Future work will test these procedures in patients with AD or other neurodegenerative diseases.
