DOI: 10.1002/alz.075390 ISSN: 1552-5260

Building a neuropsychiatric testing database for veterans with mild cognitive impairment and Alzheimer’s disease with unstructured electronic health records

Xuyang Li, Jinying Chen, Byron J. Aguilar, Ekaterina Shishova, Peter J Morin, Dan Berlowitz, Donald R Miller, Maureen K. O'Connor, Andrew Huy Nguyen, Raymond Zhang, Amir Abbas Tahami Monfared, Quanwu Zhang, Weiming Xia
  • Psychiatry and Mental health
  • Cellular and Molecular Neuroscience
  • Geriatrics and Gerontology
  • Neurology (clinical)
  • Developmental Neuroscience
  • Health Policy
  • Epidemiology



Information on clinical decision making and disease severity assessments for mild cognitive impairment (MCI) or Alzheimer’s dementia (AD) typically exists only in medical notes. In this study, we present a methodology for constructing a database from unstructured electronic health records (EHR), which enables enriched population studies when combined with structured administrative databases.


A multidisciplinary team with expertise in neurology, epidemiology, biology, and health informatics designed and created the relational database. The database comprises three tables: patient demographics, neurocognitive and neuropsychiatric test results, and clinical judgment of AD severity. Patient cohorts were identified by searching electronic clinical notes for keywords (“Alz*” or “Mild Cognitive Impairment”) and excluding false positives using expert‐designed rules. The six most frequently used neuropsychiatric tests (MMSE, MoCA, SLUMS, Mini‐cog, BNT, BVRT) and their corresponding results were extracted, along with clinicians’ judgment of disease severity. The test scores were extracted using a rule‐based natural language processing (NLP) system.
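The keyword search and rule‐based score extraction described above can be illustrated with a minimal sketch. The abstract does not publish its rules, so every pattern and name below (COHORT_RE, SCORE_RE, extract_scores) is a hypothetical stand-in rather than the authors' system. The sketch covers only the three tests scored out of 30 (MMSE, MoCA, SLUMS) and does not model the expert‐designed false‐positive exclusions.

```python
import re
from typing import List, Tuple

# Hypothetical cohort keywords mirroring "Alz*" and "Mild Cognitive Impairment".
COHORT_RE = re.compile(r"\balz\w*|\bmild cognitive impairment\b", re.IGNORECASE)

# Canonical test names and their maximum scores (all three are scored out of 30).
CANONICAL = {"mmse": "MMSE", "moca": "MoCA", "slums": "SLUMS"}
MAX_SCORE = {"MMSE": 30, "MoCA": 30, "SLUMS": 30}

# Assumed note phrasing: test name, a short non-numeric gap, then "score/max"
# or "score out of max". Real clinical notes vary far more than this.
SCORE_RE = re.compile(
    r"\b(MMSE|MoCA|SLUMS)\b[^0-9\n]{0,20}?(\d{1,2})\s*(?:/|out of)\s*(\d{1,2})",
    re.IGNORECASE,
)


def mentions_cohort_keyword(note: str) -> bool:
    """Return True if the note contains a cohort keyword."""
    return COHORT_RE.search(note) is not None


def extract_scores(note: str) -> List[Tuple[str, int]]:
    """Return (test name, score) pairs that pass a simple plausibility rule."""
    results = []
    for match in SCORE_RE.finditer(note):
        name = CANONICAL[match.group(1).lower()]
        score, denominator = int(match.group(2)), int(match.group(3))
        # Keep the match only if the denominator equals the test's maximum
        # and the score does not exceed it.
        if denominator == MAX_SCORE[name] and score <= denominator:
            results.append((name, score))
    return results


note = "Pt with hx of Alzheimer's disease. MoCA score: 22/30. MMSE was 27 out of 30."
if mentions_cohort_keyword(note):
    print(extract_scores(note))
```

The plausibility check is one simple way a rule can reject spurious matches such as "MMSE 35/30"; a production system would need many more rules for dates, ranges, and negated mentions.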


A patient cohort with MCI (N = 74,444) or probable AD (N = 141,816) was identified in the VA database for fiscal year 2019 (Table 1); 18,173 patients had both MCI and AD diagnoses in their clinical notes that year. A total of 1,529,897 neuropsychiatric testing scores were extracted from the patients’ clinical notes across all years: 388,253 MMSE scores, 614,400 MoCA scores, 462,646 SLUMS scores, 18,868 Mini‐cog scores, 45,259 BNT scores, and 471 BVRT scores. A total of 57,879 (77.75%) MCI patients and 82,656 (58.28%) AD patients had at least one test score extracted. A total of 18,939 (13.35%) AD patients had at least one documented severity categorization made by a clinician.
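The reported totals and percentages can be checked with a few lines of arithmetic. The sketch below uses only the counts stated above; the variable and function names are illustrative.

```python
# Per-test score counts as reported in the abstract.
score_counts = {
    "MMSE": 388_253,
    "MoCA": 614_400,
    "SLUMS": 462_646,
    "Mini-cog": 18_868,
    "BNT": 45_259,
    "BVRT": 471,
}
total_scores = sum(score_counts.values())

# Cohort sizes as reported in the abstract.
mci_total, ad_total = 74_444, 141_816


def pct(part: int, whole: int) -> float:
    """Percentage of `whole` represented by `part`, rounded to two decimals."""
    return round(100 * part / whole, 2)


print(total_scores)             # 1529897
print(pct(57_879, mci_total))   # 77.75  (MCI patients with at least one score)
print(pct(82_656, ad_total))    # 58.28  (AD patients with at least one score)
print(pct(18_939, ad_total))    # 13.35  (AD patients with a severity rating)
```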


We established a database of neuropsychiatric testing scores for patients with MCI or AD based on electronic medical notes from the VA healthcare system, aided by NLP tools. Our approach demonstrates a scalable pipeline that can be integrated with structured EHR data to support enriched population‐level analyses in AD.
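As a rough illustration of how a three‐table design of this kind could be queried, the sketch below builds a toy in-memory SQLite database. The table names, column names, and rows are all invented for illustration; the abstract does not describe the actual schema or database engine.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical schema: one table each for demographics, test scores, and severity.
conn.executescript("""
CREATE TABLE demographics (
    patient_id INTEGER PRIMARY KEY,
    birth_year INTEGER
);
CREATE TABLE test_scores (
    patient_id INTEGER REFERENCES demographics(patient_id),
    test_name  TEXT,
    score      INTEGER,
    note_date  TEXT
);
CREATE TABLE severity (
    patient_id INTEGER REFERENCES demographics(patient_id),
    severity   TEXT,
    note_date  TEXT
);
""")

# Toy rows, not real patient data.
conn.executemany("INSERT INTO demographics VALUES (?, ?)", [(1, 1940), (2, 1948)])
conn.executemany(
    "INSERT INTO test_scores VALUES (?, ?, ?, ?)",
    [
        (1, "MoCA", 22, "2019-01-10"),
        (1, "MoCA", 19, "2019-08-02"),
        (2, "MMSE", 27, "2019-03-15"),
    ],
)
conn.executemany("INSERT INTO severity VALUES (?, ?, ?)", [(1, "moderate", "2019-08-02")])

# Lowest score per patient and test, with any clinician severity rating attached.
rows = conn.execute("""
SELECT d.patient_id, t.test_name, MIN(t.score) AS lowest_score, s.severity
FROM demographics d
JOIN test_scores t ON t.patient_id = d.patient_id
LEFT JOIN severity s ON s.patient_id = d.patient_id
GROUP BY d.patient_id, t.test_name, s.severity
ORDER BY d.patient_id
""").fetchall()

for row in rows:
    print(row)
```

Because each table is keyed on a patient identifier, the same join pattern would extend to structured administrative tables that share that key. The LEFT JOIN keeps patients who have test scores but no severity rating, which matches the reported pattern that only a minority of AD patients had a documented severity categorization.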
