Potential and biases of large language model simulation for public surveys on Alzheimer's disease therapies

DOI: 10.1177/13872877261464182 ISSN: 1387-2877

Kenichiro Sato, Yoshiki Niimi, Ryoko Ihara, Kazushi Suzuki, Atsushi Iwata, Takeshi Iwatsubo

Background

While large language models (LLMs) have the potential to simulate public opinion, their reliability for sensitive medical topics such as novel Alzheimer's disease (AD) treatments remains unclear.

Objective

This study compared LLM-generated and human answers on AD-therapy dilemmas, assessed the influence of model choice and prompting parameters, and identified demographic biases.

Methods

Persona profiles derived from survey data collected in late 2023 from 1671 participants of the Japanese Trial Ready Cohort Webstudy, who are presumed to be cognitively unimpaired, guided four LLMs (Gemini-1.5-flash, Gemini-2.0-flash, GPT-4.1-mini, GPT-4o-mini). The models answered either a binary question about acceptance of patient prioritization or a 5-point Likert question on concern about amyloid-related imaging abnormalities (ARIA) under varied prompt settings. Aggregate similarity was measured with Jensen-Shannon divergence (JSD) for the binary question and Earth Mover's Distance (EMD) for the Likert question, while individual-level agreement was measured with Cohen's κ and Spearman's ρ.
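The aggregate and individual-level metrics named above can be sketched in a few lines of plain Python. This is only an illustrative sketch, not the authors' pipeline: the function names, the toy response vectors, the base-2 logarithm for JSD, and the unit spacing between Likert categories for EMD are all assumptions made here, since the abstract does not specify them.

```python
import math
from collections import Counter


def jensen_shannon_divergence(p, q):
    """JSD between two discrete distributions (base-2 logs assumed, range 0 to 1)."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]

    def kl(a, b):
        # Kullback-Leibler divergence; terms with a_i == 0 contribute nothing.
        return sum(ai * math.log2(ai / bi) for ai, bi in zip(a, b) if ai > 0)

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


def earth_movers_distance(p, q):
    """EMD between two distributions over ordered categories with unit spacing (assumed)."""
    total, cumulative = 0.0, 0.0
    for pi, qi in zip(p, q):
        cumulative += pi - qi      # running difference of the two CDFs
        total += abs(cumulative)   # mass that must move past this category
    return total


def cohens_kappa(a, b):
    """Cohen's kappa for two equal-length lists of paired categorical answers."""
    n = len(a)
    observed = sum(x == y for x, y in zip(a, b)) / n
    count_a, count_b = Counter(a), Counter(b)
    expected = sum(count_a[k] * count_b[k] for k in count_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters gave one identical constant answer
    return (observed - expected) / (1 - expected)


# Hypothetical toy data: paired binary answers from humans and a simulated persona.
human_binary = [1, 1, 0, 0]
llm_binary = [1, 0, 1, 0]
human_share = [human_binary.count(0) / 4, human_binary.count(1) / 4]
llm_share = [llm_binary.count(0) / 4, llm_binary.count(1) / 4]

# Hypothetical 5-point Likert distributions (proportions per category).
human_likert = [0.1, 0.2, 0.4, 0.2, 0.1]
llm_likert = [0.0, 0.1, 0.5, 0.3, 0.1]

print("JSD:", jensen_shannon_divergence(human_share, llm_share))
print("EMD:", earth_movers_distance(human_likert, llm_likert))
print("kappa:", cohens_kappa(human_binary, llm_binary))
```

In the toy binary data both groups answer "yes" half the time, so the aggregate distributions are identical, yet the paired answers agree no more often than chance. This is why group-level and individual-level agreement have to be measured separately. Note that some libraries report the Jensen-Shannon distance (the square root of the divergence), so values are not always directly comparable.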

Results

While some LLMs achieved fair group-level agreement in both tasks (JSD ≤ 0.05, EMD < 1.0), individual-level agreement was negligible across all LLM settings (κ, ρ ≈ 0). Adding detailed attributes such as living conditions, clinical status, or related personal opinions offered limited improvement. Performance was largely stable across most demographic levels but deteriorated for minority subgroups, such as those with low education or requiring long-term care.

Conclusions

Our study demonstrates that current LLMs can approximate aggregate attitudes toward novel AD therapies but cannot predict individual opinions, and that they can amplify biases in some small subgroups. LLMs may be useful for pre-testing public surveys in the field of AD/dementia treatment but should not replace authentic human data.
