DOI: 10.12688/openreseurope.24356.1 ISSN: 2732-5121

A lightweight computational method for monitoring response convergence in LLM-based synthetic populations

Rodrigo Alvarez, Mina Soroberto, Fernando De la Prieta, Mario Cartón
Background
LLM-based synthetic populations are increasingly used to simulate open-ended survey, interview, user-research, and group-style responses. These workflows can support early exploratory research, but they also create a methodological risk: agents configured as different participants may nevertheless converge towards similar wording, examples, or answer templates. Without an explicit monitoring step, apparent agreement can be difficult to distinguish from model-level homogenization.

Methods
We present a lightweight computational method for monitoring lexical response convergence in generated agent populations. The method uses a platform-independent response-record schema, deterministic normalization, pairwise lexical overlap, vocabulary-diversity diagnostics, data-quality counters, threshold-based inspection flags, and reproducible JSON reporting. The reference implementation is the Python package synthetic-response-metrics version 0.2.5, which supports CSV, JSON, JSONL, API, and command-line execution.

Worked examples and initial validation
We provide initial validation using controlled synthetic examples created for this article. The validation includes a data-quality example, low-, moderate-, and high-convergence scenarios, a population-size sensitivity analysis, and a high-convergence diagnostic pair. These examples show expected metric behaviour, including increasing overlap under controlled convergence, changing pair counts as population size increases, and traceability of alerts through top similar pairs. In addition, an applied example generated from a synthetic-population platform uses 10 synthetic agents and three open-ended questions to check operational portability of the same input schema.

Conclusions
The method provides a reproducible lexical baseline for observing whether generated agents are producing overly similar answers. It does not replace semantic evaluation, human baselines, or domain validation, and the default threshold is provisional. However, it offers a practical monitoring layer that can be embedded in synthetic-population studies before more expensive or domain-specific validation is available.
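
The pipeline named in the Methods (normalization, pairwise lexical overlap, a vocabulary-diversity diagnostic, a data-quality counter, threshold flags, and JSON reporting) can be illustrated with a minimal sketch. This is not the synthetic-response-metrics API: the function names, the record fields `agent_id`, `question_id`, and `response`, the use of Jaccard similarity as the overlap measure, the type-token ratio as the diversity diagnostic, and the threshold of 0.8 are all assumptions made for illustration.

```python
import json
import string
from itertools import combinations

# Hypothetical threshold; the article describes its own default only as provisional.
FLAG_THRESHOLD = 0.8

_PUNCT_TABLE = str.maketrans("", "", string.punctuation)


def normalize(text: str) -> list[str]:
    """Deterministic normalization: lowercase, drop ASCII punctuation, split on whitespace."""
    return text.lower().translate(_PUNCT_TABLE).split()


def jaccard(a: set[str], b: set[str]) -> float:
    """Jaccard overlap of two token sets (assumed here as the lexical overlap measure)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def convergence_report(records, threshold=FLAG_THRESHOLD):
    """Build a per-question convergence summary from response records."""
    by_question = {}
    empty = 0  # data-quality counter: responses with no tokens after normalization
    for rec in records:
        tokens = normalize(rec.get("response") or "")
        if not tokens:
            empty += 1
            continue
        by_question.setdefault(rec["question_id"], []).append((rec["agent_id"], tokens))

    questions = {}
    for qid, answers in sorted(by_question.items()):
        # Score every unordered pair of agents that answered this question.
        scored = [
            (jaccard(set(a), set(b)), [a_id, b_id])
            for (a_id, a), (b_id, b) in combinations(answers, 2)
        ]
        scored.sort(key=lambda item: item[0], reverse=True)
        all_tokens = [tok for _, toks in answers for tok in toks]
        mean = sum(score for score, _ in scored) / len(scored) if scored else None
        questions[qid] = {
            "n_responses": len(answers),
            "n_pairs": len(scored),
            "mean_overlap": None if mean is None else round(mean, 3),
            "type_token_ratio": round(len(set(all_tokens)) / len(all_tokens), 3),
            "flagged_pairs": [
                {"agents": agents, "overlap": round(score, 3)}
                for score, agents in scored
                if score >= threshold
            ],
        }
    return {"empty_responses": empty, "threshold": threshold, "questions": questions}


# Invented example records, not data from the article.
records = [
    {"agent_id": "a1", "question_id": "q1", "response": "I value price and battery life."},
    {"agent_id": "a2", "question_id": "q1", "response": "I value price and battery life!"},
    {"agent_id": "a3", "question_id": "q1", "response": "Mostly the camera quality matters to me."},
    {"agent_id": "a4", "question_id": "q1", "response": "   "},
]
report = convergence_report(records)
print(json.dumps(report, indent=2, sort_keys=True))
```

In this sketch the number of scored pairs per question is n(n-1)/2 for n non-empty responses, so a population of 10 agents yields 45 pairs per question. Sorting pairs by overlap keeps each flag traceable to the specific agents that triggered it.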
