SOCbot: Using Large Language Models to Dynamically Measure and Classify Occupations in Surveys
Patrick Sturgis, Thomas S. Robinson, Laura Fung, Caroline Roberts

We present results from a new approach to measuring respondents' occupations in surveys using Large Language Models (LLMs). In this approach, which we call SOCbot, an LLM integrated into the questionnaire scripting software codes the job title to the occupational classification in real time during the interview. Where the job title does not contain enough information to be coded with confidence, the LLM probes for further relevant details, such as job tasks, industry, and qualifications. SOCbot can also be used offline on response data that have already been collected. Our results demonstrate that the approach attains rates of coder reliability comparable to those of trained human coders, with consistent performance across four major commercial and open-weight model families. SOCbot can also be deployed using publicly available open-weight models with only a small but measurable accuracy penalty, making it usable even under stringent data-protection constraints. We also demonstrate that the approach is feasible in large-scale survey operations and has significant potential to reduce respondent burden, lower costs, and yield more timely and accurate data.
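
The code-or-probe logic described above can be illustrated with a minimal sketch. This is not the SOCbot implementation: the function names, the confidence threshold of 0.8, the probe wording, and the placeholder codes are all assumptions made for illustration, and the LLM call is replaced by a toy rule-based stand-in so that the example runs without any model or network access.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple

# Hypothetical interfaces: in a real deployment `classify` would wrap an LLM
# call and `ask` would display a follow-up question in the questionnaire.
Classifier = Callable[[str], Tuple[str, float]]  # context -> (code, confidence)
Asker = Callable[[str], str]                     # question -> respondent answer

# Illustrative probes covering the kinds of detail named in the abstract
# (job tasks, industry, qualifications); the wording is an assumption.
PROBES = [
    "What are your main tasks in this job?",
    "What industry do you work in?",
    "What qualifications does the job require?",
]


@dataclass
class CodingResult:
    code: str
    confidence: float
    probes_asked: List[str] = field(default_factory=list)


def code_occupation(job_title: str, classify: Classifier, ask: Asker,
                    threshold: float = 0.8) -> CodingResult:
    """Code a job title, probing only while confidence is below threshold."""
    context = job_title
    code, confidence = classify(context)
    asked: List[str] = []
    for question in PROBES:
        if confidence >= threshold:
            break  # enough information: stop probing to limit respondent burden
        answer = ask(question)
        asked.append(question)
        # Append the new detail and try to code again with the richer context.
        context = f"{context}\n{question} {answer}"
        code, confidence = classify(context)
    return CodingResult(code, confidence, asked)


def toy_classify(context: str) -> Tuple[str, float]:
    """Toy stand-in for the LLM; the codes are placeholders, not real SOC codes."""
    text = context.lower()
    if "primary school teacher" in text:
        return ("PLACEHOLDER-TEACHER", 0.95)
    if "manager" in text and "retail" in text:
        return ("PLACEHOLDER-RETAIL-MANAGER", 0.9)
    return ("PLACEHOLDER-UNKNOWN", 0.3)


def toy_ask(question: str) -> str:
    """Toy respondent with canned answers to the first two probes."""
    answers = {
        PROBES[0]: "I run the shop floor and the staff rota",
        PROBES[1]: "Retail",
    }
    return answers.get(question, "")


if __name__ == "__main__":
    # A specific title is coded immediately, with no probes.
    print(code_occupation("Primary school teacher", toy_classify, toy_ask))
    # A vague title triggers probes until the toy classifier is confident.
    print(code_occupation("Manager", toy_classify, toy_ask))
```

In this sketch a specific title such as "Primary school teacher" is coded without any follow-up, whereas the vague title "Manager" triggers two probes before the toy classifier reaches the threshold. The same function could be applied offline by supplying an `ask` callable that always returns an empty string, so that only the recorded job title is used.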