DOI: 10.1093/pnasnexus/pgaf089 ISSN: 2752-6542

Measuring gender and racial biases in large language models: Intersectional evidence from automated resume evaluation

Jiafu An, Difang Huang, Chen Lin, Mingzhu Tai

Abstract

In traditional decision-making processes, social biases of human decision makers can lead to unequal economic outcomes for underrepresented social groups, such as women and racial/ethnic minorities.1-4 Recently, the growing popularity of Large Language Model (LLM)-based artificial intelligence (AI) signals a potential shift from human to AI-based decision making. How would this transition affect the distributional outcomes across social groups? Here we investigate the gender and racial biases of a number of commonly used LLMs, including OpenAI’s GPT-3.5 Turbo and GPT-4o, Google’s Gemini 1.5 Flash, Anthropic AI’s Claude 3.5 Sonnet, and Meta’s Llama 3-70b, in a high-stakes decision-making setting of assessing entry-level job candidates from diverse social groups. Instructing the models to score approximately 361,000 resumes with randomized social identities, we find that the LLMs award higher assessment scores for female candidates with similar work experience, education, and skills, while lower scores for black male candidates with comparable qualifications. These biases may result in approximately 1-3 percentage-point differences in hiring probabilities for otherwise similar candidates at a certain threshold and are consistent across various job positions and subsamples. Meanwhile, many models are biased against black male candidates. Our results indicate that LLM-based AI systems demonstrate significant biases, varying in terms of the directions and magnitudes across different social groups. Further research is needed to comprehend the root causes of these outcomes and develop strategies to minimize the remaining biases in AI systems. As AI-based decision-making tools are increasingly employed across diverse domains, our findings underscore the necessity of understanding and addressing the potential unequal outcomes to ensure equitable outcomes across social groups.

More from our Archive