Disparities in AI‐Based Prior Authorization for Head and Neck Reconstruction: A Large Language Model Analysis

doi:10.1002/wjo2.70130

ISSN: 2095-8811

Shannon S. Wu, Mugil V. Shanmugam, Yu‐Jin Lee, Noel F. Ayoub

ABSTRACT

Introduction

Large language models (LLMs) are being rapidly integrated into healthcare, particularly to streamline time‐ and labor‐intensive administrative processes. However, the potential for artificial intelligence (AI) systems to demonstrate bias when employed for insurance authorization remains poorly understood. As insurers increasingly adopt AI to make coverage decisions, this study examined bias in LLM‐driven prior authorization in otolaryngology, using oral cavity squamous cell carcinoma (SCC) as a case study.

Methods

Using OpenAI's generative transformer GPT‐4o, this study assessed LLM bias in simulated insurance coverage decisions for head and neck cancer reconstruction. A standardized clinical scenario was constructed involving patients with T2N2 oral cavity SCC, all requiring surgical resection and reconstruction. Across 19,900 simulations, the LLM was prompted to choose between a radial forearm free flap (RFFF) and a split‐thickness skin graft (STSG) for reconstruction. Patient profiles were systematically varied by age, sex, race/ethnicity, zip code‐based income level, socioeconomic status (SES), and substance use history.
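The systematic variation described above can be sketched as holding one clinical scenario fixed and filling in demographic fields from every combination of attribute levels. This is a minimal illustration under stated assumptions, not the authors' protocol: the attribute levels, the prompt wording, and the names `ATTRIBUTES`, `BASE_SCENARIO`, `build_profiles`, and `build_prompts` are all hypothetical, and the number of profiles this sketch produces does not correspond to the study's 19,900 simulations.

```python
from itertools import product

# Hypothetical attribute levels for illustration only; the abstract does not
# report the levels or wording actually used in the study.
ATTRIBUTES = {
    "age": ["35-year-old", "75-year-old"],
    "sex": ["male", "female"],
    "race_ethnicity": ["Asian", "Black", "Hispanic", "white"],
    "ses": ["high socioeconomic status", "low socioeconomic status"],
    "zip_income": ["a high-income zip code", "a low-income zip code"],
    "substance_use": ["no substance use history", "a history of substance use"],
}

# Fixed clinical scenario held constant across all profiles (hypothetical wording).
BASE_SCENARIO = (
    "A {age} {race_ethnicity} {sex} patient of {ses} from {zip_income}, with "
    "{substance_use}, has T2N2 oral cavity squamous cell carcinoma requiring "
    "surgical resection and reconstruction. As an insurance reviewer, approve "
    "exactly one option: radial forearm free flap (RFFF) or split-thickness "
    "skin graft (STSG)."
)


def build_profiles(attributes):
    """Yield one dict per combination of attribute levels (full factorial)."""
    keys = list(attributes)
    for levels in product(*(attributes[k] for k in keys)):
        yield dict(zip(keys, levels))


def build_prompts(attributes, template=BASE_SCENARIO):
    """Fill the fixed scenario template with each demographic profile."""
    return [template.format(**profile) for profile in build_profiles(attributes)]


if __name__ == "__main__":
    prompts = build_prompts(ATTRIBUTES)
    print(len(prompts))  # 2 * 2 * 4 * 2 * 2 * 2 = 128 profiles
    print(prompts[0])
```

Because only the demographic fields change while the clinical content stays identical, any systematic difference in the model's RFFF versus STSG choice across profiles can be attributed to the varied attributes rather than to the clinical scenario.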

Results

The LLM output showed significant disparities in approval decisions. RFFF was more frequently approved for younger, Asian, or white patients from high‐income zip codes or high SES backgrounds ( p < 0.0001). Older, Black, and Hispanic patients, and those from lower‐income areas or with substance use histories, were less likely to receive RFFF authorization ( p < 0.0001). On sensitivity analysis, inclusion of tumor‐specific information markedly skewed recommendations towards RFFF across sociodemographic backgrounds.
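The abstract does not state which statistical test produced these p‐values. One common choice for comparing approval proportions between two patient groups is Pearson's chi‐square test of independence on a 2×2 table (approved vs. not approved, group A vs. group B). The sketch below assumes that test, without continuity correction; the function name `chi_square_2x2` is hypothetical, and the counts in the usage note are illustrative only, not study data.

```python
import math


def chi_square_2x2(approved_a, total_a, approved_b, total_b):
    """Pearson chi-square test (no continuity correction) on a 2x2 table of
    approved vs. not approved in two groups. Returns (statistic, p_value)."""
    observed = [
        [approved_a, total_a - approved_a],
        [approved_b, total_b - approved_b],
    ]
    row_totals = [sum(row) for row in observed]
    col_totals = [observed[0][j] + observed[1][j] for j in range(2)]
    grand = sum(row_totals)
    # Expected counts are undefined if any margin is empty.
    if grand == 0 or 0 in row_totals or 0 in col_totals:
        raise ValueError("every row and column of the table needs a nonzero total")

    stat = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / grand
            stat += (observed[i][j] - expected) ** 2 / expected

    # With 1 degree of freedom, P(chi2 > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value
```

For example, with hypothetical counts of 60 of 100 RFFF approvals in one group and 40 of 100 in another, `chi_square_2x2(60, 100, 40, 100)` gives a statistic of 8.0 and a p‐value of about 0.0047; identical approval rates give a statistic of 0 and a p‐value of 1.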

Conclusion

In this experimental study, the LLM's outputs exhibited significant disparities for oral cavity cancer reconstruction based on patient demographic variables in the setting of limited clinical information. Inputs to LLMs for clinical decision‐making should include pertinent and detailed information to reduce the risk of bias. As insurers increasingly integrate AI for prior authorization, recognition of its biases, rigorous safeguards, and increased regulatory governance are needed to promote equitable healthcare.
