A Comparative Study of Artificial Intelligence and Faculty Rubric‐Based Grading of Pharmacy Student Writing Assignments in an Evidence‐Based Medicine Course
Jennifer Phillips, Katherine Sarna, Faria Munir, Heather Ipema
ABSTRACT
Background
Generative artificial intelligence (AI) tools are increasingly used in health professional education, including for automated writing evaluation (AWE). While AI‐based grading may reduce faculty workload and variability, its alignment with human grading remains unclear. This study compared rubric‐based grading of pharmacy student drug information papers by faculty and ChatGPT (OpenAI, San Francisco, CA).
Methods
We conducted a retrospective comparative analysis of 159 de‐identified assignments from a required evidence‐based medicine course. Faculty‐assigned grades were paired with scores generated by a custom generative pretrained transformer (GPT) configured with the assignment rubric. The primary outcome was the difference in mean total scores; secondary outcomes included rubric section‐level scores, variability, and agreement metrics. Paired t‐tests, Lin's concordance correlation coefficient, and Cohen's weighted kappa were used for analysis.
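As an illustration of two of the statistics named above, the following is a minimal sketch, not the authors' analysis code. It assumes paired score lists of equal length and uses the standard definition of Lin's concordance correlation coefficient with 1/n moments. The function names `lins_ccc` and `paired_t` are hypothetical, and Cohen's weighted kappa is omitted for brevity.

```python
from math import sqrt
from statistics import mean, stdev


def lins_ccc(x, y):
    """Lin's concordance correlation coefficient for paired scores.

    Uses population (1/n) variances and covariance:
    ccc = 2*s_xy / (s_x^2 + s_y^2 + (mean_x - mean_y)^2)
    """
    n = len(x)
    mx, my = mean(x), mean(y)
    s_xx = sum((a - mx) ** 2 for a in x) / n
    s_yy = sum((b - my) ** 2 for b in y) / n
    s_xy = sum((a - mx) * (b - my) for a, b in zip(x, y)) / n
    return 2 * s_xy / (s_xx + s_yy + (mx - my) ** 2)


def paired_t(x, y):
    """Mean paired difference (x - y) and the paired t statistic."""
    d = [a - b for a, b in zip(x, y)]
    mean_diff = mean(d)
    # stdev() is the sample (n - 1) standard deviation of the differences.
    t_stat = mean_diff / (stdev(d) / sqrt(len(d)))
    return mean_diff, t_stat


# Hypothetical scores for illustration only (not study data).
ai_scores = [52, 50, 53, 51, 49]
faculty_scores = [55, 41, 50, 58, 44]
print(lins_ccc(ai_scores, faculty_scores))
print(paired_t(ai_scores, faculty_scores))
```

A concordance coefficient near 1 indicates that paired scores fall close to the line of identity, whereas a value near 0 indicates little agreement even when the two group means are similar.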
Results
Aggregate mean total scores were higher for ChatGPT than for faculty (51.3 vs. 49; mean difference, 2.3; 95% confidence interval, 1.015–3.595; p = 0.0004), with differences in five of seven rubric sections. Faculty scores exhibited greater variability than ChatGPT scores (standard deviation, 7.4 vs. 3.7). Concordance between faculty and AI grading at the individual student level was poor (Lin's coefficient for total score = 0.06; kappa for overall grade = 0.03). ChatGPT also assigned more “B” grades and fewer failing grades than faculty did.
Conclusion
AI‐based grading produced similar aggregate letter grades and reduced variability but demonstrated poor agreement with faculty scores at the individual level. AI grading may complement, not replace, faculty evaluation for assignments requiring critical appraisal, but further studies are needed.