A Comparative Study of Artificial Intelligence and Faculty Rubric‐Based Grading of Pharmacy Student Writing Assignments in an Evidence‐Based Medicine Course


DOI: 10.1002/jac5.70249 ISSN: 2574-9870


Jennifer Phillips, Katherine Sarna, Faria Munir, Heather Ipema


ABSTRACT

Background

Generative artificial intelligence (AI) tools are increasingly used in health professional education, including for automated writing evaluation (AWE). While AI‐based grading may reduce faculty workload and variability, its alignment with human grading remains unclear. This study compared rubric‐based grading of pharmacy student drug information papers by faculty and ChatGPT (OpenAI, San Francisco, CA).

Methods

We conducted a retrospective comparative analysis of 159 de‐identified assignments from a required evidence‐based medicine course. Faculty‐assigned grades were paired with scores generated by a custom generative pretrained transformer (GPT) configured with the assignment rubric. The primary outcome was the difference in mean total scores; secondary outcomes included rubric section‐level scores, variability, and agreement metrics. Paired t‐tests, Lin's concordance correlation coefficient, and Cohen's weighted kappa were used for analysis.
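For readers unfamiliar with the three statistics named above, the following is a minimal sketch of how each can be computed from paired scores. It is not the authors' analysis code. The function names are hypothetical, Lin's coefficient uses population (1/n) moments, and the kappa uses linear disagreement weights; the abstract does not state which weighting scheme the study applied, so that choice is an assumption.

```python
from math import sqrt
from statistics import mean, stdev


def paired_t_statistic(x, y):
    """Return (mean difference, t statistic) for paired samples x and y."""
    diffs = [a - b for a, b in zip(x, y)]
    d_bar = mean(diffs)
    # Standard error of the mean difference (sample SD with n - 1 denominator).
    se = stdev(diffs) / sqrt(len(diffs))
    return d_bar, d_bar / se


def lins_ccc(x, y):
    """Lin's concordance correlation coefficient using 1/n moments."""
    n = len(x)
    mx, my = mean(x), mean(y)
    sxx = sum((a - mx) ** 2 for a in x) / n
    syy = sum((b - my) ** 2 for b in y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y)) / n
    # Penalizes both weak correlation and systematic shift between raters.
    return 2 * sxy / (sxx + syy + (mx - my) ** 2)


def weighted_kappa(r1, r2, categories):
    """Cohen's weighted kappa with linear weights (assumed, not from the source)."""
    k = len(categories)
    idx = {c: i for i, c in enumerate(categories)}
    n = len(r1)
    # Observed joint proportions of (rater 1 category, rater 2 category).
    obs = [[0.0] * k for _ in range(k)]
    for a, b in zip(r1, r2):
        obs[idx[a]][idx[b]] += 1 / n
    row = [sum(obs[i]) for i in range(k)]
    col = [sum(obs[i][j] for i in range(k)) for j in range(k)]
    # Weighted observed and chance-expected disagreement.
    num = sum(abs(i - j) / (k - 1) * obs[i][j] for i in range(k) for j in range(k))
    den = sum(abs(i - j) / (k - 1) * row[i] * col[j] for i in range(k) for j in range(k))
    return 1 - num / den


if __name__ == "__main__":
    # Made-up illustrative scores and grades, not study data.
    faculty = [42.0, 55.0, 38.0, 51.0, 47.0]
    ai = [50.0, 52.0, 49.0, 53.0, 51.0]
    print(paired_t_statistic(ai, faculty))
    print(lins_ccc(ai, faculty))
    print(weighted_kappa(["C", "A", "F", "B", "C"], ["B", "B", "B", "B", "B"],
                         ["A", "B", "C", "D", "F"]))
```

Lin's coefficient drops below the Pearson correlation whenever one rater scores systematically higher, which is why it suits a comparison in which both the means and the spreads differ between graders.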

Results

Aggregate mean total scores were higher for ChatGPT versus faculty (51.3 vs. 49; p = 0.0004, mean difference, 2.3; 95% confidence interval, 1.015–3.595), with differences in five of seven rubric sections. Faculty scores exhibited greater variability than ChatGPT scores (standard deviation, 7.4 vs. 3.7). Concordance between faculty and AI grading at the individual student level was poor (Lin's coefficient for total score = 0.06; kappa for overall grade = 0.03). ChatGPT also assigned more “B” grades and fewer failing grades compared with faculty.

Conclusion

AI‐based grading produced similar aggregate letter grades and reduced variability but demonstrated poor agreement with faculty scores at the individual level. AI grading may complement, not replace, faculty evaluation for assignments requiring critical appraisal, but further studies are needed.
