The role of artificial intelligence in academic citation: A study on lens, cataract, and anterior segment research
Mustafa Civelekler, Mehmet Çıtırık
Purpose:
To evaluate the accuracy and reliability of four artificial intelligence (AI) models—ChatGPT, Copilot, DeepSeek, and Gemini—in generating PubMed citations for literature related to lens disease, cataracts, iris disorders, and anterior chamber pathology.
Design:
Comparative accuracy assessment study.
Methods:
Forty standardized clinical paragraphs from The Review of Ophthalmology (4th edition) were used as test inputs. Each AI model was prompted to generate AMA-11–style PubMed references. Citation accuracy was assessed using predefined criteria: PubMed verifiability, DOI concordance, and bibliographic accuracy. Two expert reviewers independently classified each citation as fully cited, partially cited, or not cited, and inter-rater reliability was assessed.
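The classification and agreement steps described above can be illustrated with a minimal Python sketch. It rests on assumptions that are not taken from the study: the rule mapping the three criteria to a label in `classify` is hypothetical, and Cohen's kappa is used only as one common agreement statistic for two raters, since the abstract does not state which measure was applied.

```python
from collections import Counter


def classify(in_pubmed: bool, doi_matches: bool, biblio_correct: bool) -> str:
    """Map three checks to a label (hypothetical rule, not the study's rubric)."""
    if not in_pubmed:
        return "not cited"
    if doi_matches and biblio_correct:
        return "fully cited"
    return "partially cited"


def cohen_kappa(rater_a: list[str], rater_b: list[str]) -> float:
    """Cohen's kappa for two raters labelling the same items.

    Assumption: the abstract does not name its reliability statistic;
    kappa is shown here purely as an illustration.
    """
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must label the same non-empty item set")
    n = len(rater_a)
    # Proportion of items on which the two raters agree.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Agreement expected by chance, from each rater's label frequencies.
    count_a, count_b = Counter(rater_a), Counter(rater_b)
    labels = set(count_a) | set(count_b)
    expected = sum(count_a[label] * count_b[label] for label in labels) / n**2
    if expected == 1.0:
        # Both raters used a single identical label throughout.
        return 1.0
    return (observed - expected) / (1 - expected)
```

In use, each reviewer's labels for a model's 40 citations would be passed as two equal-length lists to `cohen_kappa`; values near 1 indicate strong agreement, and values near 0 indicate agreement no better than chance.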
Results:
The citation accuracy varied significantly among the models. DeepSeek demonstrated the highest accuracy (52.5%), followed by ChatGPT (32.5%) and Copilot (20.0%), whereas Gemini demonstrated the lowest accuracy (2.5%) (
Conclusion:
Domain-specific AI models, particularly DeepSeek, outperform general-purpose models in generating PubMed citations from ophthalmic literature. However, all the evaluated models exhibited citation errors, underscoring the necessity of human verification. AI tools may enhance academic workflows as assistive systems but should not be used autonomously for reference generation in medical research.