DOI: 10.1177/10519815261462135 ISSN: 1051-9815

Assessing the efficacy of LLMs in neuroscience: A performance analysis of ChatGPT and Gemini on synaptic plasticity

Melek Altunkaya, Ercan Babur, Emine Cihan, Cansu Sahbaz Pirincci

Background

Synaptic plasticity, which plays a critical role in fundamental neurological processes, is a complex subject to master. Large language models (LLMs) are therefore increasingly used to facilitate the learning of such complex topics. However, these models have limitations, including producing inaccurate information and failing to capture the nuances of scientific terminology.

Objectives

This study aimed to evaluate the accuracy, quality and readability of LLM responses to questions on synaptic plasticity.

Methods

The widely used LLMs ChatGPT-4 and Gemini 2.5 were selected for the study. Ten questions were posed to each LLM, and the initial responses were recorded. Five neurophysiologists evaluated the responses qualitatively using a 4-point Likert scale. The readability of the answers was analyzed using the Flesch-Kincaid Grade Level test.
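
For readers unfamiliar with the readability measure, the standard Flesch-Kincaid Grade Level formula is 0.39 × (words per sentence) + 11.8 × (syllables per word) − 15.59. The sketch below is only an illustration of that formula, not the tool used in the study, which the abstract does not name. The function names `count_syllables` and `flesch_kincaid_grade` are hypothetical, and the vowel-group syllable counter is a rough heuristic assumed here for simplicity; dedicated readability software counts syllables more carefully and may give somewhat different scores.

```python
import re


def count_syllables(word: str) -> int:
    """Approximate syllables by counting vowel groups (heuristic, an assumption)."""
    word = word.lower()
    count = len(re.findall(r"[aeiouy]+", word))
    # Treat a trailing 'e' as silent, except in endings such as '-le' or '-ee'.
    if word.endswith("e") and not word.endswith(("le", "ee")) and count > 1:
        count -= 1
    return max(count, 1)


def flesch_kincaid_grade(text: str) -> float:
    """Flesch-Kincaid Grade Level using the standard published coefficients."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z]+(?:'[A-Za-z]+)?", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one sentence and one word")
    syllables = sum(count_syllables(w) for w in words)
    return (
        0.39 * len(words) / len(sentences)
        + 11.8 * syllables / len(words)
        - 15.59
    )


if __name__ == "__main__":
    simple = "The cat sat on the mat."
    technical = "Synaptic plasticity modulates neuronal communication."
    print(round(flesch_kincaid_grade(simple), 2))
    print(round(flesch_kincaid_grade(technical), 2))
```

Longer sentences and more polysyllabic words both raise the score, which is why a higher grade level is read as a more academic or technical style.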

Results

In the qualitative assessment, both models generally provided accurate and acceptable information. Within the limited scope of the questions analyzed, Gemini received higher median scores in certain instances; however, no statistically significant difference was observed between the two models across most of the question set. Linguistic analysis showed that Gemini's responses were longer and featured a higher Flesch-Kincaid Grade Level, suggesting a structure more aligned with academic or technical discourse.

Conclusion

For the specific neuroscientific inquiries examined in this study, both LLMs demonstrated a high capacity for generating accurate content. While Gemini's responses exhibited a more technical linguistic profile, the findings are context-specific, and further research is needed to determine whether these trends persist across broader scientific domains and larger datasets.
