Published on 2.4.2025 in Vol 11 (2025)

Preprints (earlier versions) of this paper are available at https://preprints.jmir.org/preprint/72998.
Citation Accuracy Challenges Posed by Large Language Models


Authors of this article:

Manlin Zhang1; Tianyu Zhao2

1Department of Obstetrics and Gynecology, Shengjing Hospital of China Medical University, Shenyang, China

2Department of Science and Technology Studies, University College London, Gower Street, London, United Kingdom

*All authors contributed equally.

Corresponding Author:

Tianyu Zhao, MSc


JMIR Med Educ 2025;11:e72998

doi:10.2196/72998

Large language models (LLMs) such as DeepSeek, ChatGPT, and ChatGLM have significant limitations in generating citations, raising concerns about the quality and reliability of academic research. These models tend to produce citations that are correctly formatted but fictional in content, misleading users and undermining academic rigor. In the recent study "Perceptions and earliest experiences of medical students and faculty with ChatGPT in medical education: qualitative study," the section addressing concerns about ChatGPT deserves deeper discussion [1].

There are several reasons for the citation problems in LLMs. First, most LLMs cannot access paid subscription databases and therefore rely solely on open-access resources [2]. This limits the citations generated by LLMs to open-access journals, potentially omitting more significant research published in subscription-based journals. Second, LLMs are trained on vast amounts of text data and generate content by analyzing patterns and structures in that text. However, they cannot understand the content of the text or think critically, which means they cannot judge the accuracy and reliability of information. Third, the algorithms underlying LLMs are often opaque, leaving users unable to see how information is handled. This makes it difficult for users to determine the reliability of citations generated by LLMs and to evaluate the results effectively. Recent research also found that roughly half of generated search results lack citations, and only 75% of the citations that are provided actually support their claims, posing trust concerns as user reliance grows [3].
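Because fabricated citations usually leave no trace in public bibliographic metadata, readers can at least confirm whether a model-supplied DOI resolves to a real record. The following is a minimal sketch of such a check against the public Crossref REST API (api.crossref.org); the function name, the crude title-matching heuristic, and the use of this letter's own DOI as test input are illustrative choices, not a method from the studies cited above.

```python
# Minimal sketch: verify that an LLM-supplied DOI resolves in Crossref and
# that the registered title resembles the title the model claimed.
# Assumptions: the public Crossref REST API; the `requests` package.
import requests

def doi_matches_title(doi: str, claimed_title: str) -> bool:
    """Return True if the DOI resolves and its registered title loosely
    matches the claimed one; False suggests a fabricated citation."""
    resp = requests.get(f"https://api.crossref.org/works/{doi}", timeout=10)
    if resp.status_code != 200:
        return False  # unregistered DOI: a strong sign of fabrication
    titles = resp.json()["message"].get("title") or [""]
    registered = titles[0].lower()
    claimed = claimed_title.lower()
    # Crude containment check; a real pipeline would use fuzzy matching.
    return claimed in registered or registered in claimed

# Tested here with this letter's own DOI (see the article header).
print(doi_matches_title(
    "10.2196/72998",
    "Citation accuracy challenges posed by large language models",
))
```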

Recently, the Journal of Clinical Anesthesia conducted an experiment in which it published a fictional article titled "Spinal Cord Ischemia After ESP Block" to test how fabricated academic content spreads and is cited. Surprisingly, the fictional article was cited widely, over 400 times, including in some journals with high impact factors [4]. This reveals a lack of rigor in academic citation practices: many authors may not check the original literature and instead copy references directly. The incident sparked widespread discussion about academic citation practices and emphasized the importance of critical thinking by scholars when citing materials.

The use of fictional citations by LLMs poses a multifaceted problem: it misleads users into drawing incorrect conclusions and making inappropriate decisions, undermines the rigor and credibility of academic research, and hinders the dissemination of knowledge by limiting access to accurate scientific information [5]. The problem of LLMs generating fictional citations is complex and requires the combined efforts of multiple stakeholders to resolve. Developers must continue to improve LLM technology and algorithms, users must sharpen their awareness and critical evaluation skills when using LLMs, and academic institutions must strengthen the management of, and education in, sound academic practices. Only through these efforts can we ensure that LLMs play a positive role in academic research and promote the dissemination and progress of knowledge.
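For the common case in which a model-generated reference carries no DOI at all, a similar check can be approximated with Crossref's free-text bibliographic search. The sketch below is again only illustrative: query.bibliographic is a documented Crossref API parameter, but the helper name and the example query string (the fictional title discussed above) are assumptions for demonstration, and any returned record must still be compared manually against the claimed reference.

```python
# Minimal sketch: look up a citation that has no DOI via Crossref's
# free-text bibliographic search. A returned hit is only a candidate and
# must still be compared against the claimed reference by hand.
import requests

def find_work(reference: str):
    """Return (DOI, title) of the closest indexed match, or None."""
    resp = requests.get(
        "https://api.crossref.org/works",
        params={"query.bibliographic": reference, "rows": 1},
        timeout=10,
    )
    resp.raise_for_status()
    items = resp.json()["message"]["items"]
    if not items:
        return None  # nothing indexed: treat the citation with suspicion
    top = items[0]
    return top.get("DOI"), (top.get("title") or [""])[0]

# The fictional title from the Journal of Clinical Anesthesia experiment:
print(find_work("Spinal Cord Ischemia After ESP Block, J Clin Anesth"))
```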

Conflicts of Interest

None declared.

  1. Abouammoh N, Alhasan K, Aljamaan F, et al. Perceptions and earliest experiences of medical students and faculty with ChatGPT in medical education: qualitative study. JMIR Med Educ. Feb 20, 2025;11:e63400. [CrossRef] [Medline]
  2. Perianes-Rodríguez A, Olmeda-Gómez C. Effects of journal choice on the visibility of scientific publications: a comparison between subscription-based and full open access models. Scientometrics. Dec 2019;121(3):1737-1752. [CrossRef]
  3. Peskoff D, Stewart B. Credible without credit: domain experts assess generative language models. Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics. Jul 2023;2:427-438. [CrossRef]
  4. Marcus A, Oransky I, De Cassai A. Please don’t cite this editorial. J Clin Anesth. Jan 8, 2025:111741. [CrossRef] [Medline]
  5. Rasul T, Nair S, Kalendra D, et al. The role of ChatGPT in higher education: benefits, challenges, and future research directions. JALT. May 10, 2023;6(1):41-56. [CrossRef]


Abbreviations

LLM: large language model


Edited by Surya Nedunchezhiyan. This is a non–peer-reviewed article; submitted 23.02.25; accepted 12.03.25; published 02.04.25.

Copyright

©Manlin Zhang, Tianyu Zhao. Originally published in JMIR Medical Education (https://mededu.jmir.org), 2.4.2025.

This is an open-access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work, first published in JMIR Medical Education, is properly cited. The complete bibliographic information, a link to the original publication on https://mededu.jmir.org/, as well as this copyright and license information must be included.