Factors Associated With the Accuracy of Large Language Models in Basic Medical Science Examinations: Cross-Sectional Study

doi:10.2196/58898

Published on 13.Jan.2025 in Vol 11 (2025)

Preprints (earlier versions) of this paper are available at https://preprints.jmir.org/preprint/58898, first published 29.Mar.2024.


Naritsaret Kaewboonlert¹; Jiraphon Poontananggul¹; Natthipong Pongsuwan¹; Gun Bhakdisongkhram¹


