Published on in Vol 11 (2025)

Preprints (earlier versions) of this paper are available at https://preprints.jmir.org/preprint/58898, first published .
Factors Associated With the Accuracy of Large Language Models in Basic Medical Science Examinations: Cross-Sectional Study