Published in Vol 11 (2025)

Preprints (earlier versions) of this paper are available at https://preprints.jmir.org/preprint/58898.
Factors Associated With the Accuracy of Large Language Models in Basic Medical Science Examinations: Cross-Sectional Study

Journals

  1. Agarwal M, Sharma P, Wani P. Evaluating the Accuracy and Reliability of Large Language Models (ChatGPT, Claude, DeepSeek, Gemini, Grok, and Le Chat) in Answering Item-Analyzed Multiple-Choice Questions on Blood Physiology. Cureus 2025
  2. Shean R, Shah T, Pandiarajan A, Tang A, Bolo K, Nguyen V, Xu B. A comparative analysis of DeepSeek R1, DeepSeek-R1-Lite, OpenAi o1 Pro, and Grok 3 performance on ophthalmology board-style questions. Scientific Reports 2025;15(1)
  3. Sudo H, Noborimoto Y, Takahashi J. Evaluation of Few-Shot AI-Generated Feedback on Case Reports in Physical Therapy Education: Mixed Methods Study. JMIR Medical Education 2025;11:e85614
  4. Reis F, Agha-Mir-Salim L, Hickstein R, Reis M, Piper S, Balzer F, Boie S. Disclaimers and Referral Patterns for Medical Advice Across Urgency Levels: A Large Language Model Evaluation Study (Preprint). Journal of Medical Internet Research 2025
  5. Chen K, Rogers K, Haberkorn W, Lew M, Kanegan J, Nam H, Chantra J, Asch S, Lee G. AI-driven analysis of patient safety reports using large language models: an exploratory multiple methods study. BMJ Quality & Safety 2026:bmjqs-2025-019495
  6. El Natour D, Abou Alfa M, Chaaban A, Assi R, Dally T, Bou Dargham B. Performance of Five AI Models on USMLE Step 1 Questions: A Comparative Observational Study (Preprint). JMIR AI 2025