Published on in Vol 10 (2024)

Preprints (earlier versions) of this paper are available at https://preprints.jmir.org/preprint/63430, first published .
ChatGPT-4 Omni Performance in USMLE Disciplines and Clinical Skills: Comparative Analysis

Journals

  1. McHugh J, Challener D, Tabaja H. Change of Heart: Can Artificial Intelligence Transform Infective Endocarditis Management?. Pathogens 2025;14(4):371
  2. Elkin P, Mehta G, LeHouillier F, Resnick M, Mullin S, Tomlin C, Resendez S, Liu J, Nebeker J, Brown S. Semantic Clinical Artificial Intelligence vs Native Large Language Model Performance on the USMLE. JAMA Network Open 2025;8(4):e256359
  3. Tekin M, Yurdal M, Toraman Ç, Korkmaz G, Uysal İ. Is AI the future of evaluation in medical education?? AI vs. human evaluation in objective structured clinical examination. BMC Medical Education 2025;25(1)
  4. Wang L, Li J, Zhuang B, Huang S, Fang M, Wang C, Li W, Zhang M, Gong S. Accuracy of Large Language Models When Answering Clinical Research Questions: Systematic Review and Network Meta-Analysis. Journal of Medical Internet Research 2025;27:e64486
  5. Bolgova O, Ganguly P, Mavrych V. Comparative analysis of LLMs performance in medical embryology: A cross‐platform study of ChatGPT, Claude, Gemini, and Copilot. Anatomical Sciences Education 2025;18(7):718
  6. Wang W, Fu J, Zhang Y, Hu K. A Comparative Analysis of GPT-4o and ERNIE Bot in a Chinese Radiation Oncology Exam. Journal of Cancer Education 2025
  7. Wu J, Wang Z, Qin Y. Performance of DeepSeek-R1 and ChatGPT-4o on the Chinese National Medical Licensing Examination: A Comparative Study. Journal of Medical Systems 2025;49(1)
  8. Altermatt F, Neyem A, Sumonte N, Villagrán I, Mendoza M, Lacassie H. Evaluating the Performance of Large Language Models on the CONACEM Anesthesiology Certification Exam: A Comparison with Human Participants. Applied Sciences 2025;15(11):6245
  9. Bruneti Severino J, Nespolo Berger M, Basei de Paula P, Loures F, Todeschini S, Roeder E, Han Veiga M, Knopfholz J, Lenci Marques G. Performance Benchmarking of Open-Source Large Language Models on the Brazilian Society of Cardiology's Certification Exam. International Journal of Cardiovascular Sciences 2025;38
  10. Solomon T, Laye M, Ahmed S. The sports nutrition knowledge of large language model (LLM) artificial intelligence (AI) chatbots: An assessment of accuracy, completeness, clarity, quality of evidence, and test-retest reliability. PLOS One 2025;20(6):e0325982
  11. Ucdal M, Bakhshandehpour A, Durak M, Balaban Y, Kekilli M, Simsek C. Evaluating the Role of Artificial Intelligence in Making Clinical Decisions for Treating Acute Pancreatitis. Journal of Clinical Medicine 2025;14(12):4347
  12. Antillón F. Inteligencia Artificial en Educación Médica: ¿hacia dónde?. Revista de la Facultad de Medicina 2025;3(1):4
  13. Liu Y, Yuan Y, Yan K, Li Y, Sacca V, Hodges S, Cannistra M, Jeong P, Wu J, Kong J. Evaluating the role of large language models in traditional Chinese medicine diagnosis and treatment recommendations. npj Digital Medicine 2025;8(1)
  14. Karamanlıoğlu A, Demirel B, Tural O, Doğan O, Alpaslan F. Privacy-Preserving Clinical Decision Support for Emergency Triage Using LLMs: System Architecture and Real-World Evaluation. Applied Sciences 2025;15(15):8412
  15. Dagi A, Jones N, Bogue J. When does ChatGPT refer someone to a plastic surgeon?. Journal of Plastic, Reconstructive & Aesthetic Surgery 2025;109:20
  16. Zhong R, Chen S, Li Z, Gao T, Su Y, Zhang W, Liu D, Gao L, Hu K. Large Language Models in Lung Cancer: A Systematic Review (Preprint). Journal of Medical Internet Research 2025
  17. Feng Y. Can LLMs effectively assist medical coding? Evaluating GPT performance on DRG and targeted clinical tasks. BMC Medical Informatics and Decision Making 2025;25(1)
  18. Gül M. Large language models underperform in European general surgery board examinations: a comparative study with experts and surgical residents. BMC Medical Education 2025;25(1)
  19. Prasad S, Travis L, Thornton M, Thaller S. Comparison of GPT-4o and o3-Mini on Otolaryngology USMLE-Style Questions. Journal of Craniofacial Surgery 2025
  20. Masanneck L, Epping P, Meuth S, Pawlitzki M. Evaluating Web Retrieval–Assisted Large Language Models With and Without Whitelisting for Evidence-Based Neurology: Comparative Study. Journal of Medical Internet Research 2025;27:e79379
  21. Zhang Y, Xie X, Xu Q. ChatGPT in Medical Education: Bibliometric and Visual Analysis. JMIR Medical Education 2025;11:e72356
  22. Landon S, Savage T, Greysen S, Bressman E. Variation in Large Language Model Recommendations in Challenging Inpatient Management Scenarios. Journal of General Internal Medicine 2025
  23. Alohali K, Almusaeeb L, Almubarak A, Alohali A, Muaygil R. Reasoning-based LLMs surpass average human performance on medical social skills. Scientific Reports 2025;15(1)
  24. Gérard A, Lombardi R, Merino D, Bouveyron C, Dellamonica J, Drici M, Lavrut T, Destere A. A new chapter in pharmacology: Artificial intelligence's expanding role in pharmacokinetics, pharmacodynamics, and pharmacovigilance. Therapies 2025
  25. Kasagga A, Sapkota A, Changaramkumarath G, Abucha J, Wollel M, Somannagari N, Husami M, Hailu K, Kasagga E. Performance of ChatGPT and Large Language Models on Medical Licensing Exams Worldwide: A Systematic Review and Network Meta-Analysis With Meta-Regression. Cureus 2025
  26. Sabouni S, Moufti M, Taha M. From Hype to Implementation: Embedding GPT-4o in Medical Education. JMIR Medical Education 2025;11:e79309
  27. Ahn J, Kang B, Chang M, Yoon S. Applications and Future Perspectives of Large Language Models in Otolaryngology-Head and Neck Surgery: A Comprehensive Survey. Clinical and Experimental Otorhinolaryngology 2025;18(4):283
  28. Chen G, Lin C, Zhang L, Luo Z, Shin Y, Li X. Virtual case reasoning and AI-assisted diagnostic instruction: an empirical study based on body interact and large language models. BMC Medical Education 2025;25(1)
  29. Tao L, Liu J, Lu X, Zhao Y, Zhang Y, Zhu Z, Li T, Zhang Z, Zhang Y, Yan W, Liu M, Liang W. Performance of the Large Language Model in General Medicine. Global Transitions 2025
  30. Özler Z, Karaman B, Atalay E. Assessing the Performance of Widely Used Large Language Models Across Medical Disciplines Using USMLE-Style Exam Questions: An In-Depth Evaluation. Turkish Medical Student Journal 2025
  31. Cevallos López G, Ubillús Reyes J, Chocobar Reyes E. Argumentos a favor de permitir o prohibir el uso de la inteligencia artificial generativa por estudiantes. Una revisión sistemática. European Public & Social Innovation Review 2025;11:1
  32. Al‐Haj Ali S. Reliability of Multimodal AI for Assessing Preclinical Stainless Steel Crown Preparations: A Comparative Study With Human Experts. International Journal of Paediatric Dentistry 2025
  33. Wang W, Zhou Y, Fu J, Hu K. Evaluating the Performance of DeepSeek-R1 and DeepSeek-V3 Versus OpenAI Models in the Chinese National Medical Licensing Examination: Cross-Sectional Comparative Study. JMIR Medical Education 2025;11:e73469
  34. Kuhn S, Knitza J. Leitliniengerechte Osteoporoseversorgung durch LLMs? Ein Scoping Review zum Potenzial generativer KI. Osteologie 2025;34(04):250
  35. Zeng J, Qi W, Shen S, Liu X, Li S, Wang B, Dong C, Zhu X, Shi Y, Lou X, Wang B, Yao J, Jiang G, Zhang Q, Cao S. Embracing the Future of Medical Education With Large Language Model–Based Virtual Patients: Scoping Review. Journal of Medical Internet Research 2025;27:e79091
  36. Akinniranye O, Akinniranye O. Performance of Large Language Models and Top-Decile Doctors on an Undergraduate Ophthalmology Examination. Cureus 2025
  37. Li S. Towards A Fair Duel: Reflections on the Evaluation of DeepSeek-R1 and ChatGPT-4o in Chinese Medical Education. Journal of Medical Systems 2025;49(1)

Books/Policy Documents

  1. Yoo Y, Georgescu B, Zhang Y, Grbic S, Liu H, Aldea G, Re T, Das J, Ullaskrishnan P, Eibenberger E, Chekkoury A, Bodanapally U, Nicolaou S, Sanelli P, Schroeppel T, Lui Y, Gibson E. Medical Image Computing and Computer Assisted Intervention – MICCAI 2025.

Conference Proceedings

  1. Shetgaonkar A, Pradhan D, Arora L, Girija S, Raj A, Kapoor S. 2025 IEEE 49th Annual Computers, Software, and Applications Conference (COMPSAC). Mitigating Clinician Information Overload: Generative AI for Integrated EHR and RPM Data Analysis