
Published in Vol 11 (2025)

Preprints (earlier versions) of this paper are available at https://preprints.jmir.org/preprint/64284.
Performance Evaluation and Implications of Large Language Models in Radiology Board Exams: Prospective Comparative Analysis

Authors of this article:

Boxiong Wei
