Performance Evaluation and Implications of Large Language Models in Radiology Board Exams: Prospective Comparative Analysis

doi:10.2196/64284

Published on 16.Jan.2025 in Vol 11 (2025)

Preprints (earlier versions) of this paper are available at https://preprints.jmir.org/preprint/64284, first published 14.Jul.2024.


Boxiong Wei¹

Cited by (18)

Journals

Zhou S, Liu X, Li D, Gu T, Liu K, Yang Y, Wong M. Integrating domain-specific knowledge and fine-tuned general-purpose large language models for question-answering in construction engineering management. Automation in Construction 2025;175:106206
Kao J, Kao H. Large Language Models in radiology: A technical and clinical perspective. European Journal of Radiology Artificial Intelligence 2025;2:100021
Bolgova O, Ganguly P, Mavrych V. Comparative analysis of LLMs performance in medical embryology: A cross‐platform study of ChatGPT, Claude, Gemini, and Copilot. Anatomical Sciences Education 2025;18(7):718
Akpınar H. Comparison of responses from different artificial intelligence-powered chatbots regarding the All-on-four dental implant concept. BMC Oral Health 2025;25(1)
Salbas A, Buyuktoka R. Performance of Large Language Models in Recognizing Brain MRI Sequences: A Comparative Analysis of ChatGPT-4o, Claude 4 Opus, and Gemini 2.5 Pro. Diagnostics 2025;15(15):1919
Bolgova O, Mavrych V. Evolution of AI in anatomy education study based on comparison of current large language models against historical ChatGPT performance. Scientific Reports 2025;15(1)
Martini R, Sang A, Saunders P, Bala W, Li H, Moon J, Balthazar P. Artificial Intelligence in Radiology: Performance of ChatGPT-4v and GPT-4o on Diagnostic Radiology in-Training (DXIT) Examination Questions. Journal of the American College of Radiology 2026;23(4):599
Salbas A, Yogurtcu M. Performance of Large Language Models on Radiology Residency In-Training Examination Questions. Academic Radiology 2026;33(2):337
Sudo H, Noborimoto Y, Takahashi J. Evaluation of Few-Shot AI-Generated Feedback on Case Reports in Physical Therapy Education: Mixed Methods Study. JMIR Medical Education 2025;11:e85614
Almomani M, Valaparla V, Weatherhead J, Fang X, Dabi A, Li C, McCaffrey P, Hier D, Rodríguez-Fernández J. Evaluation of multiple generative large language models on neurology board-style questions. Frontiers in Digital Health 2026;7
Xin J, He X. Evaluating Large Language Models as Medical Consultation Tools for Double Eyelid Surgery: A Cross-Language Study in English and Chinese. Aesthetic Plastic Surgery 2026;50(5):1706
López-Úbeda P, Martín-Noguerol T, Luna A. Radiology Board-Style Examinations and Large Language Models: A Scoping Review of Model Performance. Journal of the American College of Radiology 2026;23(5):837
Singh A, Khan E, Moffat M. LEAPPT: Leveraging gEnerative Artificial Intelligence—Pediatric Pulmonology Training. Curriculum and content development. ATS Scholar 2026;7(1):9
Cai J, Chen J, Yu T, He L, Chen L, Qi H, Zhan X, Yu W, Huang X, Huang P. Multidisciplinary expert evaluation of large language models on questions regarding bariatric surgery: a comparative analysis of ERNIE Bot 4.0, ChatGPT-4, Claude 3 Opus, and Gemini Pro. Scientific Reports 2026;16(1)
Guzelce M, Ozgur S, Salli I. Evaluation of ChatGPT’s performance on emergency medicine board examination questions. Turkish Journal of Emergency Medicine 2026;26(2):110
Bolgova O, Mavrych V, Almidani E, Alshareef T, Kemahlı S. A Comparative Analysis of AI-Language Models’ MCQ Performance versus Medical Students Across Different Pediatric Topics. Advances in Medical Education and Practice 2026;17:1
Siebielec J, Raciborski F. Assessment of vaccine information accuracy across large language models. Frontiers in Public Health 2026;14

Books/Policy Documents

Fantozzi I, Martuscelli L, Schiraldi M. Manufacturing 2030 - A Perspective to Future Challenges in Industrial Production.
