Performance of Plug-In Augmented ChatGPT and Its Ability to Quantify Uncertainty: Simulation Study on the German Medical Board Examination

doi:10.2196/58375

Published on 21.Mar.2025 in Vol 11 (2025)

Preprints (earlier versions) of this paper are available at https://preprints.jmir.org/preprint/58375, first published 14.Mar.2024.

Julian Madrid¹; Philipp Diehl¹; Mischa Selig²,³; Bernd Rolauffs²,³; Felix Patricius Hans²,⁴; Hans-Jörg Busch²,⁴; Tobias Scheef²,⁵; Leo Benning²,⁴

Citation

Please cite as:

Madrid J, Diehl P, Selig M, Rolauffs B, Hans FP, Busch HJ, Scheef T, Benning L
Performance of Plug-In Augmented ChatGPT and Its Ability to Quantify Uncertainty: Simulation Study on the German Medical Board Examination
JMIR Med Educ 2025;11:e58375
doi: 10.2196/58375 PMID: 40116759 PMCID: 11951815

This paper is in the following e-collection/theme issue:

Graduate and Postgraduate Education for Health Professionals; Testing and Assessment in Medical Education; Artificial Intelligence; Artificial Intelligence (AI) in Medical Education; Generative Language Models Including ChatGPT; Applications of AI
