Published in Vol 10 (2024)

Preprints (earlier versions) of this paper are available at https://preprints.jmir.org/preprint/51148.
Pure Wisdom or Potemkin Villages? A Comparison of ChatGPT 3.5 and ChatGPT 4 on USMLE Step 3 Style Questions: Quantitative Analysis

Journals

  1. Funk P, Hoch C, Knoedler S, Knoedler L, Cotofana S, Sofo G, Bashiri Dezfouli A, Wollenberg B, Guntinas-Lichius O, Alfertshofer M. ChatGPT’s Response Consistency: A Study on Repeated Queries of Medical Examination Questions. European Journal of Investigation in Health, Psychology and Education 2024;14(3):657
  2. Knoedler L, Knoedler S, Hoch C, Prantl L, Frank K, Soiderer L, Cotofana S, Dorafshar A, Schenck T, Vollbach F, Sofo G, Alfertshofer M. In-depth analysis of ChatGPT’s performance based on specific signaling words and phrases in the question stem of 2377 USMLE step 1 style questions. Scientific Reports 2024;14(1)
  3. Liu M, Okuhara T, Chang X, Shirabe R, Nishiie Y, Okada H, Kiuchi T. Performance of ChatGPT Across Different Versions in Medical Licensing Examinations Worldwide: Systematic Review and Meta-Analysis. Journal of Medical Internet Research 2024;26:e60807
  4. Hibino M, Gillinov M. “Pseudo” Intelligence or Misguided or Mis-sourced Intelligence? The Annals of Thoracic Surgery 2024;118(1):281
  5. Hirosawa T, Harada Y, Mizuta K, Sakamoto T, Tokumasu K, Shimizu T. Diagnostic performance of generative artificial intelligences for a series of complex case reports. DIGITAL HEALTH 2024;10
  6. Davis N, El-Said E, Fortune P, Shen A, Succi M. Transforming Health Care Landscapes: The Lever of Radiology Research and Innovation on Emerging Markets Poised for Aggressive Growth. Journal of the American College of Radiology 2024;21(10):1552
  7. Jin H, Lee H, Kim E. Performance of ChatGPT-3.5 and GPT-4 in national licensing examinations for medicine, pharmacy, dentistry, and nursing: a systematic review and meta-analysis. BMC Medical Education 2024;24(1)
  8. Alfertshofer M, Knoedler S, Hoch C, Cotofana S, Panayi A, Kauke-Navarro M, Tullius S, Orgill D, Austen W, Pomahac B, Knoedler L. Analyzing Question Characteristics Influencing ChatGPT’s Performance in 3000 USMLE®-Style Questions. Medical Science Educator 2024
  9. Künzle P, Paris S. Performance of large language artificial intelligence models on solving restorative dentistry and endodontics student assessments. Clinical Oral Investigations 2024;28(11)
  10. Ramgopal S, Varma S, Gorski J, Kester K, Shieh A, Suresh S. Evaluation of a Large Language Model on the American Academy of Pediatrics' PREP Emergency Medicine Question Bank. Pediatric Emergency Care 2024;40(12):871
  11. Cooperman S, Brandão R. Integrating domain-specific resources: Advancing AI for foot and ankle surgery. Foot & Ankle Surgery: Techniques, Reports & Cases 2025;5(1):100445
  12. Hofmann H, Vairavamurthy J. Large language model doctor: assessing the ability of ChatGPT-4 to deliver interventional radiology procedural information to patients during the consent process. CVIR Endovascular 2024;7(1)
  13. Jin H, Kim E. Performance of GPT-3.5 and GPT-4 on the Korean Pharmacist Licensing Examination: Comparison Study. JMIR Medical Education 2024;10:e57451
  14. Avidan Y, Tabachnikov V, Court O, Khoury R, Aker A. In the face of confounders: Atrial fibrillation detection – Practitioners vs. ChatGPT. Journal of Electrocardiology 2025;88:153851
  15. Shen B, Zong H, Wu R, Cha J, Wang J, Wu E, Li J, Zhou Y, Zhang C, Feng W. Large Language Models in Worldwide Medical Exams: Platform Development and Comprehensive Analysis (Preprint). Journal of Medical Internet Research 2024
  16. Schnapp B, Sehdev M, Schrepel C, Bord S, Pelletier‐Bui A, Alvarez A, Dubosh N, Park Y, Shappell E. ChatG‐PD? Comparing large language model artificial intelligence and faculty rankings of the competitiveness of standardized letters of evaluation. AEM Education and Training 2024;8(6)
  17. Kuerbanjiang W, Peng S, Jiamaliding Y, Yi Y. Performance Evaluation of Large Language Models in Cervical Cancer Management Based on A Standardized Questionnaire: Comparative Study (Preprint). Journal of Medical Internet Research 2024
  18. Smollin K, Smollin C. Will Artificial Intelligence Replace the Medical Toxicologist: Pediatric Referral Thresholds Generated by GPT-4. Journal of Medical Toxicology 2024