Exploring the Performance of ChatGPT Versions 3.5, 4, and 4 With Vision in the Chilean Medical Licensing Examination: Observational Study

doi:10.2196/55048

Published on 29 April 2024 in JMIR Medical Education, Vol 10 (2024)

Preprints (earlier versions) of this paper are available at https://preprints.jmir.org/preprint/55048, first published on 30 November 2023.



Marcos Rojas¹; Marcelo Rojas²; Valentina Burgess²; Javier Toro-Pérez²; Shima Salehi¹

Cited by (28)

Journals

Liu M, Okuhara T, Chang X, Shirabe R, Nishiie Y, Okada H, Kiuchi T. Performance of ChatGPT Across Different Versions in Medical Licensing Examinations Worldwide: Systematic Review and Meta-Analysis. Journal of Medical Internet Research 2024;26:e60807
Tong W, Zhang X, Zeng H, Pan J, Gong C, Zhang H. Reforming China’s Secondary Vocational Medical Education: Adapting to the Challenges and Opportunities of the AI Era. JMIR Medical Education 2024;10:e48594
Kipp M. From GPT-3.5 to GPT-4.o: A Leap in AI’s Medical Exam Performance. Information 2024;15(9):543
Leventoglu E, Soran M. Clinical Characteristics of Children with Acute Post-Streptococcal Glomerulonephritis and Re-Evaluation of Patients with Artificial Intelligence. Medeniyet Medical Journal 2024
Nakaura T, Yoshida N, Kobayashi N, Nagayama Y, Uetani H, Kidoh M, Oda S, Funama Y, Hirai T. Performance of Multimodal Large Language Models in Japanese Diagnostic Radiology Board Examinations (2021-2023). Academic Radiology 2025;32(5):2394
Kim J, Vajravelu B. Assessing the Current Limitations of Large Language Models in Advancing Health Care Education. JMIR Formative Research 2025;9:e51319
Qiu Y, Liu C. Capable exam-taker and question-generator: the dual role of generative AI in medical education assessment. Global Medical Education 2025;2(1):135
Nguyen H, Dang H, Nguyen T, Hoang V, Nguyen V, Wu J. Accuracy of latest large language models in answering multiple choice questions in dentistry: A comparative study. PLOS ONE 2025;20(1):e0317423
Zhao Q, Wang H, Wang R, Cao H. Deriving insights from enhanced accuracy: Leveraging prompt engineering in custom GPT for assessing Chinese Nursing Licensing Exam. Nurse Education in Practice 2025;84:104284
Wang J, Shue K, Liu L, Hu G. Preliminary evaluation of ChatGPT model iterations in emergency department diagnostics. Scientific Reports 2025;15(1)
Rodrigues Alessi M, Gomes H, Oliveira G, Lopes de Castro M, Grenteski F, Miyashiro L, do Valle C, Tozzini Tavares da Silva L, Okamoto C. Comparative Performance of Medical Students, ChatGPT-3.5 and ChatGPT-4.0 in Answering Questions From a Brazilian National Medical Exam: Cross-Sectional Questionnaire Study. JMIR AI 2025;4:e66552
Altermatt F, Neyem A, Sumonte N, Mendoza M, Villagran I, Lacassie H. Performance of single-agent and multi-agent language models in Spanish language medical competency exams. BMC Medical Education 2025;25(1)
Wang L, Li J, Zhuang B, Huang S, Fang M, Wang C, Li W, Zhang M, Gong S. Accuracy of Large Language Models When Answering Clinical Research Questions: Systematic Review and Network Meta-Analysis. Journal of Medical Internet Research 2025;27:e64486
Park C, An M, Hwang G, Park R, An J. Clinical Performance and Communication Skills of ChatGPT Versus Physicians in Emergency Medicine: Simulated Patient Study. JMIR Medical Informatics 2025;13:e68409
Wu H, Zerner T, Lee D, Court-Kowalski S, Devitt P, Palmer E. GPT-4 versus human authors in clinically complex MCQ creation: A blinded analysis of item quality. Medical Teacher 2025;47(12):1961
Cheng Y, Zhu L. A review of ChatGPT in medical education: exploring advantages and limitations. International Journal of Surgery 2025;111(7):4586
Boral K, Mondal K. Comparing AI-Generated Responses: A Study on ChatGPT, Gemini, and Copilot in Education. Journal of Educational Technology Systems 2025;54(2):291
Nakaura T, Uetani H, Yoshida N, Kobayashi N, Nagayama Y, Kidoh M, Kuroda J, Mukasa A, Hirai T. Intra-axial primary brain tumor differentiation: comparing large language models on structured MRI reports vs. radiologists on images. European Radiology 2025;36(2):1594
Saowaprut P, Wabina R, Yang J, Siriwat L. Performance of large language models on Thailand’s national medical licensing examination: a cross-sectional study. Journal of Educational Evaluation for Health Professions 2025;22:16
Latkowska A, Sawina P, Dolata T, Boczkowski D, Wielochowska A, Kowalczyk A, Loson-Kawalec M, Radej D, Jaworski W, Majchrowicz W, Olender M, Adamiak J, Sroczynska J, Suleiman R, Glinska J, Szczerbanowicz P, Dadynska P. Assessment of the Ability of the ChatGPT-5 Model to Pass the Endocrinology Specialization Exam. Cureus 2025
Kasagga A, Sapkota A, Changaramkumarath G, Abucha J, Wollel M, Somannagari N, Husami M, Hailu K, Kasagga E. Performance of ChatGPT and Large Language Models on Medical Licensing Exams Worldwide: A Systematic Review and Network Meta-Analysis With Meta-Regression. Cureus 2025
Ayala-Carabajo R, Llerena-Izquierdo J. Comparative Experimental Studies on Superior Cognitive Domains: AI Versus Humans. Technologies 2026;14(1):55
Lian L, Luo X, Chipusu K, Ashraf M, Wong K, Zhang W. Large Language Models Evaluation of Medical Licensing Examination Using GPT-4.0, ERNIE Bot 4.0, and GPT-4o. Bioengineering 2026;13(1):113
Aliyeva A, Muradova A, Hashimli R, Müderris T. Multi‐model Artificial Intelligence Evaluation in Sudden Sensorineural Hearing Loss. Otolaryngology–Head and Neck Surgery 2026;174(4):980
Geduk G, Hasırcı U, Kusay D, Aras R, Çapar İ, Altın E, Şeker Ç. A comparative analysis of the performance of large Language models in the dentistry specialty examination. Scientific Reports 2026;16(1)
Liu X, Wang H, Guo X, Tian S, Cao H. Comparative Performance of DeepSeek and ChatGPT-4o in the Chinese Nursing Licensing Exam. Journal of Nursing Education 2026;65(2):69
Lokadjaja M, Kho J, Schulz P, Goh W. Large Language Models and Their Applications in Mental Health: Scoping Review. JMIR Mental Health 2026;13:e88057

Books/Policy Documents

Pérez G, Gamboa A. The Second International Symposium on Generative AI and Education (ISGAIE’2025).

Citation

Please cite as:

Rojas M, Rojas M, Burgess V, Toro-Pérez J, Salehi S. Exploring the Performance of ChatGPT Versions 3.5, 4, and 4 With Vision in the Chilean Medical Licensing Examination: Observational Study. JMIR Med Educ 2024;10:e55048. doi: 10.2196/55048. PMID: 38686550. PMCID: 11082432


This paper is in the following e-collection/theme issue:

Theme Issue: ChatGPT and Generative Language Models in Medical Education (144)
Testing and Assessment in Medical Education (204)
Chatbots and Conversational Agents (1145)
Generative Language Models Including ChatGPT (1444)
