Capability of GPT-4V(ision) in the Japanese National Medical Licensing Examination: Evaluation Study

doi:10.2196/54393

Journals

Koga S, Du W, Ono D. Response to “Can ChatGPT Vision diagnose melanoma? An exploratory diagnostic accuracy study.” Journal of the American Academy of Dermatology 2024;91(3):e61
Liu M, Okuhara T, Chang X, Shirabe R, Nishiie Y, Okada H, Kiuchi T. Performance of ChatGPT Across Different Versions in Medical Licensing Examinations Worldwide: Systematic Review and Meta-Analysis. Journal of Medical Internet Research 2024;26:e60807
Ishida K, Hanada E. Potential of ChatGPT to Pass the Japanese Medical and Healthcare Professional National Licenses: A Literature Review. Cureus 2024
Tong W, Zhang X, Zeng H, Pan J, Gong C, Zhang H. Reforming China’s Secondary Vocational Medical Education: Adapting to the Challenges and Opportunities of the AI Era. JMIR Medical Education 2024;10:e48594
Liu C, Ho C, Wu T. Custom GPTs Enhancing Performance and Evidence Compared with GPT-3.5, GPT-4, and GPT-4o? A Study on the Emergency Medicine Specialist Examination. Healthcare 2024;12(17):1726
Kipp M. From GPT-3.5 to GPT-4.o: A Leap in AI’s Medical Exam Performance. Information 2024;15(9):543
Brin D, Sorin V, Konen E, Nadkarni G, Glicksberg B, Klang E. How GPT models perform on the United States medical licensing examination: a systematic review. Discover Applied Sciences 2024;6(10)
Okada A. Editorial Comment on Can artificial intelligence pass the Japanese urology board examinations? International Journal of Urology 2024;31(12):1442
Morishita M, Fukuda H, Yamaguchi S, Muraoka K, Nakamura T, Hayashi M, Yoshioka I, Ono K, Awano S. An exploratory assessment of GPT-4o and GPT-4 performance on the Japanese National Dental Examination. The Saudi Dental Journal 2024;36(12):1577
Zong H, Wu R, Cha J, Wang J, Wu E, Li J, Zhou Y, Zhang C, Feng W, Shen B. Large Language Models in Worldwide Medical Exams: Platform Development and Comprehensive Analysis. Journal of Medical Internet Research 2024;26:e66114
Güneş Y, Ülkir M. Comparative Performance Evaluation of Multimodal Large Language Models, Radiologist, and Anatomist in Visual Neuroanatomy Questions. Uludağ Üniversitesi Tıp Fakültesi Dergisi 2025;50(3):551
Schramm S, Preis S, Metz M, Jung K, Schmitz-Koep B, Zimmer C, Wiestler B, Hedderich D, Kim S. Impact of Multimodal Prompt Elements on Diagnostic Performance of GPT-4V in Challenging Brain MRI Cases. Radiology 2025;314(1)
Nguyen H, Dang H, Nguyen T, Hoang V, Nguyen V, Wu J. Accuracy of latest large language models in answering multiple choice questions in dentistry: A comparative study. PLOS ONE 2025;20(1):e0317423
Scherr R, Spina A, Dao A, Andalib S, Halaseh F, Blair S, Wiechmann W, Rivera R. Novel Evaluation Metric and Quantified Performance of ChatGPT-4 Patient Management Simulations for Early Clinical Education: Experimental Study. JMIR Formative Research 2025;9:e66478
Yang Z, Yao Z, Tasmin M, Vashisht P, Jang W, Ouyang F, Wang B, McManus D, Berlowitz D, Yu H. Unveiling GPT-4V's hidden challenges behind high accuracy on USMLE questions: Observational Study. Journal of Medical Internet Research 2025;27:e65146
Mine Y, Taji T, Okazaki S, Takeda S, Peng T, Shimoe S, Kaku M, Nikawa H, Kakimoto N, Murayama T. Analyzing the performance of multimodal large language models on visually-based questions in the Japanese National Examination for Dental Technicians. Journal of Dental Sciences 2025;20(4):2460
Luo D, Liu M, Yu R, Liu Y, Jiang W, Fan Q, Kuang N, Gao Q, Yin T, Zheng Z. Evaluating the performance of GPT-3.5, GPT-4, and GPT-4o in the Chinese National Medical Licensing Examination. Scientific Reports 2025;15(1)
Hamada M, Kikuchi S, Akitomo T, Kusaka S, Iwamoto Y, Nomura R. Applications and potential of ChatGPT in dentistry: Scoping review of research perspectives. Journal of Dental Sciences 2026;21(1):1
Liu Z, Zhao C, Lin H. DeepSeek-R1 vs Open-Weight AI in Ophthalmology. JAMA Ophthalmology 2025;143(10):842
Wang T, Jheng J, Tseng Y, Chen L, Chen Y. Evaluating GPT-4’s visual interpretation and clinical reasoning on emergency settings: A 5-year analysis. Journal of the Chinese Medical Association 2025;88(9):672
Jaleel A, Aziz U, Farid G, Zahid Bashir M, Mirza T, Khizar Abbas S, Aslam S, Sikander R. Evaluating the Potential and Accuracy of ChatGPT-3.5 and 4.0 in Medical Licensing and In-Training Examinations: Systematic Review and Meta-Analysis. JMIR Medical Education 2025;11:e68070
Saowaprut P, Wabina R, Yang J, Siriwat L. Performance of large language models on Thailand’s national medical licensing examination: a cross-sectional study. Journal of Educational Evaluation for Health Professions 2025;22:16
Kim H, Jung K, Shin S, Lee W, Lee J, Park H, Choi Q. Performance evaluation of large language models on Korean medical licensing examination: a three-year comparative analysis. Scientific Reports 2025;15(1)
Kasagga A, Sapkota A, Changaramkumarath G, Abucha J, Wollel M, Somannagari N, Husami M, Hailu K, Kasagga E. Performance of ChatGPT and Large Language Models on Medical Licensing Exams Worldwide: A Systematic Review and Network Meta-Analysis With Meta-Regression. Cureus 2025
Nguyen V, Vuong T, Nguyen V, Ma H. Benchmarking large-language-model vision capabilities in oral and maxillofacial anatomy: A cross-sectional study. PLOS ONE 2025;20(10):e0335775
Engelstein H, Ramon-Gonen R, Barbash I, Beinart R, Cohen-Shelly M, Sabbag A. Estimating LVEF from ECG with GPT-4o Fine-Tuned Vision: A Novel Approach in AI-Driven Cardiac Diagnostics. Journal of Medical Systems 2025;49(1)
Saita K, Mine Y, Amano S. What the performance of multimodal LLMs on a national licensing exam teaches us about occupational therapy education. BMC Medical Education 2026;26(1)
Miyamura M, Fujiki G, Kanzaki Y, Tsuda K, Asano H, Morita H, Hoshiga M. Evaluating Chat GPT-4o’s Comparative Performance over GPT-4 in Japanese Medical Licensing Examination and Its Clinical Partnership Potential. International Medical Education 2026;5(1):9
Zouakia Z, Logak E, Szymczak A, Jais J, Burgun A, Tsopra R. AI-Driven Objective Structured Clinical Examination Generation in Digital Health Education: Comparative Analysis of Three GPT-4o Configurations. JMIR Medical Education 2026;12:e82116
Jung J, Kim H, Bae S, Park J. Comparative analysis of multimodal large language models GPT-4o and o1 versus clinicians in clinical case challenge questions: Retrospective cross-sectional study. Medicine 2026;105(4):e47071
Geduk G, Hasırcı U, Kusay D, Aras R, Çapar İ, Altın E, Şeker Ç. A comparative analysis of the performance of large Language models in the dentistry specialty examination. Scientific Reports 2026;16(1)
Kottlors J, Iuga A, Bluethgen C, Bressem K, Kather J, Moy L, Wald C, Wang W, Liu T, Ranschaert E, Dratsch T, Kleesiek J, Gertz R, Rajpurkar P, Bedayat A, Fink M, Zeeck A, Chaudhari A, Alkasab T, Wu H, Nensa F, Wang B, Große Hokamp N, Laukamp K, Persigehl T, Maintz D, Truhn D, Lennartz S. Guidelines for Reporting Studies on Large Language Models in Radiology: An International Delphi Expert Survey. Radiology 2026;318(2)
Liu M, Okuhara T, Dai Z, Zhao M, Yin W, Okada H, Furukawa E, Kiuchi T. Textbook-level medical knowledge in large language models: comparative evaluation using Japanese National Medical Examination. BMC Medical Informatics and Decision Making 2026;26(1)

Conference Proceedings

Xu M, Ye C, Zeng Z, Chang C, Qi S, Wu Y, Yang H, Chen Y, Huang H, Liu L, Cao Z, Deng X. 2024 IEEE International Conference on Digital Health (ICDH). Adopting Generative AI with Precaution in Dentistry: A Review and Reflection
Wu G, Cheng C, Pang T. 2024 IEEE International Conference on Future Machine Learning and Data Science (FMLDS). Defect Classification and Localization in Material Extrusion with Multi-Modal Large Language Models
Wang J, Zheng M, Qin Q. 2025 5th International Conference on Educational Technology (ICET). A Bibliometric Analysis of Knowledge Mapping on Artificial Intelligence Image Technology Applications in Education