Published in Vol 9 (2023)

Preprints (earlier versions) of this paper are available at https://preprints.jmir.org/preprint/48002.
Performance of GPT-3.5 and GPT-4 on the Japanese Medical Licensing Examination: Comparison Study

Journals

  1. Sallam M, Salim N, Barakat M, Al-Mahzoum K, Al-Tammemi A, Malaeb D, Hallit R, Hallit S. Assessing Health Students' Attitudes and Usage of ChatGPT in Jordan: Validation Study. JMIR Medical Education 2023;9:e48254
  2. Levkovich I, Elyoseph Z. Suicide Risk Assessments Through the Eyes of ChatGPT-3.5 Versus ChatGPT-4: Vignette Study. JMIR Mental Health 2023;10:e51232
  3. Gobira M, Nakayama L, Moreira R, Andrade E, Regatieri C, Belfort Jr. R. Performance of ChatGPT-4 in answering questions from the Brazilian National Examination for Medical Degree Revalidation. Revista da Associação Médica Brasileira 2023;69(10)
  4. Kaneda Y, Takahashi R, Kaneda U, Akashima S, Okita H, Misaki S, Yamashiro A, Ozaki A, Tanimoto T. Assessing the Performance of GPT-3.5 and GPT-4 on the 2023 Japanese Nursing Examination. Cureus 2023
  5. Wang Y, Chen T. ChatGPT surges ahead: GPT-4 has arrived in the arena of medical research. Journal of the Chinese Medical Association 2023;86(9):784
  6. Toyama Y, Harigai A, Abe M, Nagano M, Kawabata M, Seki Y, Takase K. Performance evaluation of ChatGPT, GPT-4, and Bard on the official board examination of the Japan Radiology Society. Japanese Journal of Radiology 2024;42(2):201
  7. Taloni A, Borselli M, Scarsi V, Rossi C, Coco G, Scorcia V, Giannaccare G. Comparative performance of humans versus GPT-4.0 and GPT-3.5 in the self-assessment program of American Academy of Ophthalmology. Scientific Reports 2023;13(1)
  8. Kaneda Y, Namba M, Kaneda U, Tanimoto T. Artificial Intelligence in Childcare: Assessing the Performance and Acceptance of ChatGPT Responses. Cureus 2023
  9. Huang R, Lu K, Meaney C, Kemppainen J, Punnett A, Leung F. Assessment of Resident and AI Chatbot Performance on the University of Toronto Family Medicine Residency Progress Test: Comparative Study. JMIR Medical Education 2023;9:e50514
  10. Miao J, Thongprayoon C, Garcia Valencia O, Krisanapan P, Sheikh M, Davis P, Mekraksakit P, Suarez M, Craici I, Cheungpasitporn W. Performance of ChatGPT on Nephrology Test Questions. Clinical Journal of the American Society of Nephrology 2024;19(1):35
  11. Gao Z, Li L, Ma S, Wang Q, Hemphill L, Xu R. Examining the Potential of ChatGPT on Biomedical Information Retrieval: Fact-Checking Drug-Disease Associations. Annals of Biomedical Engineering 2023
  12. Shimizu I, Kasai H, Shikino K, Araki N, Takahashi Z, Onodera M, Kimura Y, Tsukamoto T, Yamauchi K, Asahina M, Ito S, Kawakami E. Developing Medical Education Curriculum Reform Strategies to Address the Impact of Generative AI: Qualitative Study. JMIR Medical Education 2023;9:e53466
  13. Guillen-Grima F, Guillen-Aguinaga S, Guillen-Aguinaga L, Alas-Brun R, Onambele L, Ortega W, Montejo R, Aguinaga-Ontoso E, Barach P, Aguinaga-Ontoso I. Evaluating the Efficacy of ChatGPT in Navigating the Spanish Medical Residency Entrance Examination (MIR): Promising Horizons for AI in Clinical Medicine. Clinics and Practice 2023;13(6):1460
  14. Torres-Zegarra B, Rios-Garcia W, Ñaña-Cordova A, Arteaga-Cisneros K, Chalco X, Ordoñez M, Rios C, Godoy C, Quezada K, Gutierrez-Arratia J, Flores-Cohaila J. Performance of ChatGPT, Bard, Claude, and Bing on the Peruvian National Licensing Medical Examination: a cross-sectional study. Journal of Educational Evaluation for Health Professions 2023;20:30
  15. Watari T, Takagi S, Sakaguchi K, Nishizaki Y, Shimizu T, Yamamoto Y, Tokuda Y. Performance Comparison of ChatGPT-4 and Japanese Medical Residents in the General Medicine In-Training Examination: Comparison Study. JMIR Medical Education 2023;9:e52202
  16. Abusoglu S, Serdar M, Unlu A, Abusoglu G. Comparison of three chatbots as an assistant for problem-solving in clinical laboratory. Clinical Chemistry and Laboratory Medicine (CCLM) 2024;62(7):1362
  17. Ohta K, Ohta S. The Performance of GPT-3.5, GPT-4, and Bard on the Japanese National Dentist Examination: A Comparison Study. Cureus 2023
  18. Sallam M, Al-Salahat K. Below average ChatGPT performance in medical microbiology exam compared to university students. Frontiers in Education 2023;8
  19. Morishita M, Fukuda H, Muraoka K, Nakamura T, Hayashi M, Yoshioka I, Ono K, Awano S. Evaluating GPT-4V’s performance in the Japanese national dental examination: A challenge explored. Journal of Dental Sciences 2024;19(3):1595
  20. Lanera C, Lorenzoni G, Barbieri E, Piras G, Magge A, Weissenbacher D, Donà D, Cantarutti L, Gonzalez-Hernandez G, Giaquinto C, Gregori D. Monitoring the Epidemiology of Otitis Using Free-Text Pediatric Medical Notes: A Deep Learning Approach. Journal of Personalized Medicine 2023;14(1):28
  21. Tangadulrat P, Sono S, Tangtrakulwanich B. Using ChatGPT for Clinical Practice and Medical Education: Cross-Sectional Survey of Medical Students’ and Physicians’ Perceptions. JMIR Medical Education 2023;9:e50658
  22. Miao J, Thongprayoon C, Suppadungsuk S, Garcia Valencia O, Qureshi F, Cheungpasitporn W. Ethical Dilemmas in Using AI for Academic Writing and an Example Framework for Peer Review in Nephrology Academia: A Narrative Review. Clinics and Practice 2023;14(1):89
  23. Roberts R, Ali S, Hutchings H, Dobbs T, Whitaker I. Comparative study of ChatGPT and human evaluators on the assessment of medical literature according to recognised reporting standards. BMJ Health & Care Informatics 2023;30(1):e100830
  24. Kollitsch L, Eredics K, Marszalek M, Rauchenwald M, Brookman-May S, Burger M, Körner-Riffard K, May M. How does artificial intelligence master urological board examinations? A comparative analysis of different Large Language Models’ accuracy and reliability in the 2022 In-Service Assessment of the European Board of Urology. World Journal of Urology 2024;42(1)
  25. Odabashian R, Bastin D, Jones G, Manzoor M, Tangestaniapour S, Assad M, Lakhani S, Odabashian M, McGee S. Assessment of ChatGPT-3.5's Knowledge in Oncology: Comparative Study with ASCO-SEP Benchmarks. JMIR AI 2024;3:e50442
  26. Morjaria L, Burns L, Bracken K, Levinson A, Ngo Q, Lee M, Sibbald M. Examining the Efficacy of ChatGPT in Marking Short-Answer Assessments in an Undergraduate Medical Program. International Medical Education 2024;3(1):32
  27. Herrmann-Werner A, Festl-Wietek T, Holderried F, Herschbach L, Griewatz J, Masters K, Zipfel S, Mahling M. Assessing ChatGPT’s Mastery of Bloom’s Taxonomy Using Psychosomatic Medicine Exam Questions: Mixed-Methods Study. Journal of Medical Internet Research 2024;26:e52113
  28. Meyer A, Riese J, Streichert T. Comparison of the Performance of GPT-3.5 and GPT-4 With That of Medical Students on the Written German Medical Licensing Examination: Observational Study. JMIR Medical Education 2024;10:e50965
  29. Li D, Kao Y, Tsai S, Bai Y, Yeh T, Chu C, Hsu C, Cheng S, Hsu T, Liang C, Su K. Comparing the performance of ChatGPT GPT‐4, Bard, and Llama‐2 in the Taiwan Psychiatric Licensing Examination and in differential diagnosis with multi‐center psychiatrists. Psychiatry and Clinical Neurosciences 2024;78(6):347
  30. Yamaguchi S, Morishita M, Fukuda H, Muraoka K, Nakamura T, Yoshioka I, Soh I, Ono K, Awano S. Evaluating the efficacy of leading large language models in the Japanese national dental hygienist examination: A comparative analysis of ChatGPT, Bard, and Bing Chat. Journal of Dental Sciences 2024
  31. Gordon M, Daniel M, Ajiboye A, Uraiby H, Xu N, Bartlett R, Hanson J, Haas M, Spadafore M, Grafton-Clarke C, Gasiea R, Michie C, Corral J, Kwan B, Dolmans D, Thammasitboon S. A scoping review of artificial intelligence in medical education: BEME Guide No. 84. Medical Teacher 2024;46(4):446
  32. Lee G, Hong D, Kim S, Kim J, Lee Y, Park S, Lee K. Comparison of the problem-solving performance of ChatGPT-3.5, ChatGPT-4, Bing Chat, and Bard for the Korean emergency medicine board examination question bank. Medicine 2024;103(9):e37325
  33. Nakao T, Miki S, Nakamura Y, Kikuchi T, Nomura Y, Hanaoka S, Yoshikawa T, Abe O. Capability of GPT-4V(ision) in the Japanese National Medical Licensing Examination: Evaluation Study. JMIR Medical Education 2024;10:e54393
  34. Lee Y, Kim S. Potential applications of ChatGPT in obstetrics and gynecology in Korea: a review article. Obstetrics & Gynecology Science 2024;67(2):153
  35. Funk P, Hoch C, Knoedler S, Knoedler L, Cotofana S, Sofo G, Bashiri Dezfouli A, Wollenberg B, Guntinas-Lichius O, Alfertshofer M. ChatGPT’s Response Consistency: A Study on Repeated Queries of Medical Examination Questions. European Journal of Investigation in Health, Psychology and Education 2024;14(3):657
  36. Noda M, Ueno T, Koshu R, Takaso Y, Shimada M, Saito C, Sugimoto H, Fushiki H, Ito M, Nomura A, Yoshizaki T. Performance of GPT-4V in Answering the Japanese Otolaryngology Board Certification Examination Questions: Evaluation Study. JMIR Medical Education 2024;10:e57054
  37. Nakajima N, Fujimori T, Furuya M, Kanie Y, Imai H, Kita K, Uemura K, Okada S. A Comparison Between GPT-3.5, GPT-4, and GPT-4V: Can the Large Language Model (ChatGPT) Pass the Japanese Board of Orthopaedic Surgery Examination? Cureus 2024
  38. Şenoymak M, Erbatur N, Şenoymak İ, Fırat S. The Role of Artificial Intelligence in Endocrine Management: Assessing ChatGPT’s Responses to Prolactinoma Queries. Journal of Personalized Medicine 2024;14(4):330
  39. Aoki N, Miyagami T, Saita M, Naito T. AI Analysis of General Medicine in Japan: Present and Future Considerations. JMIR Formative Research 2024;8:e52566
  40. Sato H, Ogasawara K. ChatGPT (GPT-4) passed the Japanese National License Examination for Pharmacists in 2022, answering all items including those with diagrams: a descriptive study. Journal of Educational Evaluation for Health Professions 2024;21:4
  41. Zhu L, Mou W, Lai Y, Lin J, Luo P. Language and cultural bias in AI: comparing the performance of large language models developed in different countries on Traditional Chinese Medicine highlights the need for localized models. Journal of Translational Medicine 2024;22(1)
  42. Żmudka K, Spychał A, Ochman B, Popowicz Ł, Piłat P, Jaroszewicz J. ChatGPT – a tool for assisted studying or a source of misleading medical information? AI performance on Polish Medical Final Examination. Annales Academiae Medicae Silesiensis 2024;78:94
  43. Takagi S, Koda M, Watari T. The Performance of ChatGPT-4V in Interpreting Images and Tables in the Japanese Medical Licensing Exam. JMIR Medical Education 2024;10:e54283
  44. Mayo-Yáñez M, Lechien J, Maria-Saibene A, Vaira L, Maniaci A, Chiesa-Estomba C. Examining the Performance of ChatGPT 3.5 and Microsoft Copilot in Otolaryngology: A Comparative Study with Otolaryngologists’ Evaluation. Indian Journal of Otolaryngology and Head & Neck Surgery 2024
  45. Wu J, Wu X, Qiu Z, Li M, Lin S, Zhang Y, Zheng Y, Yuan C, Yang J. Large language models leverage external knowledge to extend clinical insight beyond language boundaries. Journal of the American Medical Informatics Association 2024
  46. Noda M, Yoshimura H, Okubo T, Koshu R, Uchiyama Y, Nomura A, Ito M, Takumi Y. Feasibility of Multimodal Artificial Intelligence Using GPT-4 Vision for the Classification of Middle Ear Disease: Qualitative Study and Validation. JMIR AI 2024;3:e58342
  47. Hultberg P, Santandreu Calonge D, Kamalov F, Smail L, Mizuno T. Comparing and assessing four AI chatbots’ competence in economics. PLOS ONE 2024;19(5):e0297804
  48. Bharatha A, Ojeh N, Fazle Rabbi A, Campbell M, Krishnamurthy K, Layne-Yarde R, Kumar A, Springer D, Connell K, Majumder M. Comparing the Performance of ChatGPT-4 and Medical Students on MCQs at Varied Levels of Bloom’s Taxonomy. Advances in Medical Education and Practice 2024;15:393
  49. Yang W, Chan Y, Huang C, Chen T. Comparative analysis of GPT-3.5 and GPT-4.0 in Taiwan’s medical technologist certification: A study in artificial intelligence advancements. Journal of the Chinese Medical Association 2024;87(5):525
  50. Hirano Y, Hanaoka S, Nakao T, Miki S, Kikuchi T, Nakamura Y, Nomura Y, Yoshikawa T, Abe O. GPT-4 Turbo with Vision fails to outperform text-only GPT-4 Turbo in the Japan Diagnostic Radiology Board Examination. Japanese Journal of Radiology 2024
  51. Yavuz Y, Kahraman F. Evaluation of the prediagnosis and management of ChatGPT-4.0 in clinical cases in cardiology. Future Cardiology 2024:1
  52. Shikino K, Shimizu T, Otsuka Y, Tago M, Takahashi H, Watari T, Sasaki Y, Iizuka G, Tamura H, Nakashima K, Kunitomo K, Suzuki M, Aoyama S, Kosaka S, Kawahigashi T, Matsumoto T, Orihara F, Morikawa T, Nishizawa T, Hoshina Y, Yamamoto Y, Matsuo Y, Unoki Y, Kimura H, Tokushima M, Watanuki S, Saito T, Otsuka F, Tokuda Y. Evaluation of ChatGPT-Generated Differential Diagnosis for Common Diseases With Atypical Presentation: Descriptive Research. JMIR Medical Education 2024;10:e58758
  53. Kaneda Y, Tayuinosho A, Tomoyose R, Takita M, Hamaki T, Tanimoto T, Ozaki A. Evaluating ChatGPT's effectiveness and tendencies in Japanese internal medicine. Journal of Evaluation in Clinical Practice 2024
  54. Igarashi Y, Nakahara K, Norii T, Miyake N, Tagami T, Yokobori S. Performance of a Large Language Model on Japanese Emergency Medicine Board Certification Examinations. Journal of Nippon Medical School 2024;91(2):155
  55. Mousavi M, Shafiee S, Harley J, Cheung J, Abbasgholizadeh Rahimi S. Performance of generative pre-trained transformers (GPTs) in Certification Examination of the College of Family Physicians of Canada. Family Medicine and Community Health 2024;12(Suppl 1):e002626
  56. Yaïci R, Cieplucha M, Bock R, Moayed F, Bechrakis N, Berens P, Feltgen N, Friedburg D, Gräf M, Guthoff R, Hoffmann E, Hoerauf H, Hintschich C, Kohnen T, Messmer E, Nentwich M, Pleyer U, Schaudig U, Seitz B, Geerling G, Roth M. ChatGPT und die deutsche Facharztprüfung für Augenheilkunde: eine Evaluierung. Die Ophthalmologie 2024
  57. Meyer A, Soleman A, Riese J, Streichert T. Comparison of ChatGPT, Gemini, and Le Chat with physician interpretations of medical laboratory questions from an online health forum. Clinical Chemistry and Laboratory Medicine (CCLM) 2024;0(0)
  58. Bhattaru A, Yanamala N, Sengupta P. Revolutionizing Cardiology with Words: Unveiling the Impact of Large Language Models in Medical Science Writing. Canadian Journal of Cardiology 2024
  59. Griewing S, Knitza J, Boekhoff J, Hillen C, Lechner F, Wagner U, Wallwiener M, Kuhn S. Evolution of publicly available large language models for complex decision-making in breast cancer care. Archives of Gynecology and Obstetrics 2024;310(1):537
  60. Harada Y, Suzuki T, Harada T, Sakamoto T, Ishizuka K, Miyagami T, Kawamura R, Kunitomo K, Nagano H, Shimizu T, Watari T. Performance evaluation of ChatGPT in detecting diagnostic errors and their contributing factors: an analysis of 545 case reports of diagnostic errors. BMJ Open Quality 2024;13(2):e002654
  61. Liu M, Okuhara T, Chang X, Shirabe R, Nishiie Y, Okada H, Kiuchi T. Performance of ChatGPT Across Different Versions in Medical Licensing Examinations Worldwide: A Systematic Review and Meta-Analysis (Preprint). Journal of Medical Internet Research 2024
  62. Suwała S, Szulc P, Guzowski C, Kamińska B, Dorobiała J, Wojciechowska K, Berska M, Kubicka O, Kosturkiewicz O, Kosztulska B, Rajewska A, Junik R. ChatGPT-3.5 passes Poland’s medical final examination—Is it possible for ChatGPT to become a doctor in Poland? SAGE Open Medicine 2024;12
  63. Ando K, Sato M, Wakatsuki S, Nagai R, Chino K, Kai H, Sasaki T, Kato R, Nguyen T, Guo N, Sultan P. A comparative study of English and Japanese ChatGPT responses to anaesthesia-related medical questions. BJA Open 2024;10:100296
  64. Ming S, Guo Q, Cheng W, Lei B. Model Evolution and System Roles Influence the Performance of ChatGPT on Chinese Medical Licensing Exams: A Comparative Study (Preprint). JMIR Medical Education 2023

Books/Policy Documents

  1. Huang D, Wang Z. Trends and Applications in Knowledge Discovery and Data Mining.