Performance of GPT-3.5 and GPT-4 on the Japanese Medical Licensing Examination: Comparison Study

doi:10.2196/48002

Published on 29.Jun.2023 in Vol 9 (2023)

Preprints (earlier versions) of this paper are available at https://preprints.jmir.org/preprint/48002, first published 07.Apr.2023.

Soshi Takagi¹; Takashi Watari¹,²,³,⁴; Ayano Erabi¹; Kota Sakaguchi²

Cited by (227)

Journals

Sallam M, Salim N, Barakat M, Al-Mahzoum K, Al-Tammemi A, Malaeb D, Hallit R, Hallit S. Assessing Health Students' Attitudes and Usage of ChatGPT in Jordan: Validation Study. JMIR Medical Education 2023;9:e48254
Levkovich I, Elyoseph Z. Suicide Risk Assessments Through the Eyes of ChatGPT-3.5 Versus ChatGPT-4: Vignette Study. JMIR Mental Health 2023;10:e51232
Gobira M, Nakayama L, Moreira R, Andrade E, Regatieri C, Belfort Jr. R. Performance of ChatGPT-4 in answering questions from the Brazilian National Examination for Medical Degree Revalidation. Revista da Associação Médica Brasileira 2023;69(10)
Kaneda Y, Takahashi R, Kaneda U, Akashima S, Okita H, Misaki S, Yamashiro A, Ozaki A, Tanimoto T. Assessing the Performance of GPT-3.5 and GPT-4 on the 2023 Japanese Nursing Examination. Cureus 2023
Wang Y, Chen T. ChatGPT surges ahead: GPT-4 has arrived in the arena of medical research. Journal of the Chinese Medical Association 2023;86(9):784
Toyama Y, Harigai A, Abe M, Nagano M, Kawabata M, Seki Y, Takase K. Performance evaluation of ChatGPT, GPT-4, and Bard on the official board examination of the Japan Radiology Society. Japanese Journal of Radiology 2024;42(2):201
Taloni A, Borselli M, Scarsi V, Rossi C, Coco G, Scorcia V, Giannaccare G. Comparative performance of humans versus GPT-4.0 and GPT-3.5 in the self-assessment program of American Academy of Ophthalmology. Scientific Reports 2023;13(1)
Kaneda Y, Namba M, Kaneda U, Tanimoto T. Artificial Intelligence in Childcare: Assessing the Performance and Acceptance of ChatGPT Responses. Cureus 2023
Huang R, Lu K, Meaney C, Kemppainen J, Punnett A, Leung F. Assessment of Resident and AI Chatbot Performance on the University of Toronto Family Medicine Residency Progress Test: Comparative Study. JMIR Medical Education 2023;9:e50514
Miao J, Thongprayoon C, Garcia Valencia O, Krisanapan P, Sheikh M, Davis P, Mekraksakit P, Suarez M, Craici I, Cheungpasitporn W. Performance of ChatGPT on Nephrology Test Questions. Clinical Journal of the American Society of Nephrology 2024;19(1):35
Gao Z, Li L, Ma S, Wang Q, Hemphill L, Xu R. Examining the Potential of ChatGPT on Biomedical Information Retrieval: Fact-Checking Drug-Disease Associations. Annals of Biomedical Engineering 2024;52(8):1919
Shimizu I, Kasai H, Shikino K, Araki N, Takahashi Z, Onodera M, Kimura Y, Tsukamoto T, Yamauchi K, Asahina M, Ito S, Kawakami E. Developing Medical Education Curriculum Reform Strategies to Address the Impact of Generative AI: Qualitative Study. JMIR Medical Education 2023;9:e53466
Guillen-Grima F, Guillen-Aguinaga S, Guillen-Aguinaga L, Alas-Brun R, Onambele L, Ortega W, Montejo R, Aguinaga-Ontoso E, Barach P, Aguinaga-Ontoso I. Evaluating the Efficacy of ChatGPT in Navigating the Spanish Medical Residency Entrance Examination (MIR): Promising Horizons for AI in Clinical Medicine. Clinics and Practice 2023;13(6):1460
Torres-Zegarra B, Rios-Garcia W, Ñaña-Cordova A, Arteaga-Cisneros K, Chalco X, Ordoñez M, Rios C, Godoy C, Quezada K, Gutierrez-Arratia J, Flores-Cohaila J. Performance of ChatGPT, Bard, Claude, and Bing on the Peruvian National Licensing Medical Examination: a cross-sectional study. Journal of Educational Evaluation for Health Professions 2023;20:30
Watari T, Takagi S, Sakaguchi K, Nishizaki Y, Shimizu T, Yamamoto Y, Tokuda Y. Performance Comparison of ChatGPT-4 and Japanese Medical Residents in the General Medicine In-Training Examination: Comparison Study. JMIR Medical Education 2023;9:e52202
Abusoglu S, Serdar M, Unlu A, Abusoglu G. Comparison of three chatbots as an assistant for problem-solving in clinical laboratory. Clinical Chemistry and Laboratory Medicine (CCLM) 2024;62(7):1362
Ohta K, Ohta S. The Performance of GPT-3.5, GPT-4, and Bard on the Japanese National Dentist Examination: A Comparison Study. Cureus 2023
Sallam M, Al-Salahat K. Below average ChatGPT performance in medical microbiology exam compared to university students. Frontiers in Education 2023;8
Morishita M, Fukuda H, Muraoka K, Nakamura T, Hayashi M, Yoshioka I, Ono K, Awano S. Evaluating GPT-4V’s performance in the Japanese national dental examination: A challenge explored. Journal of Dental Sciences 2024;19(3):1595
Lanera C, Lorenzoni G, Barbieri E, Piras G, Magge A, Weissenbacher D, Donà D, Cantarutti L, Gonzalez-Hernandez G, Giaquinto C, Gregori D. Monitoring the Epidemiology of Otitis Using Free-Text Pediatric Medical Notes: A Deep Learning Approach. Journal of Personalized Medicine 2023;14(1):28
Tangadulrat P, Sono S, Tangtrakulwanich B. Using ChatGPT for Clinical Practice and Medical Education: Cross-Sectional Survey of Medical Students’ and Physicians’ Perceptions. JMIR Medical Education 2023;9:e50658
Miao J, Thongprayoon C, Suppadungsuk S, Garcia Valencia O, Qureshi F, Cheungpasitporn W. Ethical Dilemmas in Using AI for Academic Writing and an Example Framework for Peer Review in Nephrology Academia: A Narrative Review. Clinics and Practice 2023;14(1):89
Roberts R, Ali S, Hutchings H, Dobbs T, Whitaker I. Comparative study of ChatGPT and human evaluators on the assessment of medical literature according to recognised reporting standards. BMJ Health & Care Informatics 2023;30(1):e100830
Kollitsch L, Eredics K, Marszalek M, Rauchenwald M, Brookman-May S, Burger M, Körner-Riffard K, May M. How does artificial intelligence master urological board examinations? A comparative analysis of different Large Language Models’ accuracy and reliability in the 2022 In-Service Assessment of the European Board of Urology. World Journal of Urology 2024;42(1)
Odabashian R, Bastin D, Jones G, Manzoor M, Tangestaniapour S, Assad M, Lakhani S, Odabashian M, McGee S. Assessment of ChatGPT-3.5's Knowledge in Oncology: Comparative Study with ASCO-SEP Benchmarks. JMIR AI 2024;3:e50442
Morjaria L, Burns L, Bracken K, Levinson A, Ngo Q, Lee M, Sibbald M. Examining the Efficacy of ChatGPT in Marking Short-Answer Assessments in an Undergraduate Medical Program. International Medical Education 2024;3(1):32
Herrmann-Werner A, Festl-Wietek T, Holderried F, Herschbach L, Griewatz J, Masters K, Zipfel S, Mahling M. Assessing ChatGPT’s Mastery of Bloom’s Taxonomy Using Psychosomatic Medicine Exam Questions: Mixed-Methods Study. Journal of Medical Internet Research 2024;26:e52113
Meyer A, Riese J, Streichert T. Comparison of the Performance of GPT-3.5 and GPT-4 With That of Medical Students on the Written German Medical Licensing Examination: Observational Study. JMIR Medical Education 2024;10:e50965
Li D, Kao Y, Tsai S, Bai Y, Yeh T, Chu C, Hsu C, Cheng S, Hsu T, Liang C, Su K. Comparing the performance of ChatGPT GPT‐4, Bard, and Llama‐2 in the Taiwan Psychiatric Licensing Examination and in differential diagnosis with multi‐center psychiatrists. Psychiatry and Clinical Neurosciences 2024;78(6):347
Yamaguchi S, Morishita M, Fukuda H, Muraoka K, Nakamura T, Yoshioka I, Soh I, Ono K, Awano S. Evaluating the efficacy of leading large language models in the Japanese national dental hygienist examination: A comparative analysis of ChatGPT, Bard, and Bing Chat. Journal of Dental Sciences 2024;19(4):2262
Gordon M, Daniel M, Ajiboye A, Uraiby H, Xu N, Bartlett R, Hanson J, Haas M, Spadafore M, Grafton-Clarke C, Gasiea R, Michie C, Corral J, Kwan B, Dolmans D, Thammasitboon S. A scoping review of artificial intelligence in medical education: BEME Guide No. 84. Medical Teacher 2024;46(4):446
Lee G, Hong D, Kim S, Kim J, Lee Y, Park S, Lee K. Comparison of the problem-solving performance of ChatGPT-3.5, ChatGPT-4, Bing Chat, and Bard for the Korean emergency medicine board examination question bank. Medicine 2024;103(9):e37325
Nakao T, Miki S, Nakamura Y, Kikuchi T, Nomura Y, Hanaoka S, Yoshikawa T, Abe O. Capability of GPT-4V(ision) in the Japanese National Medical Licensing Examination: Evaluation Study. JMIR Medical Education 2024;10:e54393
Lee Y, Kim S. Potential applications of ChatGPT in obstetrics and gynecology in Korea: a review article. Obstetrics & Gynecology Science 2024;67(2):153
Funk P, Hoch C, Knoedler S, Knoedler L, Cotofana S, Sofo G, Bashiri Dezfouli A, Wollenberg B, Guntinas-Lichius O, Alfertshofer M. ChatGPT’s Response Consistency: A Study on Repeated Queries of Medical Examination Questions. European Journal of Investigation in Health, Psychology and Education 2024;14(3):657
Noda M, Ueno T, Koshu R, Takaso Y, Shimada M, Saito C, Sugimoto H, Fushiki H, Ito M, Nomura A, Yoshizaki T. Performance of GPT-4V in Answering the Japanese Otolaryngology Board Certification Examination Questions: Evaluation Study. JMIR Medical Education 2024;10:e57054
Nakajima N, Fujimori T, Furuya M, Kanie Y, Imai H, Kita K, Uemura K, Okada S. A Comparison Between GPT-3.5, GPT-4, and GPT-4V: Can the Large Language Model (ChatGPT) Pass the Japanese Board of Orthopaedic Surgery Examination? Cureus 2024
Şenoymak M, Erbatur N, Şenoymak İ, Fırat S. The Role of Artificial Intelligence in Endocrine Management: Assessing ChatGPT’s Responses to Prolactinoma Queries. Journal of Personalized Medicine 2024;14(4):330
Aoki N, Miyagami T, Saita M, Naito T. AI Analysis of General Medicine in Japan: Present and Future Considerations. JMIR Formative Research 2024;8:e52566
Sato H, Ogasawara K. ChatGPT (GPT-4) passed the Japanese National License Examination for Pharmacists in 2022, answering all items including those with diagrams: a descriptive study. Journal of Educational Evaluation for Health Professions 2024;21:4
Zhu L, Mou W, Lai Y, Lin J, Luo P. Language and cultural bias in AI: comparing the performance of large language models developed in different countries on Traditional Chinese Medicine highlights the need for localized models. Journal of Translational Medicine 2024;22(1)
Żmudka K, Spychał A, Ochman B, Popowicz Ł, Piłat P, Jaroszewicz J. ChatGPT – a tool for assisted studying or a source of misleading medical information? AI performance on Polish Medical Final Examination. Annales Academiae Medicae Silesiensis 2024;78:94
Takagi S, Koda M, Watari T. The Performance of ChatGPT-4V in Interpreting Images and Tables in the Japanese Medical Licensing Exam. JMIR Medical Education 2024;10:e54283
Mayo-Yáñez M, Lechien J, Maria-Saibene A, Vaira L, Maniaci A, Chiesa-Estomba C. Examining the Performance of ChatGPT 3.5 and Microsoft Copilot in Otolaryngology: A Comparative Study with Otolaryngologists’ Evaluation. Indian Journal of Otolaryngology and Head & Neck Surgery 2024;76(4):3465
Wu J, Wu X, Qiu Z, Li M, Lin S, Zhang Y, Zheng Y, Yuan C, Yang J. Large language models leverage external knowledge to extend clinical insight beyond language boundaries. Journal of the American Medical Informatics Association 2024;31(9):2054
Noda M, Yoshimura H, Okubo T, Koshu R, Uchiyama Y, Nomura A, Ito M, Takumi Y. Feasibility of Multimodal Artificial Intelligence Using GPT-4 Vision for the Classification of Middle Ear Disease: Qualitative Study and Validation. JMIR AI 2024;3:e58342
Hultberg P, Santandreu Calonge D, Kamalov F, Smail L, Mizuno T. Comparing and assessing four AI chatbots’ competence in economics. PLOS ONE 2024;19(5):e0297804
Bharatha A, Ojeh N, Fazle Rabbi A, Campbell M, Krishnamurthy K, Layne-Yarde R, Kumar A, Springer D, Connell K, Majumder M. Comparing the Performance of ChatGPT-4 and Medical Students on MCQs at Varied Levels of Bloom’s Taxonomy. Advances in Medical Education and Practice 2024;15:393
Yang W, Chan Y, Huang C, Chen T. Comparative analysis of GPT-3.5 and GPT-4.0 in Taiwan’s medical technologist certification: A study in artificial intelligence advancements. Journal of the Chinese Medical Association 2024;87(5):525
Hirano Y, Hanaoka S, Nakao T, Miki S, Kikuchi T, Nakamura Y, Nomura Y, Yoshikawa T, Abe O. GPT-4 Turbo with Vision fails to outperform text-only GPT-4 Turbo in the Japan Diagnostic Radiology Board Examination. Japanese Journal of Radiology 2024;42(8):918
Yavuz Y, Kahraman F. Evaluation of the prediagnosis and management of ChatGPT-4.0 in clinical cases in cardiology. Future Cardiology 2024;20(4):197
Shikino K, Shimizu T, Otsuka Y, Tago M, Takahashi H, Watari T, Sasaki Y, Iizuka G, Tamura H, Nakashima K, Kunitomo K, Suzuki M, Aoyama S, Kosaka S, Kawahigashi T, Matsumoto T, Orihara F, Morikawa T, Nishizawa T, Hoshina Y, Yamamoto Y, Matsuo Y, Unoki Y, Kimura H, Tokushima M, Watanuki S, Saito T, Otsuka F, Tokuda Y. Evaluation of ChatGPT-Generated Differential Diagnosis for Common Diseases With Atypical Presentation: Descriptive Research. JMIR Medical Education 2024;10:e58758
Kaneda Y, Tayuinosho A, Tomoyose R, Takita M, Hamaki T, Tanimoto T, Ozaki A. Evaluating ChatGPT's effectiveness and tendencies in Japanese internal medicine. Journal of Evaluation in Clinical Practice 2024;30(6):1017
Igarashi Y, Nakahara K, Norii T, Miyake N, Tagami T, Yokobori S. Performance of a Large Language Model on Japanese Emergency Medicine Board Certification Examinations. Journal of Nippon Medical School 2024;91(2):155
Mousavi M, Shafiee S, Harley J, Cheung J, Abbasgholizadeh Rahimi S. Performance of generative pre-trained transformers (GPTs) in Certification Examination of the College of Family Physicians of Canada. Family Medicine and Community Health 2024;12(Suppl 1):e002626
Yaïci R, Cieplucha M, Bock R, Moayed F, Bechrakis N, Berens P, Feltgen N, Friedburg D, Gräf M, Guthoff R, Hoffmann E, Hoerauf H, Hintschich C, Kohnen T, Messmer E, Nentwich M, Pleyer U, Schaudig U, Seitz B, Geerling G, Roth M. ChatGPT und die deutsche Facharztprüfung für Augenheilkunde: eine Evaluierung. Die Ophthalmologie 2024;121(7):554
Meyer A, Soleman A, Riese J, Streichert T. Comparison of ChatGPT, Gemini, and Le Chat with physician interpretations of medical laboratory questions from an online health forum. Clinical Chemistry and Laboratory Medicine (CCLM) 2024;62(12):2425
Bhattaru A, Yanamala N, Sengupta P. Revolutionizing Cardiology With Words: Unveiling the Impact of Large Language Models in Medical Science Writing. Canadian Journal of Cardiology 2024;40(10):1950
Griewing S, Knitza J, Boekhoff J, Hillen C, Lechner F, Wagner U, Wallwiener M, Kuhn S. Evolution of publicly available large language models for complex decision-making in breast cancer care. Archives of Gynecology and Obstetrics 2024;310(1):537
Harada Y, Suzuki T, Harada T, Sakamoto T, Ishizuka K, Miyagami T, Kawamura R, Kunitomo K, Nagano H, Shimizu T, Watari T. Performance evaluation of ChatGPT in detecting diagnostic errors and their contributing factors: an analysis of 545 case reports of diagnostic errors. BMJ Open Quality 2024;13(2):e002654
Liu M, Okuhara T, Chang X, Shirabe R, Nishiie Y, Okada H, Kiuchi T. Performance of ChatGPT Across Different Versions in Medical Licensing Examinations Worldwide: Systematic Review and Meta-Analysis. Journal of Medical Internet Research 2024;26:e60807
Suwała S, Szulc P, Guzowski C, Kamińska B, Dorobiała J, Wojciechowska K, Berska M, Kubicka O, Kosturkiewicz O, Kosztulska B, Rajewska A, Junik R. ChatGPT-3.5 passes Poland’s medical final examination—Is it possible for ChatGPT to become a doctor in Poland? SAGE Open Medicine 2024;12
Ando K, Sato M, Wakatsuki S, Nagai R, Chino K, Kai H, Sasaki T, Kato R, Nguyen T, Guo N, Sultan P. A comparative study of English and Japanese ChatGPT responses to anaesthesia-related medical questions. BJA Open 2024;10:100296
Ming S, Guo Q, Cheng W, Lei B. Influence of Model Evolution and System Roles on ChatGPT’s Performance in Chinese Medical Licensing Exams: Comparative Study. JMIR Medical Education 2024;10:e52784
Lahat A, Sharif K, Zoabi N, Shneor Patt Y, Sharif Y, Fisher L, Shani U, Arow M, Levin R, Klang E. Assessing Generative Pretrained Transformers (GPT) in Clinical Decision-Making: Comparative Analysis of GPT-3.5 and GPT-4. Journal of Medical Internet Research 2024;26:e54571
Fukuda H, Morishita M, Muraoka K, Yamaguchi S, Nakamura T, Yoshioka I, Awano S, Ono K. Evaluating the image recognition capabilities of GPT-4V and Gemini Pro in the Japanese national dental examination. Journal of Dental Sciences 2025;20(1):368
Lin C, Plooy K, Gray A, Brown D, Hobbs L, Patterson T, Tan V, Fridberg D, Hsu C. The Performance of ChatGPT on Short-answer Questions in a Psychiatry Examination: A Pilot Study. Taiwanese Journal of Psychiatry 2024;38(2):94
Zhao N, Wu T, Wang W, Zhang L, Gong X. Review and Comparative Analysis of Methods and Advancements in Predicting Protein Complex Structure. Interdisciplinary Sciences: Computational Life Sciences 2024;16(2):261
Bortoli M, Fiore M, Tedeschi S, Oliveira V, Sousa R, Bruschi A, Campanacci D, Viale P, De Paolis M, Sambri A. GPT-based chatbot tools are still unreliable in the management of prosthetic joint infections. Musculoskeletal Surgery 2024;108(4):459
Jo E, Song S, Kim J, Lim S, Kim J, Cha J, Kim Y, Joo H. Assessing GPT-4’s Performance in Delivering Medical Advice: Comparative Analysis With Human Experts. JMIR Medical Education 2024;10:e51282
Shang L, Li R, Xue M, Guo Q, Hou Y. Evaluating the application of ChatGPT in China’s residency training education: An exploratory study. Medical Teacher 2025;47(5):858
Terwilliger E, Bcharah G, Bcharah H, Bcharah E, Richardson C, Scheffler P. Advancing Medical Education: Performance of Generative Artificial Intelligence Models on Otolaryngology Board Preparation Questions With Image Analysis Insights. Cureus 2024
Sabri H, Saleh M, Hazrati P, Merchant K, Misch J, Kumar P, Wang H, Barootchi S. Performance of three artificial intelligence (AI)‐based large language models in standardized testing; implications for AI‐assisted dental education. Journal of Periodontal Research 2025;60(2):121
Cherif H, Moussa C, Missaoui A, Salouage I, Mokaddem S, Dhahri B. Appraisal of ChatGPT’s Aptitude for Medical Education: Comparative Analysis With Third-Year Medical Students in a Pulmonology Examination. JMIR Medical Education 2024;10:e52818
Hirosawa T, Harada Y, Mizuta K, Sakamoto T, Tokumasu K, Shimizu T. Diagnostic performance of generative artificial intelligences for a series of complex case reports. Digital Health 2024;10
Hale J, Alexander S, Wright S, Gilliland K. Generative AI in Undergraduate Medical Education: A Rapid Review. Journal of Medical Education and Curricular Development 2024;11
Nicikowski J, Szczepański M, Miedziaszczyk M, Kudliński B. The potential of ChatGPT in medicine: an example analysis of nephrology specialty exams in Poland. Clinical Kidney Journal 2024;17(8)
Vij O, Calver H, Myall N, Dey M, Kouranloo K, Fernandes T. Evaluating the competency of ChatGPT in MRCP Part 1 and a systematic literature review of its capabilities in postgraduate medical assessments. PLOS ONE 2024;19(7):e0307372
Chow J, Cheng T, Chien T, Chou W. Assessing ChatGPT’s Capability for Multiple Choice Questions Using RaschOnline: Observational Study. JMIR Formative Research 2024;8:e46800
Ishida K, Hanada E. Potential of ChatGPT to Pass the Japanese Medical and Healthcare Professional National Licenses: A Literature Review. Cureus 2024
Moglia A, Georgiou K, Cerveri P, Mainardi L, Satava R, Cuschieri A. Large language models in healthcare: from a systematic review on medical examinations to a comparative analysis on fundamentals of robotic surgery online test. Artificial Intelligence Review 2024;57(9)
Ishida K, Arisaka N, Fujii K. Analysis of Responses of GPT-4 V to the Japanese National Clinical Engineer Licensing Examination. Journal of Medical Systems 2024;48(1)
Sallam M, Al-Salahat K, Eid H, Egger J, Puladi B. Human versus Artificial Intelligence: ChatGPT-4 Outperforming Bing, Bard, ChatGPT-3.5 and Humans in Clinical Chemistry Multiple-Choice Questions. Advances in Medical Education and Practice 2024;15:857
Wang D, Zhang S. Large language models in medical and healthcare fields: applications, advances, and challenges. Artificial Intelligence Review 2024;57(11)
Jin H, Lee H, Kim E. Performance of ChatGPT-3.5 and GPT-4 in national licensing examinations for medicine, pharmacy, dentistry, and nursing: a systematic review and meta-analysis. BMC Medical Education 2024;24(1)
Yoon S, Oh S, Lim B, Lee H. Performance of ChatGPT in the In-Training Examination for Anesthesiology and Pain Medicine Residents in South Korea: Observational Study. JMIR Medical Education 2024;10:e56859
Sanders A, Lim R, Jones D, Vosburg R. Artificial intelligence large language model scores highly on focused practice designation in metabolic and bariatric surgery board practice questions. Surgical Endoscopy 2024;38(11):6678
Fujimoto M, Kuroda H, Katayama T, Yamaguchi A, Katagiri N, Kagawa K, Tsukimoto S, Nakano A, Imaizumi U, Sato-Boku A, Kishimoto N, Itamiya T, Kido K, Sanuki T. Evaluating Large Language Models in Dental Anesthesiology: A Comparative Analysis of ChatGPT-4, Claude 3 Opus, and Gemini 1.0 on the Japanese Dental Society of Anesthesiology Board Certification Exam. Cureus 2024
Brin D, Sorin V, Konen E, Nadkarni G, Glicksberg B, Klang E. How GPT models perform on the United States medical licensing examination: a systematic review. Discover Applied Sciences 2024;6(10)
Shiraishi M, Tsuruda S, Tomioka Y, Chang J, Hori A, Ishii S, Fujinaka R, Ando T, Ohba J, Okazaki M. Advancement of Generative Pre-trained Transformer Chatbots in Answering Clinical Questions in the Practical Rhinoplasty Guideline. Aesthetic Plastic Surgery 2025;49(7):1874
Mankowski M, Jaffe I, Xu J, Bae S, Oermann E, Aphinyanaphongs Y, McAdams‐DeMarco M, Lonze B, Orandi B, Stewart D, Levan M, Massie A, Gentry S, Segev D. ChatGPT Solving Complex Kidney Transplant Cases: A Comparative Study With Human Respondents. Clinical Transplantation 2024;38(10)
Hirata K, Matsui Y, Yamada A, Fujioka T, Yanagawa M, Nakaura T, Ito R, Ueda D, Fujita S, Tatsugami F, Fushimi Y, Tsuboyama T, Kamagata K, Nozaki T, Fujima N, Kawamura M, Naganawa S. Generative AI and large language models in nuclear medicine: current status and future prospects. Annals of Nuclear Medicine 2024;38(11):853
Seo J, Choi D, Kim T, Cha W, Kim M, Yoo H, Oh N, Yi Y, Lee K, Choi E. Evaluation Framework of Large Language Models in Medical Documentation: Development and Usability Study. Journal of Medical Internet Research 2024;26:e58329
Goto H, Shiraishi Y, Okada S. Performance of Generative Pre-trained Transformer (GPT)-4 and Gemini Advanced on the First-Class Radiation Protection Supervisor Examination in Japan. Cureus 2024
Huwiler J, Oechslin L, Biaggi P, Tanner F, Wyss C. Experimental assessment of the performance of artificial intelligence in solving multiple-choice board exams in cardiology. Swiss Medical Weekly 2024;154(10):3547
Ros-Arlanzón P, Perez-Sempere A. Evaluating AI Competence in Specialized Medicine: Comparative Analysis of ChatGPT and Neurologists in a Neurology Specialist Examination in Spain. JMIR Medical Education 2024;10:e56762
Wu J, Nishida T, Liu T. Accuracy of large language models in answering ophthalmology board-style questions: A meta-analysis. Asia-Pacific Journal of Ophthalmology 2024;13(5):100106
Li C, Zhang J, Abdul‐Masih J, Zhang S, Yang J. Performance of ChatGPT and Dental Students on Concepts of Periodontal Surgery. European Journal of Dental Education 2025;29(1):36
Liu M, Okuhara T, Chang X, Okada H, Kiuchi T, Khlaif Z. Performance of ChatGPT in medical licensing examinations in countries worldwide: A systematic review and meta-analysis protocol. PLOS ONE 2024;19(10):e0312771
Harigai A, Toyama Y, Nagano M, Abe M, Kawabata M, Li L, Yamamura J, Takase K. Response accuracy of GPT-4 across languages: insights from an expert-level diagnostic radiology examination in Japan. Japanese Journal of Radiology 2025;43(2):319
Zeng J, Zou X, Li S, Tang Y, Teng S, Li H, Wang C, Wu Y, Zhang L, Zhong Y, Liu J, Liu S. Assessing the Role of the Generative Pretrained Transformer (GPT) in Alzheimer’s Disease Management: Comparative Study of Neurologist- and Artificial Intelligence–Generated Responses. Journal of Medical Internet Research 2024;26:e51095
Tokgöz Kaplan T, Cankar M. Evidence‐Based Potential of Generative Artificial Intelligence Large Language Models on Dental Avulsion: ChatGPT Versus Gemini. Dental Traumatology 2025;41(2):178
Taniguchi M, Lindsey J. Performance of chatbots in queries concerning fundamental concepts in photochemistry. Photochemistry and Photobiology 2025;101(4):886
Uehara O, Morikawa T, Harada F, Sugiyama N, Matsuki Y, Hiraki D, Sakurai H, Kado T, Yoshida K, Murata Y, Matsuoka H, Nagasawa T, Furuichi Y, Abiko Y, Miura H. Performance of ChatGPT‐3.5 and ChatGPT‐4o in the Japanese National Dental Examination. Journal of Dental Education 2025;89(4):459
Aster A, Laupichler M, Rockwell-Kollmann T, Masala G, Bala E, Raupach T. ChatGPT and Other Large Language Models in Medical Education — Scoping Literature Review. Medical Science Educator 2024;35(1):555
Ho C, Tian T, Ayers A, Aaron R, Phillips V, Wolf R, Mathioudakis N, Dai T, Klonoff D. Qualitative metrics from the biomedical literature for evaluating large language models in clinical decision-making: a narrative review. BMC Medical Informatics and Decision Making 2024;24(1)
Chen Y, Huang X, Yang F, Lin H, Lin H, Zheng Z, Liang Q, Zhang J, Li X. Performance of ChatGPT and Bard on the medical licensing examinations varies across different cultures: a comparison study. BMC Medical Education 2024;24(1)
Mavrych V, Ganguly P, Bolgova O. Using large language models (ChatGPT, Copilot, PaLM, Bard, and Gemini) in Gross Anatomy course: Comparative analysis. Clinical Anatomy 2025;38(2):200
Miyazaki Y, Hata M, Omori H, Hirashima A, Nakagawa Y, Eto M, Takahashi S, Ikeda M. Performance of ChatGPT-4o on the Japanese Medical Licensing Examination: Evalution of Accuracy in Text-Only and Image-Based Questions. JMIR Medical Education 2024;10:e63129
Lee J, Park S, Shin J, Cho B. Analyzing evaluation methods for large language models in the medical field: a scoping review. BMC Medical Informatics and Decision Making 2024;24(1)
石田開, 有阪直, 藤井清. Analysis of Japanese Clinical Engineer License Examinations using ChatGPT (GPT-4V). Iryou kikigaku (The Japanese journal of medical instrumentation) 2024;94(5):514
Li Y, Huang C, Hu Y, Zhou X, He C, Zhong J. Exploring the performance of large language models on hepatitis B infection-related questions: A comparative study. World Journal of Gastroenterology 2025;31(3)
Kusaka S, Akitomo T, Hamada M, Asao Y, Iwamoto Y, Tachikake M, Mitsuhata C, Nomura R. Usefulness of Generative Artificial Intelligence (AI) Tools in Pediatric Dentistry. Diagnostics 2024;14(24):2818
Ramnani S, Bhalla M, Bassi R. A Comparative Study of ChatGPT and BingAI in Answering the National Eligibility Entrance Test for Postgraduates (NEET-PG)-Style Practice Questions: A Cross-Sectional Analysis. Cureus 2024
Fukushima T, Manabe M, Yada S, Wakamiya S, Yoshida A, Urakawa Y, Maeda A, Kan S, Takahashi M, Aramaki E. Evaluating and Enhancing Japanese Large Language Models for Genetic Counseling Support: Comparative Study of Domain Adaptation and the Development of an Expert-Evaluated Dataset. JMIR Medical Informatics 2025;13:e65047
Pavone M, Palmieri L, Bizzarri N, Rosati A, Campolo F, Innocenzi C, Taliento C, Restaino S, Catena U, Vizzielli G, Akladios C, Ianieri M, Marescaux J, Campo R, Fanfani F, Scambia G. Artificial Intelligence, the ChatGPT Large Language Model: Assessing the Accuracy of Responses to the Gynaecological Endoscopic Surgical Education and Assessment (GESEA) Level 1-2 knowledge tests. Facts, Views and Vision in ObGyn 2024;16(4):449
Roos J, Martin R, Kaczmarczyk R. Evaluating Bard Gemini Pro and GPT-4 Vision Against Student Performance in Medical Visual Question Answering: Comparative Case Study. JMIR Formative Research 2024;8:e57592
Abdelgadir Y, Thongprayoon C, Craici I, Cheungpasitporn W, Miao J. Enhancing Patient Comprehension of Glomerular Disease Treatments Using ChatGPT. Healthcare 2024;13(1):57
Ishida K, Hanada E. ChatGPT (GPT-4V) Performance on the Healthcare Information Technologist Examination in Japan. Cureus 2025
Johri S, Jeong J, Tran B, Schlessinger D, Wongvibulsin S, Barnes L, Zhou H, Cai Z, Van Allen E, Kim D, Daneshjou R, Rajpurkar P. An evaluation framework for clinical use of large language models in patient interaction tasks. Nature Medicine 2025;31(1):77
Sun D, Xu P, Zhang J, Liu R, Zhang J. How Self-Regulated Learning Is Affected by Feedback Based on Large Language Models: Data-Driven Sustainable Development in Computer Programming Learning. Electronics 2025;14(1):194
Abdulrab S, Abada H, Mashyakhy M, Mostafa N, Alhadainy H, Halboub E. Performance of 4 Artificial Intelligence Chatbots in Answering Endodontic Questions. Journal of Endodontics 2025;51(5):602
Kondo T, Okamoto M, Kondo Y. Pilot Study on Using Large Language Models for Educational Resource Development in Japanese Radiological Technologist Exams. Medical Science Educator 2025;35(2):919
Kim J, Vajravelu B. Assessing the Current Limitations of Large Language Models in Advancing Health Care Education. JMIR Formative Research 2025;9:e51319
Öztürk Z, Bal C, Çelikkaya B. Evaluation of Information Provided by ChatGPT Versions on Traumatic Dental Injuries for Dental Students and Professionals. Dental Traumatology 2025;41(4):427
Wang Y, Shen H, Chen T, Chiang S, Lin T. Performance of ChatGPT-3.5 and ChatGPT-4 in the Taiwan National Pharmacist Licensing Examination: Comparative Evaluation Study. JMIR Medical Education 2025;11:e56850
Yoşumaz İ. Generative Artificial Intelligence and Usage in Academia. Fırat Üniversitesi Sosyal Bilimler Dergisi 2025;35(1):1
Nomura A, Takeji Y, Shimojima M, Takamura M. Digitalomics: Towards Artificial Intelligence / Machine Learning-Based Precision Cardiovascular Medicine. Circulation Journal 2026;90(5):458
Franke S, Pott C, Rutinowski J, Pauly M, Reining C, Kirchheim A. Can ChatGPT Solve Undergraduate Exams from Warehousing Studies? An Investigation. Computers 2025;14(2):52
Salman I, Ameer O, Khanfar M, Hsieh Y. Artificial intelligence in healthcare education: evaluating the accuracy of ChatGPT, Copilot, and Google Gemini in cardiovascular pharmacology. Frontiers in Medicine 2025;12
Borsetto D, Sia E, Axon P, Donnelly N, Tysome J, Anschuetz L, Bernardeschi D, Capriotti V, Caye-Thomasen P, West N, Erbele I, Franchella S, Gatto A, Hess-Erga J, Kunst H, Marinelli J, Mannion R, Panizza B, Trabalzini F, Obholzer R, Vaira L, Polesel J, Giudici F, Carlson M, Tirelli G, Boscolo-Rizzo P. Quality of Information Provided by Artificial Intelligence Chatbots Surrounding the Management of Vestibular Schwannomas: A Comparative Analysis Between ChatGPT-4 and Claude 2. Otology & Neurotology 2025;46(4):432
Liu Q, Hu A, Gladman T, Gallagher S. Eight Months into Reality: A Scoping Review of the Application of ChatGPT in Higher Education Teaching and Learning. Innovative Higher Education 2025;50(5):1677
Işik G, Kafadar-Gürbüz İ, Elgün F, Kara R, Berber B, Özgül S, Günbay T. Is Artificial Intelligence a Useful Tool for Clinical Practice of Oral and Maxillofacial Surgery? Journal of Craniofacial Surgery 2025;36(2):558
Topcu T, Husain M, Ofsa M, Wach P. Trust at Your Own Peril: A Mixed Methods Exploration of the Ability of Large Language Models to Generate Expert‐Like Systems Engineering Artifacts and a Characterization of Failure Modes. Systems Engineering 2025;28(5):583
Hallquist E, Gupta I, Montalbano M, Loukas M. Applications of Artificial Intelligence in Medical Education: A Systematic Review. Cureus 2025
Chaddad A, Jiang Y. Integrating Technologies in the Metaverse for Enhanced Healthcare and Medical Education. IEEE Transactions on Learning Technologies 2025;18:216
Peng L, Wu Y, Sun J, Xing Y, Li M, Li M. Will Artificial Intelligence Nurse Practitioners Become True? Performance Evaluation of ChatGPT in the American Academy of Nurse Practitioners Certification Board Exam. AI, Computer Science and Robotics Technology 2025;4
Oprea S, Bâra A. Interpreting text corpora from androids-related stories using large language models: “Machines like me” by Ian McEwan in generative AI. Humanities and Social Sciences Communications 2025;12(1)
Sav N. Assessing the performance of large language models (GPT-3.5 and GPT-4) and accurate clinical information for pediatric nephrology. Pediatric Nephrology 2025;40(9):2879
Gan W, Ouyang J, She G, Xue Z, Zhu L, Lin A, Mou W, Jiang A, Qi C, Cheng Q, Luo P, Li H, Zheng X. ChatGPT’s role in alleviating anxiety in total knee arthroplasty consent process: a randomized controlled trial pilot study. International Journal of Surgery 2025;111(3):2546
Ishida K. Evaluating Chat Generative Pretrained Transformer (GPT-4o) Problem-Solving Performance in the Japan Certificate Examination for Biomedical Engineering Class 1. Cureus 2025
Mavrych V, Yaqinuddin A, Bolgova O. Claude, ChatGPT, Copilot, and Gemini performance versus students in different topics of neuroscience. Advances in Physiology Education 2025;49(2):430
Rodrigues Alessi M, Gomes H, Oliveira G, Lopes de Castro M, Grenteski F, Miyashiro L, do Valle C, Tozzini Tavares da Silva L, Okamoto C. Comparative Performance of Medical Students, ChatGPT-3.5 and ChatGPT-4.0 in Answering Questions From a Brazilian National Medical Exam: Cross-Sectional Questionnaire Study. JMIR AI 2025;4:e66552
Yang B, Zhou F, Bai N, Zhou S, Luo C, Wang Q, Wong A, Lin F. Digital and Intelligence Education in Medicine: A Bibliometric and Visualization Analysis Using CiteSpace and VOSviewer. Frontiers of Digital Education 2025;2(1) View
Matsutomo N, Fukami M, Yamamoto T. Can interactive artificial intelligence be used for patient explanations of nuclear medicine examinations in Japanese? Annals of Nuclear Medicine 2025;39(8):774 View
Yu Y, Kim S, Lee W, Koo B. Evaluating ChatGPT on Korea's BIM Expertise Exam and improving its performance through RAG. Journal of Computational Design and Engineering 2025;12(4):94 View
Luo D, Liu M, Yu R, Liu Y, Jiang W, Fan Q, Kuang N, Gao Q, Yin T, Zheng Z. Evaluating the performance of GPT-3.5, GPT-4, and GPT-4o in the Chinese National Medical Licensing Examination. Scientific Reports 2025;15(1) View
Koçak M, Oğuz A, Akçalı Z. The role of artificial intelligence in medical education: an evaluation of Large Language Models (LLMs) on the Turkish Medical Specialty Training Entrance Exam. BMC Medical Education 2025;25(1) View
Buhl L. The answer may vary: large language model response patterns challenge their use in test item analysis. Medical Teacher 2025;47(11):1761 View
Yitzhaki S, Peled N, Kaplan E, Kadmon G, Nahum E, Gendler Y, Weissbach A. Comparing ChatGPT‐4 and a Paediatric Intensive Care Specialist in Responding to Medical Education Questions: A Multicenter Evaluation. Journal of Paediatrics and Child Health 2025;61(7):1084 View
Chen R, Zhang S, Zheng Y, Yu Q, Wang C. Enhancing treatment decision-making for low back pain: a novel framework integrating large language models with retrieval-augmented generation technology. Frontiers in Medicine 2025;12 View
Wang C, Wang F, Li S, Ren Q, Tan X, Fu Y, Liu D, Qian G, Cao Y, Yin R, Li K. Patient Triage and Guidance in Emergency Departments Using Large Language Models: Multimetric Study. Journal of Medical Internet Research 2025;27:e71613 View
Weuthen F, Otte N, Krabbe H, Kraus T, Krabbe J. Comparison of ChatGPT and Internet Research for Clinical Research and Decision-Making in Occupational Medicine: Randomized Controlled Trial. JMIR Formative Research 2025;9:e63857 View
Wang L, Li J, Zhuang B, Huang S, Fang M, Wang C, Li W, Zhang M, Gong S. Accuracy of Large Language Models When Answering Clinical Research Questions: Systematic Review and Network Meta-Analysis. Journal of Medical Internet Research 2025;27:e64486 View
Chen Y, Lee S, Sheu H, Lin S, Hu C, Fu S, Yang C, Lin Y. Enhancing responses from large language models with role-playing prompts: a comparative study on answering frequently asked questions about total knee arthroplasty. BMC Medical Informatics and Decision Making 2025;25(1) View
Wang W, Fu J, Zhang Y, Hu K. A Comparative Analysis of GPT-4o and ERNIE Bot in a Chinese Radiation Oncology Exam. Journal of Cancer Education 2026;41(2):256 View
Lee J, Park S, Hwang S, Lee J, Cho D, Choi S. Comparative evaluation of six large language models in transfusion medicine: Addressing language and domain‐specific challenges. Vox Sanguinis 2026;121(4):496 View
Fukushima M, Eshita S, Fukuhara H. Advancements and limitations of LLMs in replicating human color-word associations. Discover Artificial Intelligence 2025;5(1) View
Huang S, Wen C, Bai X, Li S, Wang S, Wang X, Yang D. Exploring the Application Capability of ChatGPT as an Instructor in Skills Education for Dental Medical Students: Randomized Controlled Trial. Journal of Medical Internet Research 2025;27:e68538 View
Wu J, Wang Z, Qin Y. Performance of DeepSeek-R1 and ChatGPT-4o on the Chinese National Medical Licensing Examination: A Comparative Study. Journal of Medical Systems 2025;49(1) View
Sheridan G, Howard L, Neufeld M, Doyle T, Hughes A, Sculco P, Beverland D, Garbuz D, Masri B. Can artificial intelligence generate scientific discussion that passes peer review for publication in a high-impact orthopaedic journal? Irish Journal of Medical Science (1971 -) 2025;194(4):1191 View
Vurture G, Jenkins N, Ross J, Sansone S, Conner E, Jacobson N, Smilen S, Baum J. Addressing Commonly Asked Questions in Urogynecology: Accuracy and Limitations of ChatGPT. International Urogynecology Journal 2025;36(11):2249 View
Koca M. Comparative performance analysis of artificial intelligence models in therapeutic apheresis training: A pilot study. Transfusion and Apheresis Science 2025;64(4):104188 View
Gimpel H, Laubacher R, Probost F, Schäfer R, Schoch M. Idea Evaluation for Solutions to Specialized Problems: Leveraging the Potential of Crowds and Large Language Models. Group Decision and Negotiation 2025;34(4):903 View
Bernstein E, Ramsamooj A, Millar K, Lum Z. Identification and Categorization of the Top 100 Articles and the Future of Large Language Models: Thematic Analysis Using Bibliometric Analysis. JMIR AI 2025;4:e68603 View
Picard C, Edwards K, Doris A, Man B, Giannone G, Alam M, Ahmed F. From concept to manufacturing: evaluating vision-language models for engineering design. Artificial Intelligence Review 2025;58(9) View
Zeng A, Steinke J, Bocse H, De Pastena M. Dr. LLM Will See You Now: The Ability of ChatGPT to Provide Geographically Tailored Colorectal Cancer Screening and Surveillance Recommendations. Journal of Clinical Medicine 2025;14(14):5101 View
Tong X, Hu Y, Long Y, Zhang Q, Yang Y, Yuan J, Zha Y. The application of problem-based learning (PBL) guided by ChatGPT in clinical education in the Department of Nephrology. BMC Medical Education 2025;25(1) View
Ahmed H. Evaluating the Performance of Large Language Models on Multispecialty FRCS Section 1 Questions. Journal of Surgical Research 2025;313:66 View
Huang C, Lee Y, Sun A, Chiang C. Performance of ChatGPT-4, Gemini, and DeepSeek-V3 on answering the multiple choice questions from Taiwan national dental technician licensing examinations and their self-learning abilities over a three-week period. Journal of Dental Sciences 2025;20(4):2154 View
Sato H, Ogasawara K, Sakurai H. Performance Evaluation of 18 Generative AI Models (ChatGPT, Gemini, Claude, and Perplexity) in 2024 Japanese Pharmacist Licensing Examination: Comparative Study. JMIR Medical Education 2025;11:e76925 View
Verdú G, Rayo A, Fabregat-Bolufer A. Can AI Outperform Human Aspirants? Evaluating 3 ChatGPT Models on the Spanish FIR and BIR Specialized Health Examinations. The Journal of Applied Laboratory Medicine 2025;10(5):1215 View
Song E, Kim G, Lee S. Evaluation of GPT-4o and Gemini Advanced on the Korean National Dental Licensing Examination: Accuracy, consistency, and question generation. Journal of Dental Sciences 2026;21(1):96 View
Niu Z, Kuang X, Chen J, Cai X, Zhang P. The application and challenges of ChatGPT in laboratory medicine. Advances in Laboratory Medicine / Avances en Medicina de Laboratorio 2025;6(4):385 View
Lee J, Choi S, Park S, Hwang S, Cho D. Evaluation of Six Large Language Models for Clinical Decision Support: Application in Transfusion Decision-making for RhD Blood-type Patients. Annals of Laboratory Medicine 2025;45(5):520 View
Nishisako S, Higashi T, Wakao F. Reducing Hallucinations and Trade-Offs in Responses in Generative AI Chatbots for Cancer Information: Development and Evaluation Study. JMIR Cancer 2025;11:e70176 View
Mutlu H, Kokulu K, Sert E, Topuz M. Evaluation of ChatGPT’s Performance in Türkiye’s First Emergency Medicine Sub-Specialization Exam. Eurasian Journal of Emergency Medicine 2025 View
Lin Y, Luo Z, Ye Z, Zhong N, Zhao L, Zhang L, Li X, Chen Z, Chen Y. Applications, Challenges, and Prospects of Generative Artificial Intelligence Empowering Medical Education: Scoping Review. JMIR Medical Education 2025;11:e71125 View
Ekici Ö. Comparative Evaluation of Four Large Language Models in Turkish Dentistry Specialization Exam. Selcuk Dental Journal 2025;12(4):6 View
Jaleel A, Aziz U, Farid G, Zahid Bashir M, Mirza T, Khizar Abbas S, Aslam S, Sikander R. Evaluating the Potential and Accuracy of ChatGPT-3.5 and 4.0 in Medical Licensing and In-Training Examinations: Systematic Review and Meta-Analysis. JMIR Medical Education 2025;11:e68070 View
Lin M, Xu C, Qu X, Xu B, Wang Y, Zou R, Zhang Y. Evaluation of reliability, repeatability, and confidence of ChatGPT for screening, monitoring, and treatment of interstitial lung disease in patients with systemic autoimmune rheumatic diseases. DIGITAL HEALTH 2025;11 View
Yoon D, Kim C, Ryu Y, Lee Y, Chae Y. Performance of GPT-4 for planning acupuncture treatment: comparison with human clinician performance. Frontiers in Medicine 2025;12 View
La N, Rattanapitoon S, Aeksanti T, Rattanapitoon N. Expanding the Role of Generative AI in Paediatric Intensive Care Education: Beyond Factual Knowledge Toward Integrated Clinical Reasoning. Journal of Paediatrics and Child Health 2026;62(1):148 View
Shaikh Y, Jeelani-Shaikh Z, Jeelani M, Javaid A, Mahmud T, Gaglani S, Gibbons M, Cheema M, Cross A, Livingston D, Cheatham M, Nezami E, Dixon R, Niranjan-Azadi A, Zafar S, Siddiqui Z, Villanueva C. Collaborative intelligence in AI: Evaluating the performance of a council of AIs on the USMLE. PLOS Digital Health 2025;4(10):e0000787 View
Kasagga A, Sapkota A, Changaramkumarath G, Abucha J, Wollel M, Somannagari N, Husami M, Hailu K, Kasagga E. Performance of ChatGPT and Large Language Models on Medical Licensing Exams Worldwide: A Systematic Review and Network Meta-Analysis With Meta-Regression. Cureus 2025 View
Sozen Yanik I, Sahin Hazir D, Bilgin Avsar D. Cross-lingual performance of large language models in maxillofacial prosthodontics: a comparative evaluation. BMC Oral Health 2025;25(1) View
Kendir M, Zuhurlu M. Evaluation Large Language Models’ Time Dependent Consistency in Aesthetic Surgery Consultations and Comparison of Their Performance Across Different Clinical Domains. Aesthetic Plastic Surgery 2026;50(9):3485 View
Chimirri L, Caufield J, Bridges Y, Matentzoglu N, Gargano M, Cazalla M, Chen S, Danis D, Dingemans A, Gehle K, Gehle P, Graefe A, Gu W, Ladewig M, Lapunzina P, Nevado J, Niyonkuru E, Ogishima S, Seelow D, Tenorio Castaño J, Turnovec M, de Vries B, Wang K, Wissink K, Yüksel Z, Zucca G, Haendel M, Mungall C, Reese J, Robinson P. Consistent performance of large language models in rare disease diagnosis across ten languages and 4917 cases. eBioMedicine 2025;121:105957 View
Borrone R. ChatGPT-4. Oftalmología Clínica y Experimental 2024;17(01):e41 View
Liu M, Okuhara T, Shirabe R, Nishiie Y, Xu Y, Okada H, Kiuchi T. Evaluating the Reliability and Accuracy of an AI-Powered Search Engine in Providing Responses on Dietary Supplements: Quantitative and Qualitative Evaluation. JMIR AI 2025;4:e78436 View
Lee J. ChatGPT: how to use it and the pitfalls/cautions in academia. Annals of Pediatric Endocrinology & Metabolism 2025;30(5):229 View
Chen Y, Wen B, Zulkernine F. A Multiagent Summarization and Auto-Evaluation Framework for Medical Text: Development and Evaluation Study. JMIR AI 2025;4:e75932 View
Maison D, Silva A, Glir B, Moré A. Medical students versus chatbots in solving a medical test: a comparative study. Revista Brasileira de Educação Médica 2025;49(4) View
Noda R, Yuasa C, Kitano F, Ichikawa D, Shibagaki Y. Performance of o1 pro and GPT-4 in Self-Assessment Questions for Nephrology Board Renewal. Frontiers in Medicine 2025;12 View
Turan Gökduman C, Arılı Öztürk E, Aktaş Ş, Çanakçi B. Comparison of chatbots’ accuracy in endodontics questions in dentistry specialization exam in Türkiye: ChatGPT-4o, Gemini Advanced, Copilot, and Claude. BMC Oral Health 2025;26(1) View
Aydin Varol E, Ozturk Z, Bal C, Karamuftuoglu N. Comparison of two different artificial intelligence chatbots that provide information to patients and parents about primary tooth pulpotomy treatments. BMC Oral Health 2025;26(1) View
Niu Z, Kuang X, Chen J, Cai X, Zhang P. Aplicaciones y retos de ChatGPT en la medicina de laboratorio. Advances in Laboratory Medicine / Avances en Medicina de Laboratorio 2025;6(4):397 View
Kültüroğlu G, Özgüner Y, Altınsoy S, Kına S, Erdem Hıdıroğlu E, Ergil J. Can Artificial Intelligence Be Successful as an Anaesthesiology and Reanimation Resident? Turkish Journal of Anaesthesiology and Reanimation 2025 View
Tocev T, Atanasovski A. AI Chatbot as IFRS Advisory Tool: GPT‐4 Experimental Design. Intelligent Systems in Accounting, Finance and Management 2026;33(1) View
Wang B, Zhang M, Wang Z, Yao K, Hao M, Wang J, Peng S, Zhu Y. Supporting postgraduate exam preparation with large language models: implications for traditional Chinese medicine education. Frontiers in Medicine 2026;12 View
Miyamura M, Fujiki G, Kanzaki Y, Tsuda K, Asano H, Morita H, Hoshiga M. Evaluating Chat GPT-4o’s Comparative Performance over GPT-4 in Japanese Medical Licensing Examination and Its Clinical Partnership Potential. International Medical Education 2026;5(1):9 View
Benito P, Isla-Jover M, González-Castro P, Fernández Esparcia P, Carpio M, Blay-Simón I, Gutiérrez-Bedia P, Lapastora M, Carratalá B, Carazo-Casas C. GPT-4o and OpenAI o1 Performance on the 2024 Spanish Competitive Medical Specialty Access Examination: Cross-Sectional Quantitative Evaluation Study. JMIR Medical Education 2026;12:e75452 View
Lian L, Luo X, Chipusu K, Ashraf M, Wong K, Zhang W. Large Language Models Evaluation of Medical Licensing Examination Using GPT-4.0, ERNIE Bot 4.0, and GPT-4o. Bioengineering 2026;13(1):113 View
Sağlam H, Sezgin G, Kaplan T, Kaplan S. Artificial intelligence chatbots versus dentists: a comparative knowledge assessment on traumatic dental injury management. BMC Oral Health 2026;26(1) View
Gürses Ö, Ceylan İ. Consistency over accuracy: run-to-run stability of contemporary large language models on Turkish curriculum-aligned theoretical anatomy multiple-choice questions. BMC Medical Education 2026;26(1) View
Duwe G, Moench K, Kauth V, Angeloni M, Eckhoff J, Görtz M, Hoefert S, Kocar T, Kollitsch L, Mehralivand S, Mercier D, Rudolph J, Rueckel J, Schönhof R, Sondermann M, von Klot C, Zamzow A, Struck J, Borgmann H. Künstliche Intelligenz in chirurgischen Disziplinen: Einsatz, Nutzen und Potenzial – ein Delphi-Expertenkonsensus. Die Urologie 2026;65(6):627 View
Foster A, Price N, Brown V, Reed S. Artificial Intelligence in Health Professional Licensing: Performance of ChatGPT-3.5 and GPT-4; Systematic Review and Meta-Analysis. Annals of Pharmacy Education, Safety, and Public Health Advocacy 2022;2(1):176 View
Zhang P, Wang J, Hu X, Wang X, Fan X, Chi W, Yang W. Comparative performance of GPT-4, GPT-o3, GPT-5, Gemini-3-Flash, and DeepSeek-R1 in ophthalmology question answering. Frontiers in Cell and Developmental Biology 2026;14 View
Kaplan T. A comparative evaluation of two large language models in pediatric dentistry. BMC Oral Health 2026;26(1) View
Liu M, Okuhara T, Dai Z, Zhao M, Yin W, Okada H, Furukawa E, Kiuchi T. Textbook-level medical knowledge in large language models: comparative evaluation using Japanese National Medical Examination. BMC Medical Informatics and Decision Making 2026;26(1) View
Haylaz E, Gumussoy I, Kalabalik F, Say Ş, Can Eren M, Geduk G. Evaluation of the performance of four different large language models (ChatGPT, DeepSeek, Copilot, and Gemini) in answering oral, and maxillofacial radiology questions: pilot study. BMC Oral Health 2026;26(1) View
Agarwal P, Agarwal R, Iezhitsa I. AI for Assessment in Medical Education in Post LLM Era: A Scoping Review. Medical Science Educator 2026;36(2):1027 View
Takahashi Y, Kumakura R, Okamoto R, Omote S. Performance of Large Language Models in the Japanese Public Health Nurse National Examination: Comparative Cross-Sectional Study. JMIR Nursing 2026;9:e82842 View
Du C, Pan Y, Ng C, Ding Y, Pan J, Xue W, Yao X, Huang J. Three large language models demonstrate competitive performance in Traditional Chinese Medicine national medical licensing examinations over two years. Scientific Reports 2026;16(1) View
Güler I, Muir L, Grieb G, Moog P, Kraus A, Stelling H. Performance and reliability of large language models on the European Board of Hand Surgery examination: a multi-model evaluation study. Journal of Hand Surgery (European Volume) 2026 View
Su C, Dai K, Chen Y, Kao Y. Using ChatGPT to Solve Clinical Radiobiology Problems. Journal of Cancer Education 2026 View
Chen H, Watanabe S, Orii R, Kaneko M, Yumoto K, Kashizaki F. Performance of Recent Large Language Models on the Japanese National Medical Licensing Examination: A Multimodal Accuracy and Response-Time Comparison. AI and Clinical Practice 2026;1(2):e102 View

Books/Policy Documents

Huang D, Wang Z. Trends and Applications in Knowledge Discovery and Data Mining. View
Semujanga B, Mikalef P. Disruptive Innovation in a Digitally Connected Healthy World. View
Marques de Sá A, Aoussat A, Maranzana N. Product Lifecycle Management. Integrating Digital Technologies for Sustainability and Innovation. View
Liu J, Yan L, Wang T, Niu Q, Nagai-Tanima M, Aoyama T. AI for Clinical Applications. View
Llerena Izquierdo J, Ayala Carabajo R. Bioética. Inteligencia artificial, cambio climático y muerte asistida. View

Conference Proceedings

Xu B, Wang R, Ping L, Zhu C, Liu X, Lin H, Tian L, Xia F. 2024 IEEE International Conference on Bioinformatics and Biomedicine (BIBM). MAT: Medical AI-generated Text Detection Dataset from Multi-models and Multi-Methods View
Hidayaturrahman, Prawira I. 2024 International Conference on Informatics, Multimedia, Cyber and Information System (ICIMCIS). Leveraging Zero-Shot Learning in Large Language Models for Sentiment Analysis: A Comparative Study on the Indonesian Language View
Daher W, Diab H, Rayan A. 2025 International Conference on Smart Learning Courses (SCME). Potential of Generative AI for Chemistry Problem Solving: Evaluating Effectiveness Across Different AI Model Generations and Linguistic Contexts View
Pattanshetti R, Sidddanagoudra S, Chand S, S P, Hebbar R, Vaishnavi. 2025 International Conference on Biomedical Engineering and Sustainable Healthcare (ICBMESH). Assessing the Performance of Large Language Models on the Foreign Medical Graduate Examination (FMGE): Insights from GPT-4 Turbo, Gemini Advanced, and LLaMA 3.1 (70B) View
