Published in Vol 10 (2024)
Preprints (earlier versions) of this paper are available at https://preprints.jmir.org/preprint/50965.

Journals
- Miao J, Thongprayoon C, Suppadungsuk S, Garcia Valencia O, Cheungpasitporn W. Integrating Retrieval-Augmented Generation with Large Language Models in Nephrology: Advancing Practical Applications. Medicina 2024;60(3):445
- Lucas F, Mackie I, d'Onofrio G, Frater J. Responsible use of chatbots to advance the laboratory hematology scientific literature: Challenges and opportunities. International Journal of Laboratory Hematology 2024;46(S1):9
- Zhu L, Mou W, Hong C, Yang T, Lai Y, Qi C, Lin A, Zhang J, Luo P. The Evaluation of Generative AI Should Include Repetition to Assess Stability. JMIR mHealth and uHealth 2024;12:e57978
- Meyer A, Ruthard J, Streichert T. Dear ChatGPT – can you teach me how to program an app for laboratory medicine? Journal of Laboratory Medicine 2024;48(5):197
- Kaneda Y, Tayuinosho A, Tomoyose R, Takita M, Hamaki T, Tanimoto T, Ozaki A. Evaluating ChatGPT's effectiveness and tendencies in Japanese internal medicine. Journal of Evaluation in Clinical Practice 2024;30(6):1017
- Lee T, Rao A, Campbell D, Radfar N, Dayal M, Khrais A. Evaluating ChatGPT-3.5 and ChatGPT-4.0 Responses on Hyperlipidemia for Patient Education. Cureus 2024
- Meyer A, Soleman A, Riese J, Streichert T. Comparison of ChatGPT, Gemini, and Le Chat with physician interpretations of medical laboratory questions from an online health forum. Clinical Chemistry and Laboratory Medicine (CCLM) 2024;62(12):2425
- Liu M, Okuhara T, Chang X, Shirabe R, Nishiie Y, Okada H, Kiuchi T. Performance of ChatGPT Across Different Versions in Medical Licensing Examinations Worldwide: Systematic Review and Meta-Analysis. Journal of Medical Internet Research 2024;26:e60807
- Suwała S, Szulc P, Guzowski C, Kamińska B, Dorobiała J, Wojciechowska K, Berska M, Kubicka O, Kosturkiewicz O, Kosztulska B, Rajewska A, Junik R. ChatGPT-3.5 passes Poland’s medical final examination—Is it possible for ChatGPT to become a doctor in Poland? SAGE Open Medicine 2024;12
- Nicikowski J, Szczepański M, Miedziaszczyk M, Kudliński B. The potential of ChatGPT in medicine: an example analysis of nephrology specialty exams in Poland. Clinical Kidney Journal 2024;17(8)
- Brandtzaeg P, Skjuve M, Følstad A. Understanding model power in social AI. AI & SOCIETY 2025;40(4):2839
- Kipp M. From GPT-3.5 to GPT-4.o: A Leap in AI’s Medical Exam Performance. Information 2024;15(9):543
- Brin D, Sorin V, Konen E, Nadkarni G, Glicksberg B, Klang E. How GPT models perform on the United States medical licensing examination: a systematic review. Discover Applied Sciences 2024;6(10)
- Fan K, Fan K. Dermatological Knowledge and Image Analysis Performance of Large Language Models Based on Specialty Certificate Examination in Dermatology. Dermato 2024;4(4):124
- Pillai J, Pillai K. ChatGPT as a medical education resource in cardiology: Mitigating replicability challenges and optimizing model performance. Current Problems in Cardiology 2024;49(12):102879
- Omar M, Nadkarni G, Klang E, Glicksberg B, Silva J. Large language models in medicine: A review of current clinical trials across healthcare applications. PLOS Digital Health 2024;3(11):e0000662
- Maraqa N, Samargandi R, Poichotte A, Berhouet J, Benhenneda R. Comparing performances of french orthopaedic surgery residents with the artificial intelligence ChatGPT-4/4o in the French diploma exams of orthopaedic and trauma surgery. Orthopaedics & Traumatology: Surgery & Research 2025;111(8):104080
- Rhyu K. The Surge of Artificial Intelligence (AI) in Scientific Writing: Who Will Hold the Rudder, You or AI? Hip & Pelvis 2024;36(4):231
- Syed S, Ahmed R, Iqbal A, Ahmad N, Alshara M. MediScan: A Framework of U-Health and Prognostic AI Assessment on Medical Imaging. Journal of Imaging 2024;10(12):322
- Lukac S, Griewing S, Leinert E, Dayan D, Heitmeir B, Wallwiener M, Janni W, Fink V, Ebner F. ChatGPT, Google, or PINK? Who Provides the Most Reliable Information on Side Effects of Systemic Therapy for Early Breast Cancer? Clinics and Practice 2024;15(1):8
- Maraqa N, Samargandi R, Poichotte A, Berhouet J, Benhenneda R. Comparaison des performances des internes français de chirurgie orthopédique et de l’intelligence artificielle ChatGPT-4/4o aux examens du diplôme d’études spécialisées de chirurgie orthopédique et traumatologique. Revue de Chirurgie Orthopédique et Traumatologique 2025
- Qiu Y, Liu C. Capable exam-taker and question-generator: the dual role of generative AI in medical education assessment. Global Medical Education 2025
- Kim J, Vajravelu B. Assessing the Current Limitations of Large Language Models in Advancing Health Care Education. JMIR Formative Research 2025;9:e51319
- Mustuloğlu Ş, Deniz B. Evaluation of Chatbots in the Emergency Management of Avulsion Injuries. Dental Traumatology 2025;41(4):437
- Bany Abdelnabi A, Soykan B, Bhatti D, Rabadi G. Usefulness of Large Language Models (LLMs) for Student Feedback on H&P During Clerkship: Artificial Intelligence for Personalized Learning. ACM Transactions on Computing for Healthcare 2025
- Gehring D, Titus S, George R. The Perceived Concerns of Nurse Educators' Use of GenAI in Nursing Education: Protocol for a Scoping Review. Health Science Reports 2025;8(2)
- Barr A, Quan J, Guo E, Sezgin E. Large language models generating synthetic clinical datasets: a feasibility and comparative analysis with real-world perioperative data. Frontiers in Artificial Intelligence 2025;8
- Meyer A, Wetsch W, Steinbicker A, Streichert T. Through ChatGPT’s Eyes: The Large Language Model’s Stereotypes and what They Reveal About Healthcare. Journal of Medical Systems 2025;49(1)
- Altalla’ B, Ahmad A, Bitar L, Al-Bssol M, Al Omari A, Sultan I, Sarkar S. Radiology Report Annotation Using Generative Large Language Models: Comparative Analysis. International Journal of Biomedical Imaging 2025;2025(1)
- Fajt B, Schiller E. ChatGPT in Academia: University Students’ Attitudes Towards the use of ChatGPT and Plagiarism. Journal of Academic Ethics 2025;23(3):1363
- Hallquist E, Gupta I, Montalbano M, Loukas M. Applications of Artificial Intelligence in Medical Education: A Systematic Review. Cureus 2025
- Dobbins N. Generalizable and scalable multistage biomedical concept normalization leveraging large language models. Research Synthesis Methods 2025;16(3):479
- Jongkind R, Elings E, Joukes E, Broens T, Leopold H, Wiesman F, Meinema J. Is your curriculum GenAI-proof? A method for GenAI impact assessment and a case study. MedEdPublish 2025;15:11
- Kaster L, Hillis E, Oh I, Aravamuthan B, Lanzotti V, Vickstrom C, Wasserstein M, Chopra M, Sahin M, Wangler M, Schultz B, Izumi K, Bergner S, Gropman A, Smith-Hicks C, Abbeduto L, Hazlett H, Doherty D, German K, DaWalt L, Neul J, Constantino J, Baldridge D, Srivastava S, Molholm S, Walkley S, Storch E, Samaco R, Cohen J, Shankar S, Piven J, Mahida S, Sveden A, Dies K, Riggs E, Savatt J, Minor B, Gurnett C, Payne P, Gupta A. Automated extraction of functional biomarkers of verbal and ambulatory ability from multi-institutional clinical notes using large language models. Journal of Neurodevelopmental Disorders 2025;17(1)
- Nakaura T, Takamure H, Kobayashi N, Shiraishi K, Yoshida N, Nagayama Y, Uetani H, Kidoh M, Funama Y, Hirai T. Evaluating the Performance of Reasoning Large Language Models on Japanese Radiology Board Examination Questions. Academic Radiology 2025;32(8):4347
- Wang L, Li J, Zhuang B, Huang S, Fang M, Wang C, Li W, Zhang M, Gong S. Accuracy of Large Language Models When Answering Clinical Research Questions: Systematic Review and Network Meta-Analysis. Journal of Medical Internet Research 2025;27:e64486
- Göçer Gürok N, Öztürk S. The Performance of AI in Dermatology Exams: The Exam Success and Limits of ChatGPT. Journal of Cosmetic Dermatology 2025;24(5)
- Wu H, Zerner T, Lee D, Court-Kowalski S, Devitt P, Palmer E. GPT-4 versus human authors in clinically complex MCQ creation: A blinded analysis of item quality. Medical Teacher 2025:1
- Wu Y, Wu Y, Chang Y, Yu C, Wu C, Sung W, Atoum I. Advancing medical AI: GPT-4 and GPT-4o surpass GPT-3.5 in Taiwanese medical licensing exams. PLOS One 2025;20(6):e0324841
- Hirosawa T, Yokose M, Sakamoto T, Harada Y, Tokumasu K, Mizuta K, Shimizu T. Utility of Generative Artificial Intelligence for Japanese Medical Interview Training: Randomized Crossover Pilot Study. JMIR Medical Education 2025;11:e77332
- Meyer B, Kfuri‐Rubens R, Schmidt G, Tariq M, Riedel C, Recker F, Riedel F, Kiechle M, Riedel M. Exploring the potential of AI‐powered applications for clinical decision‐making in gynecologic oncology. International Journal of Gynecology & Obstetrics 2025;171(2):698
- Amini M, Chang P, Davis R, Nguyen D, Dodge J, Phan J, Buxbaum J, Sahakian A. Comparing ChatGPT3.5 and Bard recommendations for colonoscopy intervals: Bridging the gap in healthcare settings. Endoscopy International Open 2025;13(CP)
- Ługowski F, Babińska J, Ludwin A, Stanirowski P. Comparative analysis of ChatGPT 3.5 and ChatGPT 4 obstetric and gynecological knowledge. Scientific Reports 2025;15(1)
- Stenseke J. Counter-productivity and suspicion: two arguments against talking about the AGI control problem. Philosophical Studies 2025
- Feitosa Filho H, Furtado J, Eulálio E, Ribeiro P, Paiva L, Correia M, Silva Júnior G. ChatGPT performance in answering medical residency questions in nephrology: a pilot study in Brazil. Brazilian Journal of Nephrology 2025;47(4)
- Feitosa Filho H, Furtado J, Eulálio E, Ribeiro P, Paiva L, Correia M, Silva Júnior G. Desempenho do ChatGPT na resposta a questões de residência médica em Nefrologia: um estudo piloto no Brasil. Brazilian Journal of Nephrology 2025;47(4)
- Mavrych V, Yousef E, Yaqinuddin A, Bolgova O. Large language models in medical education: a comparative cross-platform evaluation in answering histological questions. Medical Education Online 2025;30(1)
- George R, Titus S, Gehring D. Nurse Educators' Concerns of GenAI in Education: Scoping Review of Technical Factors. Journal of Nursing Education 2025;64(8):503
- Nakaura T, Uetani H, Yoshida N, Kobayashi N, Nagayama Y, Kidoh M, Kuroda J, Mukasa A, Hirai T. Intra-axial primary brain tumor differentiation: comparing large language models on structured MRI reports vs. radiologists on images. European Radiology 2025
- Polat M, Odabaşı O. Scientific Creativity of Artificial Intelligence: Evaluation of Novel Research Ideas in Oral and Maxillofacial Surgery. HRU International Journal of Dentistry and Oral Research 2025;5(2):94
- Jaleel A, Aziz U, Farid G, Zahid Bashir M, Mirza T, Khizar Abbas S, Aslam S, Sikander R. Evaluating the Potential and Accuracy of ChatGPT-3.5 and 4.0 in Medical Licensing and In-Training Examinations: Systematic Review and Meta-Analysis. JMIR Medical Education 2025;11:e68070
- Saowaprut P, Wabina R, Yang J, Siriwat L. Performance of large language models on Thailand’s national medical licensing examination: a cross-sectional study. Journal of Educational Evaluation for Health Professions 2025;22:16
- Zubaer A, Granitzer M, Geschwind S, Graf Lambsdorff J, Voss D. GPT-4 shows comparable performance to human examiners in ranking open-text answers. Scientific Reports 2025;15(1)
- Fan K, Gan J, Zou I, Kaladjiska M, Inguanez M, Garden G. Poor Performance of Large Language Models Based on the Diabetes and Endocrinology Specialty Certificate Examination of the United Kingdom. Cureus 2025
- Gaddis G. Artificial Intelligence and the practice of emergency medicine. Emergency Medical Service 2025;12(3):121
- Kasagga A, Sapkota A, Changaramkumarath G, Abucha J, Wollel M, Somannagari N, Husami M, Hailu K, Kasagga E. Performance of ChatGPT and Large Language Models on Medical Licensing Exams Worldwide: A Systematic Review and Network Meta-Analysis With Meta-Regression. Cureus 2025
- Nakaura T, Kobayashi N, Shiraishi K, Yoshida N, Nagayama Y, Uetani H, Kidoh M, Oda S, Funama Y, Hirai T. Large Language Model Cost and Performance: A Comprehensive Analysis in the Context of the Japan Radiology Board Examination. Journal of Computer Assisted Tomography 2025
- Chen Y, Wen B, Zulkernine F. A Multi-agent Summarization and Auto-evaluation (MASA) Framework for Medical Text: Development and Evaluation Study (Preprint). JMIR AI 2025
- Simoni J, Urtubia-Fernandez J, Mengual E, Simoni D, Royo M, Egaña-Yin D, Hertog O, López-Ortiz L, Muñoz-Tomás A, Santiago-Martínez P, Vahamaki A, Pereira J. Artificial intelligence in undergraduate medical education: an updated scoping review. BMC Medical Education 2025;25(1)
- Punnen T, Shan K, Patel M, McCreary M, Tran D, Santoyo J, Burgess K, Moog T, Smith A, Okuda D. Diagnostic accuracy and bias in open access and subscription-based large language models for multiple sclerosis and neuromyelitis optica spectrum disorder. Intelligence-Based Medicine 2025;12:100314
- Li X, Li G, Zhao Y, Liang Y, Dong Y, Zhang J. Exploring and Comparing the Use of Large Language Models in Supporting Osteoporosis Health Consultations. Clinical Interventions in Aging 2025;20:2133
- Inojosa H, Ramezanzadeh A, Gasparovic-Curtini I, Wiest I, Kather J, Gilbert S, Ziemssen T. Education Research: Can Large Language Models Match MS Specialist Training? Neurology Education 2025;4(4)
