Artificial Intelligence in Medical Education: Comparative Analysis of ChatGPT, Bing, and Medical Students in Germany

doi:10.2196/46482

Published on 04.Sep.2023 in Vol 9 (2023)

Preprints (earlier versions) of this paper are available at https://preprints.jmir.org/preprint/46482, first published 13.Feb.2023.

Jonas Roos¹; Adnan Kasapovic¹; Tom Jansen¹; Robert Kaczmarczyk²

Cited by (82)

Journals

Tangadulrat P, Sono S, Tangtrakulwanich B. Using ChatGPT for Clinical Practice and Medical Education: Cross-Sectional Survey of Medical Students’ and Physicians’ Perceptions. JMIR Medical Education 2023;9:e50658
Knopp M, Warm E, Weber D, Kelleher M, Kinnear B, Schumacher D, Santen S, Mendonça E, Turner L. AI-Enabled Medical Education: Threads of Change, Promising Futures, and Risky Realities Across Four Potential Future Worlds. JMIR Medical Education 2023;9:e50373
Zhang Z, Zhang J, Duan L, Tan C. ChatGPT in dermatology: exploring the limited utility amidst the tech hype. Frontiers in Medicine 2024;10
Abdaljaleel M, Barakat M, Alsanafi M, Salim N, Abazid H, Malaeb D, Mohammed A, Hassan B, Wayyes A, Farhan S, Khatib S, Rahal M, Sahban A, Abdelaziz D, Mansour N, AlZayer R, Khalil R, Fekih-Romdhane F, Hallit R, Hallit S, Sallam M. A multinational study on the factors influencing university students’ attitudes and usage of ChatGPT. Scientific Reports 2024;14(1)
Gordon M, Daniel M, Ajiboye A, Uraiby H, Xu N, Bartlett R, Hanson J, Haas M, Spadafore M, Grafton-Clarke C, Gasiea R, Michie C, Corral J, Kwan B, Dolmans D, Thammasitboon S. A scoping review of artificial intelligence in medical education: BEME Guide No. 84. Medical Teacher 2024;46(4):446
Rojas M, Rojas M, Burgess V, Toro-Pérez J, Salehi S. Exploring the Performance of ChatGPT Versions 3.5, 4, and 4 With Vision in the Chilean Medical Licensing Examination: Observational Study. JMIR Medical Education 2024;10:e55048
Warrier A, Singh R, Haleem A, Zaki H, Eloy J. The Comparative Diagnostic Capability of Large Language Models in Otolaryngology. The Laryngoscope 2024;134(9):3997
Andreychenko A, Gusev A. Perspectives on the application of large language models in healthcare. National Health Care (Russia) 2024;4(4):48
Moulaei K, Yadegari A, Baharestani M, Farzanbakhsh S, Sabet B, Reza Afrash M. Generative artificial intelligence in healthcare: A scoping review on benefits, challenges and applications. International Journal of Medical Informatics 2024;188:105474
Bharatha A, Ojeh N, Fazle Rabbi A, Campbell M, Krishnamurthy K, Layne-Yarde R, Kumar A, Springer D, Connell K, Majumder M. Comparing the Performance of ChatGPT-4 and Medical Students on MCQs at Varied Levels of Bloom’s Taxonomy. Advances in Medical Education and Practice 2024;Volume 15:393
Griewing S, Knitza J, Boekhoff J, Hillen C, Lechner F, Wagner U, Wallwiener M, Kuhn S. Evolution of publicly available large language models for complex decision-making in breast cancer care. Archives of Gynecology and Obstetrics 2024;310(1):537
Zengin A, Ulfanov O, Bag Y, Ulas M. Artificial Intelligence Versus Medical Students in General Surgery Exam. Indian Journal of Surgery 2025;87(1):68
Liu M, Okuhara T, Chang X, Shirabe R, Nishiie Y, Okada H, Kiuchi T. Performance of ChatGPT Across Different Versions in Medical Licensing Examinations Worldwide: Systematic Review and Meta-Analysis. Journal of Medical Internet Research 2024;26:e60807
Rossettini G, Rodeghiero L, Corradi F, Cook C, Pillastrini P, Turolla A, Castellini G, Chiappinotto S, Gianola S, Palese A. Comparative accuracy of ChatGPT-4, Microsoft Copilot and Google Gemini in the Italian entrance test for healthcare sciences degrees: a cross-sectional study. BMC Medical Education 2024;24(1)
Patel E, Fleischer L, Filip P, Eggerstedt M, Hutz M, Michaelides E, Batra P, Tajudeen B. Comparative Performance of ChatGPT 3.5 and GPT4 on Rhinology Standardized Board Examination Questions. OTO Open 2024;8(2)
Li K, Fernandez A, Schwartz R, Rios N, Carlisle M, Amend G, Patel H, Breyer B. Comparing GPT-4 and Human Researchers in Health Care Data Analysis: Qualitative Description Study. Journal of Medical Internet Research 2024;26:e56500
Giray L, Aquino R. Use and impact of ChatGPT on undergraduate engineering students: A case from the Philippines. Internet Reference Services Quarterly 2024;28(4):453
Aljamaan F, Temsah M, Altamimi I, Al-Eyadhy A, Jamal A, Alhasan K, Mesallam T, Farahat M, Malki K. Reference Hallucination Score for Medical Artificial Intelligence Chatbots: Development and Usability Study. JMIR Medical Informatics 2024;12:e54345
Jeong J, Gil D, Kim D, Jeong J. Current Research and Future Directions for Off-Site Construction through LangChain with a Large Language Model. Buildings 2024;14(8):2374
Suresh S, Misra S. Large Language Models in Pediatric Education: Current Uses and Future Potential. Pediatrics 2024;154(3)
Wang Y, Liang L, Li R, Wang Y, Hao C. Comparison of the Performance of ChatGPT, Claude and Bard in Support of Myopia Prevention and Control. Journal of Multidisciplinary Healthcare 2024;Volume 17:3917
Alnaim N, AlSanad D, Albelali S, Almulhem M, Almuhanna A, Attar R, Alsahli M, Albagmi S, Bakhshwain A, Almazrou S, Almutairi S, AboAlsamh H, Arif W, Alsadhan A, Alsedrah I, Alanezi F, Alibrahim D, Alqahtani N. Effectiveness of ChatGPT in remote learning environments: An empirical study with medical students in Saudi Arabia. Nutrition and Health 2025;31(3):1035
Al-Naser Y, Halka F, Ng B, Mountford D, Sharma S, Niure K, Yong-Hing C, Khosa F, Van der Pol C. Evaluating Artificial Intelligence Competency in Education: Performance of ChatGPT-4 in the American Registry of Radiologic Technologists (ARRT) Radiography Certification Exam. Academic Radiology 2025;32(2):597
Fraga-Sastrías J, Navarrini H, Silva-Brehuer M, Espejo-González R, Olvera-Cortés H, Rubio-Martínez R. Uso de Chat-GPT para la generación y conducción de escenarios simulados para el aprendizaje de habilidades no técnicas. Revista Latinoamericana de Simulación Clínica 2024;6(2):64
Armbruster J, Bussmann F, Rothhaas C, Titze N, Grützner P, Freischmidt H. “Doctor ChatGPT, Can You Help Me?” The Patient’s Perspective: Cross-Sectional Study. Journal of Medical Internet Research 2024;26:e58831
Wu J, Nishida T, Liu T. Accuracy of large language models in answering ophthalmology board-style questions: A meta-analysis. Asia-Pacific Journal of Ophthalmology 2024;13(5):100106
Abdul Sami M, Abdul Samad M, Parekh K, Suthar P. Comparative Accuracy of ChatGPT 4.0 and Google Gemini in Answering Pediatric Radiology Text-Based Questions. Cureus 2024
Sallam M, Al-Mahzoum K, Almutairi Y, Alaqeel O, Abu Salami A, Almutairi Z, Alsarraf A, Barakat M. Anxiety among Medical Students Regarding Generative Artificial Intelligence Models: A Pilot Descriptive Study. International Medical Education 2024;3(4):406
Le K, Chen J, Mai D, Le K. An Evaluation on the Potential of Large Language Models for Use in Trauma Triage. Emergency Care and Medicine 2024;1(4):350
Liu M, Okuhara T, Chang X, Okada H, Kiuchi T, Khlaif Z. Performance of ChatGPT in medical licensing examinations in countries worldwide: A systematic review and meta-analysis protocol. PLOS ONE 2024;19(10):e0312771
Harigai A, Toyama Y, Nagano M, Abe M, Kawabata M, Li L, Yamamura J, Takase K. Response accuracy of GPT-4 across languages: insights from an expert-level diagnostic radiology examination in Japan. Japanese Journal of Radiology 2025;43(2):319
Cotohuanca Cruz S, Arredondo-Zela S, Grández-Ventura L. Uso del ChatGPT y el rendimiento académico en estudiantes de una Universidad Privada. REVISTA EDUSER 2024;11(1):29
Alli S, Hossain S, Das S, Upshur R. The Potential of Artificial Intelligence Tools for Reducing Uncertainty in Medicine and Directions for Medical Education. JMIR Medical Education 2024;10:e51446
Aster A, Laupichler M, Rockwell-Kollmann T, Masala G, Bala E, Raupach T. ChatGPT and Other Large Language Models in Medical Education — Scoping Literature Review. Medical Science Educator 2024;35(1):555
Liu M, Okuhara T, Huang W, Ogihara A, Nagao H, Okada H, Kiuchi T. Large Language Models in Dental Licensing Examinations: Systematic Review and Meta-Analysis. International Dental Journal 2025;75(1):213
Chen R, Zeng D, Li Y, Huang R, Sun D, Li T. Evaluating the performance and clinical decision‐making impact of ChatGPT‐4 in reproductive medicine. International Journal of Gynecology & Obstetrics 2025;168(3):1285
Lee J, Park S, Shin J, Cho B. Analyzing evaluation methods for large language models in the medical field: a scoping review. BMC Medical Informatics and Decision Making 2024;24(1)
Roos J, Wilhelm T, Martin R, Kaczmarczyk R. From Language Models to Medical Diagnoses: Assessing the Potential of GPT-4 and GPT-3.5-Turbo in Digital Health. AI 2024;5(4):2680
Zong H, Wu R, Cha J, Wang J, Wu E, Li J, Zhou Y, Zhang C, Feng W, Shen B. Large Language Models in Worldwide Medical Exams: Platform Development and Comprehensive Analysis. Journal of Medical Internet Research 2024;26:e66114
Roos J, Martin R, Kaczmarczyk R. Evaluating Bard Gemini Pro and GPT-4 Vision Against Student Performance in Medical Visual Question Answering: Comparative Case Study. JMIR Formative Research 2024;8:e57592
Sabaner M, Anguita R, Antaki F, Balas M, Boberg-Ans L, Ferro Desideri L, Grauslund J, Hansen M, Klefter O, Potapenko I, Rasmussen M, Subhi Y. Opportunities and Challenges of Chatbots in Ophthalmology: A Narrative Review. Journal of Personalized Medicine 2024;14(12):1165
Özkan T, Acar A, Özkan E, Düzyol M, Öztürk E. Are artificial intelligence based chatbots reliable sources for patients regarding orthodontics? APOS Trends in Orthodontics 2025;15:141
Qiu Y, Liu C. Capable exam-taker and question-generator: the dual role of generative AI in medical education assessment. Global Medical Education 2025;2(1):135
Ajalo E, Mukunya D, Nantale R, Kayemba F, Pangholi K, Babuya J, Langoya Akuu S, Namiiro A, Nsubuga Y, Mpagi J, Musaba M, Oguttu F, Kuteesa J, Mubuuke A, Munabi I, Kiguli S, Omara T. Widespread use of ChatGPT and other Artificial Intelligence tools among medical students in Uganda: A cross-sectional study. PLOS ONE 2025;20(1):e0313776
Feigerlova E, Hani H, Hothersall-Davies E. A systematic review of the impact of artificial intelligence on educational outcomes in health professions education. BMC Medical Education 2025;25(1)
Nordquist J, Silva S, Caverzagie K, Hall J. Clinical learning environments: Updates. Medical Teacher 2025;47(6):911
Erdat E, Kavak E. Benchmarking LLM chatbots’ oncological knowledge with the Turkish Society of Medical Oncology’s annual board examination questions. BMC Cancer 2025;25(1)
Salman I, Ameer O, Khanfar M, Hsieh Y. Artificial intelligence in healthcare education: evaluating the accuracy of ChatGPT, Copilot, and Google Gemini in cardiovascular pharmacology. Frontiers in Medicine 2025;12
Murthy A, Palaniappan V, Radhakrishnan S, Rajaa S, Karthikeyan K. A Comparative Analysis of the Performance of Large Language Models and Human Respondents in Dermatology. Indian Dermatology Online Journal 2025;16(2):241
Bolgova O, Shypilova I, Mavrych V. Large Language Models in Biochemistry Education: Comparative Evaluation of Performance. JMIR Medical Education 2025;11:e67244
Prazeres F. ChatGPT’s Performance on Portuguese Medical Examination Questions: Comparative Analysis of ChatGPT-3.5 Turbo and ChatGPT-4o Mini. JMIR Medical Education 2025;11:e65108
Buhl L. The answer may vary: large language model response patterns challenge their use in test item analysis. Medical Teacher 2025;47(11):1761
Yitzhaki S, Peled N, Kaplan E, Kadmon G, Nahum E, Gendler Y, Weissbach A. Comparing ChatGPT‐4 and a Paediatric Intensive Care Specialist in Responding to Medical Education Questions: A Multicenter Evaluation. Journal of Paediatrics and Child Health 2025;61(7):1084
Mustață M, Iliescu D, Mavriș E, Jude C, Bojor L, Tudorache P, Cîrdei I, Hrab D, Aluculesei A, Răpan I, Dan-Șuteu Ş, Roman D, Urseiu C. ChatGPT-Assisted Decision-Making: An In-Depth Exploration of the Human–AI Interaction. International Journal of Human–Computer Interaction 2025;41(24):15584
Wang L, Li J, Zhuang B, Huang S, Fang M, Wang C, Li W, Zhang M, Gong S. Accuracy of Large Language Models When Answering Clinical Research Questions: Systematic Review and Network Meta-Analysis. Journal of Medical Internet Research 2025;27:e64486
Li Z, Yan C, Cao Y, Gong A, Li F, Zeng R. Evaluating performance of large language models for atrial fibrillation management using different prompting strategies and languages. Scientific Reports 2025;15(1)
Cheng E. Leveraging generative AI in science lesson study: transforming density concept instruction through ChatGPT integration. International Journal for Lesson & Learning Studies 2025;14(3):215
Bessa R, de Oliveira A, Bessa R, Sousa D, Alves R, Barbosa A, Carneiro A, Soares C, Teles A. Performance Comparison of Large Language Models on Brazil’s Medical Revalidation Exam for Foreign-Trained Graduates. Applied Sciences 2025;15(13):7134
Masur L, Driller M, Suppiah H, Matzka M, Sperlich B, Düking P. Assessment of Recommendations Provided to Athletes Regarding Sleep Education by GPT-4o and Google Gemini: Comparative Evaluation Study. JMIR Formative Research 2025;9:e71358
Qiang S, Zhang H, Liao Y, Zhang Y, Gu Y, Wang Y, Xu Z, Shi H, Han N, Yu H. Application of Large Language Models in Stroke Rehabilitation Health Education: 2-Phase Study. Journal of Medical Internet Research 2025;27:e73226
Hosseinzadegan F, Hashemi S, Sharifi S, Khoshnoodifar M. ChatGPT Utility in Medical Education: A Systematic Review and Meta-Analysis. Health Science Monitor 2025;4(3):173
Kasagga A, Sapkota A, Changaramkumarath G, Abucha J, Wollel M, Somannagari N, Husami M, Hailu K, Kasagga E. Performance of ChatGPT and Large Language Models on Medical Licensing Exams Worldwide: A Systematic Review and Network Meta-Analysis With Meta-Regression. Cureus 2025
Alzarea A, Ishaqui A, Maqsood M, Alanazi A, Alsaidan A, Mallhi T, Kumar N, Imran M, Alshahrani S, Alhassan H, Alzarea S, Alsaidan O. Evaluating AI performance in infectious disease education: a comparative analysis of ChatGPT, Google Bard, Perplexity AI, Microsoft Copilot, and Meta AI. Frontiers in Medicine 2025;12
Wang H, Shan W, Liu R, Wang Z. Can large language models serve as digital assistants for medical undergraduates? – A bibliometric mapping and scoping analysis of the medical-education literature. DIGITAL HEALTH 2025;11
Pornwattanakavee S, Leelakanok N, Todsarot T, Guinto G, Takun R, Sumativit A, Senngam M. Effectiveness of ChatGPT, Google Gemini, and Microsoft Copilot in Answering Thai Drug Information Queries: Cross-Sectional Study. JMIR AI 2025;4:e79751
Izquierdo-Condoy J, Arias-Intriago M, Tello-De-la-Torre A, Busch F, Ortiz-Prado E. Generative Artificial Intelligence in Medical Education: Enhancing Critical Thinking or Undermining Cognitive Autonomy? Journal of Medical Internet Research 2025;27:e76340
Simoni J, Urtubia-Fernandez J, Mengual E, Simoni D, Royo M, Egaña-Yin D, Hertog O, López-Ortiz L, Muñoz-Tomás A, Santiago-Martínez P, Vahamaki A, Pereira J. Artificial intelligence in undergraduate medical education: an updated scoping review. BMC Medical Education 2025;25(1)
Yu Z, Cheng W, Li S. The double‐edged sword of AI in medical education. Medical Education 2026;60(7):822
Al-Rahahleh A, Rizik M, Al-Ashwal F, Abu-Farha R, Zawiah M. Diagnostic performance of four AI tools in pharmacology MCQs: Accuracy, sensitivity, and specificity. PLOS One 2025;20(12):e0337688
Ignjatović A, Apostolović M, Stevanović L, Radovanović P, Sidharth, Topalović M, Filipović T. Exploring Medical Students’ Perceptions Regarding ChatGPT and AI Studying at the University of Niš: A Study on Usage, Attitudes, and Linguistic Influence—Single-Centered Study in Serbia—A Paradoxical Ally? Journal of Medical Education and Curricular Development 2025;12
Jing Z, Mouhong Z. Exploring the integration of AI and national quality courses in China: a study on teaching practices in nursing smart education. BMC Medical Education 2025;25(1)
Strasser L, Anschuetz W, Dennstädt F, Hastings J. Performance Evaluation of Large Language Models in Multilingual Medical Multiple-Choice Questions: Mixed Methods Study. JMIR Medical Education 2026;12:e81399
Alkabazi M, Tassoker M. Comparative performance of AI models on case-based oral medicine questions across Bloom’s taxonomy levels and subtopics. Odontology 2026
Guirguis M, Fotsing S, Fevry J, Landry C, Bouchard-Lamothe D, Lacroix J, Jalali A. Artificial Intelligence in Health Professions Education: Qualitative Study of Student Experiences. Journal of Medical Internet Research 2026;28:e82432
Hong D, Huang C, Gao J. Comparative performance of ChatGPT-5 and DeepSeek on the Chinese ultrasound medicine senior professional title examination. Frontiers in Digital Health 2026;8
Ren K, Weng Q, Chen Q, Li H, Xie D, Zeng C, Wei J, Lei G, Wang Y. The application of large language models in orthopedic postgraduate education: potentials, challenges, and future prospects. Journal of Orthopaedic Surgery and Research 2026;21(1)
Festl-Wietek T, Schröpel C, Holderried F, Herrmann B, Griewatz J, Ehehalt S, Junne F, Zipfel S, Herrmann-Werner A, Erschens R. Communication-Based Teaching on Childhood Obesity and the Planetary Health Diet in Medical Education: Proof-of-Concept Study Comparing 4 Information Sources. JMIR Formative Research 2026;10:e92644
Kurt Ş, Bahadırlı S. Performance Evaluation of Large Language Models in Emergency Medicine Specialty Examination Questions: A Cross-Sectional Study. Istanbul Medical Journal 2026
Güler I, Grieb G, Kraus A, Moog P, Cambaz U, Yavasca E, Stelling H. Artificial Intelligence in Medical Assessment: Reliability and Performance of Multimodal Large Language Models in a High-Stakes Licensing Examination. Behavioral Sciences 2026;16(5):822

Books/Policy Documents

Cheng E. AI Roles and Responsibilities in Education.

Conference Proceedings

Dong B, Bai J, Xu T, Zhou Y. 2024 6th International Conference on Computer Science and Technologies in Education (CSTE). Large Language Models in Education: A Systematic Review
Guo Z. Proceedings of the 2nd International Conference on Intelligent Education and Computer Technology. Artificial Intelligence Empowered Literature Retrieval: Acquisition and Analysis of Scientific Information in Medical Basic Research under the New Medical Paradigm

Citation

Please cite as:

Roos J, Kasapovic A, Jansen T, Kaczmarczyk R
Artificial Intelligence in Medical Education: Comparative Analysis of ChatGPT, Bing, and Medical Students in Germany
JMIR Med Educ 2023;9:e46482
doi: 10.2196/46482 PMID: 37665620 PMCID: 10507517

This paper is in the following e-collection/theme issue:

Theme Issue: ChatGPT and Generative Language Models in Medical Education (144)
Chatbots and Conversational Agents (1135)
Artificial Intelligence (AI) in Medical Education (669)
Generative Language Models Including ChatGPT (1419)
