Assessment of Resident and AI Chatbot Performance on the University of Toronto Family Medicine Residency Progress Test: Comparative Study

doi:10.2196/50514

Journals

Komasawa N, Yokohira M. Learner-Centered Experience-Based Medical Education in an AI-Driven Society: A Literature Review. Cureus 2023
Tangadulrat P, Sono S, Tangtrakulwanich B. Using ChatGPT for Clinical Practice and Medical Education: Cross-Sectional Survey of Medical Students’ and Physicians’ Perceptions. JMIR Medical Education 2023;9:e50658
Sallam M, Barakat M, Sallam M. A Preliminary Checklist (METRICS) to Standardize the Design and Reporting of Studies on Generative Artificial Intelligence–Based Models in Health Care Education and Practice: Development Study Involving a Literature Review. Interactive Journal of Medical Research 2024;13:e54704
Arslan B, Eyupoglu G, Korkut S, Turkdogan K, Altinbilek E. The accuracy of AI-assisted chatbots on the annual assessment test for emergency medicine residents. Journal of Medicine, Surgery, and Public Health 2024;3:100070
Günay S, Öztürk A, Özerol H, Yiğit Y, Erenler A. Comparison of emergency medicine specialist, cardiologist, and chat-GPT in electrocardiography assessment. The American Journal of Emergency Medicine 2024;80:51
Katz U, Cohen E, Shachar E, Somer J, Fink A, Morse E, Shreiber B, Wolf I. GPT versus Resident Physicians — A Benchmark Based on Official Board Scores. NEJM AI 2024;1(5)
Komasawa N. Transformative Landscape of Anesthesia Education: Simulation, AI Integration, and Learner-Centric Reforms: A Narrative Review. Anesthesia Research 2024;1(1):34
Mousavi M, Shafiee S, Harley J, Cheung J, Abbasgholizadeh Rahimi S. Performance of generative pre-trained transformers (GPTs) in Certification Examination of the College of Family Physicians of Canada. Family Medicine and Community Health 2024;12(Suppl 1):e002626
Peláez-Sánchez I, Velarde-Camaqui D, Glasserman-Morales L. The impact of large language models on higher education: exploring the connection between AI and Education 4.0. Frontiers in Education 2024;9
Sabri H, Saleh M, Hazrati P, Merchant K, Misch J, Kumar P, Wang H, Barootchi S. Performance of three artificial intelligence (AI)‐based large language models in standardized testing; implications for AI‐assisted dental education. Journal of Periodontal Research 2025;60(2):121
Akpan I, Kobara Y, Owolabi J, Akpan A, Offodile O. Conversational and generative artificial intelligence and human–chatbot interaction in education and research. International Transactions in Operational Research 2025;32(3):1251
Goodings A, Kajitani S, Chhor A, Albakri A, Pastrak M, Kodancha M, Ives R, Lee Y, Kajitani K. Assessment of ChatGPT-4 in Family Medicine Board Examinations Using Advanced AI Learning and Analytical Methods: Observational Study. JMIR Medical Education 2024;10:e56128
Kipp M. From GPT-3.5 to GPT-4.o: A Leap in AI’s Medical Exam Performance. Information 2024;15(9):543
Chow J, Li K. Ethical Considerations in Human-Centered AI: Advancing Oncology Chatbots Through Large Language Models. JMIR Bioinformatics and Biotechnology 2024;5:e64406
Waldock W, Zhang J, Guni A, Nabeel A, Darzi A, Ashrafian H. The Accuracy and Capability of Artificial Intelligence Solutions in Health Care Examinations and Certificates: Systematic Review and Meta-Analysis. Journal of Medical Internet Research 2024;26:e56532
Huang R, Benour A, Kemppainen J, Leung F. The future of AI clinicians: assessing the modern standard of chatbots and their approach to diagnostic uncertainty. BMC Medical Education 2024;24(1)
Du W, Jin X, Harris J, Brunetti A, Johnson E, Leung O, Li X, Walle S, Yu Q, Zhou X, Bian F, McKenzie K, Kanathanavanich M, Ozcelik Y, El-Sharkawy F, Koga S. Large language models in pathology: A comparative study of ChatGPT and Bard with pathology trainees on multiple-choice questions. Annals of Diagnostic Pathology 2024;73:152392
Aster A, Laupichler M, Rockwell-Kollmann T, Masala G, Bala E, Raupach T. ChatGPT and Other Large Language Models in Medical Education — Scoping Literature Review. Medical Science Educator 2024;35(1):555
Zare S, Vafaeian S, Amini M, Farhadi K, Vali M, Golestani A. Comparing the performance of ChatGPT-3.5-Turbo, ChatGPT-4, and Google Bard with Iranian students in pre-internship comprehensive exams. Scientific Reports 2024;14(1)
Lee J, Park S, Shin J, Cho B. Analyzing evaluation methods for large language models in the medical field: a scoping review. BMC Medical Informatics and Decision Making 2024;24(1)
Lin J, Hua Z, Zhang L, Lin Y, Ding Y, Chen X, Li S, Wang Y, Li Q. A narrative review of applications and enhancements of ChatGPT in respiratory medicine. Clinical eHealth 2024;7:200
Bedel H, Bedel C, Selvi F, Zortuk Ö, Karanci Y. Emergency Medicine Assistants in the Field of Toxicology, Comparison of ChatGPT-3.5 and GEMINI Artificial Intelligence Systems. Acta medica Lituanica 2024;31(2):294
Prazeres F. ChatGPT’s Performance on Portuguese Medical Examination Questions: Comparative Analysis of ChatGPT-3.5 Turbo and ChatGPT-4o Mini. JMIR Medical Education 2025;11:e65108
Ye H, Xu J, Huang D, Xie M, Guo J, Yang J, Bao H, Zhang M, Zheng C. Assessment of large language models’ performances and hallucinations for Chinese postgraduate medical entrance examination. Discover Education 2025;4(1)
Mavrych V, Yaqinuddin A, Bolgova O. Claude, ChatGPT, Copilot, and Gemini performance versus students in different topics of neuroscience. Advances in Physiology Education 2025;49(2):430
Matos T, Santos W, Zdravevski E, Coelho P, Pires I, Madeira F. A systematic review of artificial intelligence applications in education: Emerging trends and challenges. Decision Analytics Journal 2025;15:100571
Najafali D, Reiche E, Araya S, Orellana M, Liu F, Camacho J, Patel S, Broyles J, Dorafshar A, Morrison S, Knoedler L, Fox P. Artificial Intelligence Augmentation: Performance of GPT-4 and GPT-3.5 on the Plastic Surgery In-service Examination. Plastic and Reconstructive Surgery - Global Open 2025;13(4):e6645
Yitzhaki S, Peled N, Kaplan E, Kadmon G, Nahum E, Gendler Y, Weissbach A. Comparing ChatGPT‐4 and a Paediatric Intensive Care Specialist in Responding to Medical Education Questions: A Multicenter Evaluation. Journal of Paediatrics and Child Health 2025;61(7):1084
Wang L, Li J, Zhuang B, Huang S, Fang M, Wang C, Li W, Zhang M, Gong S. Accuracy of Large Language Models When Answering Clinical Research Questions: Systematic Review and Network Meta-Analysis. Journal of Medical Internet Research 2025;27:e64486
Huang R, Sood T, Nelms M, Wintraub L, Leung F. Addressing the challenges of field notes in medical education: a qualitative study of resident experiences. BMC Medical Education 2025;25(1)
Souto M, Fernandes A, Silva A, de Freitas Ribeiro L, de Medeiros Fernandes T. A multi-model longitudinal assessment of ChatGPT performance on medical residency examinations. Frontiers in Artificial Intelligence 2025;8
Jaleel A, Aziz U, Farid G, Zahid Bashir M, Mirza T, Khizar Abbas S, Aslam S, Sikander R. Evaluating the Potential and Accuracy of ChatGPT-3.5 and 4.0 in Medical Licensing and In-Training Examinations: Systematic Review and Meta-Analysis. JMIR Medical Education 2025;11:e68070
Lin Y, Luo Z, Ye Z, Zhong N, Zhao L, Zhang L, Li X, Chen Z, Chen Y. Applications, Challenges, and Prospects of Generative Artificial Intelligence Empowering Medical Education: Scoping Review. JMIR Medical Education 2025;11:e71125
Warlick A, Clifton C, Trinh T, Kaur R, Weinberg A, Collins J. Integrating a chatbot into simulation-based perfusion training: A pilot randomized controlled trial. Perfusion 2025
Abbas A, Azar B, Mahrishi M, Martín-Núñez J, Mishra D. AI governance in higher education: A meta-analytic thematic review of current research trends, policy initiatives and knowledge gaps. Equilibrium. Quarterly Journal of Economics and Economic Policy 2025;20(4):1257
Lišnić B, Gaurina M. Large Language Models in Physics: Analysis of Accuracy and Teacher Perception. Interdisciplinary Description of Complex Systems 2025;23(6):668
Khan W, Leem S, See K, Wong J, Zhang S, Fang R. A Comprehensive Survey of Foundation Models in Medicine. IEEE Reviews in Biomedical Engineering 2026;19:283
Hussain A, Farwa U, Ali S, Kim H. The Rise of Foundation Models: Opportunities, Technology, Applications, Challenges, Recent Trends, and Future Directions. Applied System Innovation 2026;9(2):35
Hasanien A, Albusoul R. Comparing nursing students and AI systems performance on the ability to solve basic life support and advanced cardiac life support multiple choice exam questions. Nurse Education in Practice 2026;92:104772

Books/Policy Documents

Bakthavatchaalam V, Sivasankar K. Transforming Healthcare Sector Through Artificial Intelligence and Environmental Sustainability.

Conference Proceedings

S V, Seshasai, Joseph L. 2024 International Conference on Innovative Computing, Intelligent Communication and Smart Electrical Systems (ICSES). A Hybrid Deep Learning Algorithm for Improved ChatBot Accuracy and Relevance Through Advanced Retrieval-Augmented Generation
Liu Z, Hu L, Zhou T, Tang Y, Cai Z. 2025 IEEE Symposium on Security and Privacy (SP). Prevalence Overshadows Concerns? Understanding Chinese Users' Privacy Awareness and Expectations Towards LLM-Based Healthcare Consultation
