Published on in Vol 9 (2023)

This is a member publication of the University of Toronto

Preprints (earlier versions) of this paper are available at https://preprints.jmir.org/preprint/50514, first published .
Assessment of Resident and AI Chatbot Performance on the University of Toronto Family Medicine Residency Progress Test: Comparative Study