Published in Vol 10 (2024)

Preprints (earlier versions) of this paper are available at https://preprints.jmir.org/preprint/51523.
Evaluating Large Language Models for the National Premedical Exam in India: Comparative Analysis of GPT-3.5, GPT-4, and Bard

Journals

  1. Saleem N, Mufti T, Sohail S, Madsen D. ChatGPT as an innovative heutagogical tool in medical education. Cogent Education 2024;11(1)
  2. Meo S, Alotaibi M, Meo M, Meo M, Hamid M. Medical knowledge of ChatGPT in public health, infectious diseases, COVID-19 pandemic, and vaccines: multiple choice questions examination based performance. Frontiers in Public Health 2024;12
  3. Vaishya R, Iyengar K, Patralekh M, Botchu R, Shirodkar K, Jain V, Vaish A, Scarlat M. Effectiveness of AI-powered Chatbots in responding to orthopaedic postgraduate exam questions—an observational study. International Orthopaedics 2024;48(8):1963
  4. Tepe M, Emekli E. Decoding medical jargon: The use of AI language models (ChatGPT-4, BARD, microsoft copilot) in radiology reports. Patient Education and Counseling 2024;126:108307
  5. Tepe M, Emekli E. Assessing the Responses of Large Language Models (ChatGPT-4, Gemini, and Microsoft Copilot) to Frequently Asked Questions in Breast Imaging: A Study on Readability and Accuracy. Cureus 2024
  6. Kaneda Y, Tayuinosho A, Tomoyose R, Takita M, Hamaki T, Tanimoto T, Ozaki A. Evaluating ChatGPT's effectiveness and tendencies in Japanese internal medicine. Journal of Evaluation in Clinical Practice 2024;30(6):1017
  7. Qamar M, Yasmeen J, Pathak S, Sohail S, Madsen D, Rangarajan M. Big claims, low outcomes: fact checking ChatGPT’s efficacy in handling linguistic creativity and ambiguity. Cogent Arts & Humanities 2024;11(1)
  8. Paul S, Govindaraj S, Jk J. ChatGPT Versus National Eligibility cum Entrance Test for Postgraduate (NEET PG). Cureus 2024
  9. Halford E, Webster A. Using chat GPT to evaluate police threats, risk and harm. International Journal of Law, Crime and Justice 2024;78:100686
  10. Kipp M. From GPT-3.5 to GPT-4.o: A Leap in AI’s Medical Exam Performance. Information 2024;15(9):543
  11. Ayala-Chauvin M, Avilés-Castillo F. Optimizing Natural Language Processing: A Comparative Analysis of GPT-3.5, GPT-4, and GPT-4o. Data and Metadata 2024;3
  12. Ramgopal S, Varma S, Gorski J, Kester K, Shieh A, Suresh S. Evaluation of a Large Language Model on the American Academy of Pediatrics' PREP Emergency Medicine Question Bank. Pediatric Emergency Care 2024;40(12):871
  13. Gumilar K, Tan M. The promise and challenges of Artificial Intelligence-Large Language Models (AI-LLMs) in obstetric and gynecology. Majalah Obstetri & Ginekologi 2024;32(2):128
  14. Workman T, Ahmed A, Sheriff H, Raman V, Zhang S, Shao Y, Faselis C, Fonarow G, Zeng-Treitler Q. ChatGPT-4 extraction of heart failure symptoms and signs from electronic health records. Progress in Cardiovascular Diseases 2024;87:44
  15. Lone M, Sohail S, Rahman A, Najar A. AI in oncology: comparing the diagnostic and therapeutic potential of claude 3 opus and ChatGPT 4.0 in HNSCC management. European Archives of Oto-Rhino-Laryngology 2025;282(2):1121
  16. Zare S, Vafaeian S, Amini M, Farhadi K, Vali M, Golestani A. Comparing the performance of ChatGPT-3.5-Turbo, ChatGPT-4, and Google Bard with Iranian students in pre-internship comprehensive exams. Scientific Reports 2024;14(1)
  17. Maraqa N, Samargandi R, Poichotte A, Berhouet J, Benhenneda R. Comparing performances of french orthopaedic surgery residents with the artificial intelligence ChatGPT-4/4o in the French diploma exams of orthopaedic and trauma surgery. Orthopaedics & Traumatology: Surgery & Research 2024:104080
  18. AlSamhori A, Alnaimat F. Artificial intelligence in writing and research: ethical implications and best practices. Central Asian Journal of Medical Hypotheses and Ethics 2024;5(4):259
  19. Maraqa N, Samargandi R, Poichotte A, Berhouet J, Benhenneda R. Comparaison des performances des internes français de chirurgie orthopédique et de l’intelligence artificielle ChatGPT-4/4o aux examens du diplôme d’études spécialisées de chirurgie orthopédique et traumatologique. Revue de Chirurgie Orthopédique et Traumatologique 2025
  20. Qiu Y, Liu C. Capable exam-taker and question-generator: the dual role of generative AI in medical education assessment. Global Medical Education 2025
  21. Kim J, Vajravelu B. Assessing the Current Limitations of Large Language Models in Advancing Health Care Education. JMIR Formative Research 2025;9:e51319
  22. Xiong Y, Zhan Z, Zhong C, Zeng W, Guo J, Tang W, Liu C. Evaluating the Performance of Large Language Models (LLMs) in Answering and Analysing the Chinese Dental Licensing Examination. European Journal of Dental Education 2025
  23. Ok F, Karip B, Temizsoy Korkmaz F. Evaluating the Performance of Large Language Models in Anatomy Education Advancing Anatomy Learning with ChatGPT-4o. European Journal of Therapeutics 2025;31(1):35
  24. Jongkind R, Elings E, Joukes E, Broens T, Leopold H, Wiesman F, Meinema J. Is your curriculum GenAI-proof? A method for GenAI impact assessment and a case study. MedEdPublish 2025;15:11
  25. Jain S, Chakraborty B, Agarwal A, Sharma R. Performance of Large Language Models (ChatGPT and Gemini Advanced) in Gastrointestinal Pathology and Clinical Review of Applications in Gastroenterology. Cureus 2025
  26. Elkin P, Mehta G, LeHouillier F, Resnick M, Mullin S, Tomlin C, Resendez S, Liu J, Nebeker J, Brown S. Semantic Clinical Artificial Intelligence vs Native Large Language Model Performance on the USMLE. JAMA Network Open 2025;8(4):e256359

Books/Policy Documents

  1. Palmer E, Barbieri W. Risks and Opportunities in Using Educational Technologies.

Conference Proceedings

  1. Alfieri C, Ganesh S, Ge L, Shi J, Sadeh N. 2024 21st Annual International Conference on Privacy, Security and Trust (PST). “I was Diagnosed with …”: Sensitivity Detection and Rephrasing of Amazon Reviews with ChatGPT