Evaluating Large Language Models for the National Premedical Exam in India: Comparative Analysis of GPT-3.5, GPT-4, and Bard

doi:10.2196/51523

Published on 21.Feb.2024 in Vol 10 (2024)

Preprints (earlier versions) of this paper are available at https://preprints.jmir.org/preprint/51523, first published 02.Aug.2023.

Faiza Farhat¹; Beenish Moalla Chaudhry²; Mohammad Nadeem³; Shahab Saquib Sohail⁴; Dag Øivind Madsen⁵

Citation

Please cite as:

Farhat F, Chaudhry BM, Nadeem M, Sohail SS, Madsen DØ
Evaluating Large Language Models for the National Premedical Exam in India: Comparative Analysis of GPT-3.5, GPT-4, and Bard
JMIR Med Educ 2024;10:e51523
doi: 10.2196/51523 PMID: 38381486 PMCID: 10918540

This paper is in the following e-collection/theme issue:

Artificial Intelligence (AI) in Medical Education; e-Learning and Digital Medical Education; Natural Language Processing; Theme Issue: ChatGPT and Generative Language Models in Medical Education; Generative Language Models Including ChatGPT
