Search Results (1 to 10 of 14 Results)


Influence of Model Evolution and System Roles on ChatGPT’s Performance in Chinese Medical Licensing Exams: Comparative Study

Each candidate is presented with 600 questions, arranged in a slightly varied order, from the exam year’s question data set. According to OpenAI’s introduction, ChatGPT’s responses are based on information available up to September 2021. Thus, we selected the CNMLE 2022 questions, which were purchased from a web-based bookstore [21], for our evaluation. This choice ensured that the questions had not been previously encountered by the model or included in its training data.

Shuai Ming, Qingge Guo, Wenjun Cheng, Bo Lei

JMIR Med Educ 2024;10:e52784

Appraisal of ChatGPT’s Aptitude for Medical Education: Comparative Analysis With Third-Year Medical Students in a Pulmonology Examination

ChatGPT also successfully passed the 2022 Italian Residency Admission National Exam, which consists solely of MCQs. Additionally, in the 2022 European Examination in Core Cardiology, ChatGPT answered over 60% of questions correctly, displaying consistency across various MCQs [25]. In this study, the discrepancy in ChatGPT’s performance across question formats may be attributed to the high difficulty level of these questions, even for third-year medical students.

Hela Cherif, Chirine Moussa, Abdel Mouhaymen Missaoui, Issam Salouage, Salma Mokaddem, Besma Dhahri

JMIR Med Educ 2024;10:e52818

Hospital Use of a Web-Based Clinical Knowledge Support System and In-Training Examination Performance Among Postgraduate Resident Physicians in Japan: Nationwide Observational Study

In addition, as the GM-ITE is a voluntary examination, a bias toward highly motivated residents taking the exam may exist. Therefore, the generalizability of the findings is not ensured. Second, causal relationships could not be established because the study design was cross-sectional. To control for selection bias and to assess causality, we believe that planning a randomized controlled trial targeting nationwide resident physicians is necessary.

Koshi Kataoka, Yuji Nishizaki, Taro Shimizu, Yu Yamamoto, Kiyoshi Shikino, Masanori Nojima, Kazuya Nagasaki, Sho Fukui, Sho Nishiguchi, Kohta Katayama, Masaru Kurihara, Rieko Ueda, Hiroyuki Kobayashi, Yasuharu Tokuda

JMIR Med Educ 2024;10:e52207

Authors’ Reply: “Evaluating GPT-4’s Cognitive Functions Through the Bloom Taxonomy: Insights and Clarifications”

However, GPT-4’s material selection is far more complex than a flat-file database with simple mapping (unless the exam questions had been in the testing data, but this is not applicable in our case). Generative tools like GPT-4 have other weaknesses and strengths. For example, they may perform relatively poorly on pure memory-recall problems but excel in topics requiring subtlety and nuanced work.

Anne Herrmann-Werner, Teresa Festl-Wietek, Friederike Holderried, Lea Herschbach, Jan Griewatz, Ken Masters, Stephan Zipfel, Moritz Mahling

J Med Internet Res 2024;26:e57778

Assessing ChatGPT’s Mastery of Bloom’s Taxonomy Using Psychosomatic Medicine Exam Questions: Mixed-Methods Study

First, we assessed the performance of GPT-4 with a large set of psychosomatic medicine exam questions and compared the results to responses from a cohort of medical students, thereby providing human comparison and quality indicators. For a deeper understanding of the results, we used qualitative methods to comprehend the model’s performance and to assess the strengths and weaknesses of LLMs in relation to Bloom’s taxonomy.

Anne Herrmann-Werner, Teresa Festl-Wietek, Friederike Holderried, Lea Herschbach, Jan Griewatz, Ken Masters, Stephan Zipfel, Moritz Mahling

J Med Internet Res 2024;26:e52113

A Novel Evaluation Model for Assessing ChatGPT on Otolaryngology–Head and Neck Surgery Certification Examinations: Performance Study

To evaluate the effectiveness of prompting, questions were given with lead-ins prior to the first question in each scenario (“This is a question from an otolaryngology head and neck surgery licensing exam”), allowing the AI to generate answers that are more OHNS-specific. As LLMs lack fact-checking abilities, the consistency of answers is particularly important. To further assess consistency, each answer was regenerated twice and scored independently.
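The prompting and regeneration protocol described in this excerpt can be illustrated with a short sketch. This is not the authors’ code: it assumes the OpenAI Python client, uses an illustrative model name, and simplifies the setup to one question per call; only the lead-in sentence is taken from the excerpt.

```python
# Minimal sketch (assumed implementation, not the study's actual code) of
# prepending a specialty lead-in and regenerating each answer twice so the
# responses can be scored independently for consistency.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Lead-in quoted from the excerpt above.
LEAD_IN = "This is a question from an otolaryngology head and neck surgery licensing exam"


def answer_with_regenerations(question: str, n_generations: int = 3) -> list[str]:
    """Return the initial answer plus two regenerations of the same prompt."""
    answers = []
    for _ in range(n_generations):
        response = client.chat.completions.create(
            model="gpt-3.5-turbo",  # illustrative model name
            messages=[{"role": "user", "content": f"{LEAD_IN}\n\n{question}"}],
        )
        answers.append(response.choices[0].message.content)
    return answers  # each response would then be scored independently
```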

Cai Long, Kayle Lowe, Jessica Zhang, André dos Santos, Alaa Alanazi, Daniel O'Brien, Erin D Wright, David Cote

JMIR Med Educ 2024;10:e49970