Search Articles

Search Results (1 to 8 of 8)

Influence of Model Evolution and System Roles on ChatGPT’s Performance in Chinese Medical Licensing Exams: Comparative Study

These system roles influence the direction of ChatGPT’s answers and may affect its reliability. However, the impact of these system roles on ChatGPT’s performance in the medical field has not yet been investigated. As a professional chatbot tool, ChatGPT uses sampling to predict the next token from a probability distribution over possible continuations, which keeps its responses varied and natural in real-world applications.
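
The two mechanisms this excerpt mentions, system roles and token sampling, can be sketched in a few lines. This is an illustration, not code from the study: the message structure follows the common chat-completions format, and the sampling function is a generic temperature-scaled softmax.

```python
import numpy as np

# A system role is conventionally supplied as the first message of a chat
# request (illustrative structure following the common chat-completions format).
messages = [
    {"role": "system", "content": "You are an examiner for a medical licensing exam."},
    {"role": "user", "content": "Answer the following multiple-choice question: ..."},
]

def sample_next_token(logits, temperature=1.0, rng=None):
    """Draw a next-token index from temperature-scaled logits.

    Higher temperature flattens the distribution (more varied answers);
    lower temperature sharpens it (more repeatable answers).
    """
    rng = rng or np.random.default_rng()
    scaled = np.asarray(logits, dtype=float) / temperature
    probs = np.exp(scaled - scaled.max())  # numerically stable softmax
    probs /= probs.sum()
    return int(rng.choice(len(probs), p=probs))

# Because sampling is stochastic, repeated runs on the same exam question
# can produce different answers, which motivates the reliability question above.
print([sample_next_token([2.0, 1.0, 0.5], temperature=0.7) for _ in range(5)])
```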

Shuai Ming, Qingge Guo, Wenjun Cheng, Bo Lei

JMIR Med Educ 2024;10:e52784

Performance of GPT-4V in Answering the Japanese Otolaryngology Board Certification Examination Questions: Evaluation Study

While the percentage of correct answers for questions based on radiological images was relatively high, it was low for questions based on graphs, such as those from physiological tests. With English translations and prompts, the percentage of correct answers was 51.5% for questions based on radiological images and 29.2% for questions based on graphs. Results for image-based questions were broken down according to the type of image.

Masao Noda, Takayoshi Ueno, Ryota Koshu, Yuji Takaso, Mari Dias Shimada, Chizu Saito, Hisashi Sugimoto, Hiroaki Fushiki, Makoto Ito, Akihiro Nomura, Tomokazu Yoshizaki

JMIR Med Educ 2024;10:e57054

Capability of GPT-4V(ision) in the Japanese National Medical Licensing Examination: Evaluation Study

The questions and correct answers of the 117th Japanese National Medical Licensing Examination are publicly available for download on the official website of the Ministry of Health, Labour and Welfare [16]. All questions require selecting a specified number of choices, typically 1, from 5 options. Of the questions that included images, 2 were officially excluded from scoring because they were deemed either too difficult or inappropriate.

Takahiro Nakao, Soichiro Miki, Yuta Nakamura, Tomohiro Kikuchi, Yukihiro Nomura, Shouhei Hanaoka, Takeharu Yoshikawa, Osamu Abe

JMIR Med Educ 2024;10:e54393

A Generative Pretrained Transformer (GPT)–Powered Chatbot as a Simulated Patient to Practice History Taking: Prospective, Mixed Methods Study

Among answers that used explicit script information (n=578, 67.7%), 218 (37.7%) were “plausible, highly specific for the case,” 161 (27.9%) were “plausible, relevant for the case,” and 197 (34.1%) were “plausible, not case specific”; only 2 (0.3%) answers were rated “rather implausible,” and none “very implausible.”

Friederike Holderried, Christian Stegemann-Philipps, Lea Herschbach, Julia-Astrid Moldt, Andrew Nevins, Jan Griewatz, Martin Holderried, Anne Herrmann-Werner, Teresa Festl-Wietek, Moritz Mahling

JMIR Med Educ 2024;10:e53961

Performance Comparison of ChatGPT-4 and Japanese Medical Residents in the General Medicine In-Training Examination: Comparison Study

The difficulty level of each question was established from the percentage of correct answers reported by JAMEP. Questions with less than 41.0% correct answers were classified as hard, those with 41.1% to 72.1% as normal, and those with more than 72.1% as easy. Excluded were questions containing images that GPT-4 could not recognize (n=55), questions containing videos (n=22), and questions with both (n=6). The final analysis included 137 questions.
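
Read as a classification rule, these thresholds translate directly into code. A minimal sketch, assuming percent-correct values as input (the behavior at exactly 41.0% is not specified in the excerpt):

```python
def classify_difficulty(percent_correct: float) -> str:
    """Map a question's examinee percent-correct to the study's difficulty label."""
    if percent_correct < 41.0:
        return "hard"
    if percent_correct <= 72.1:  # 41.1%-72.1% per the excerpt
        return "normal"
    return "easy"

assert classify_difficulty(35.0) == "hard"
assert classify_difficulty(60.0) == "normal"
assert classify_difficulty(80.0) == "easy"
```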

Takashi Watari, Soshi Takagi, Kota Sakaguchi, Yuji Nishizaki, Taro Shimizu, Yu Yamamoto, Yasuharu Tokuda

JMIR Med Educ 2023;9:e52202

Assessing the Accuracy and Comprehensiveness of ChatGPT in Offering Clinical Guidance for Atopic Dermatitis and Acne Vulgaris

Across both diseases, 78% (50/64) of ChatGPT responses were at least correct (score ≤2), and 45% (29/64) were fully comprehensive (score 1). No responses were completely inaccurate (score 4). For AD and acne specifically, 88% (28/32) and 66% (21/32) of responses were at least correct (score ≤2), and 53% (17/32) and 34% (11/32) were fully comprehensive (score 1), respectively. This broadly indicates acceptable performance of ChatGPT across both conditions.
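
The quoted proportions are simple counts over per-response scores on the study’s 4-point scale. The sketch below reconstructs the combined score distribution implied by the excerpt (29 responses at score 1, 21 at score 2 by subtraction from 50, the remaining 14 at score 3, none at score 4); the score-3 label is not given in the excerpt.

```python
from collections import Counter

# Combined AD + acne scores implied by the excerpt: 29 at score 1
# ("fully comprehensive"), 21 at score 2 ("correct but inadequate"),
# the remaining 14 at score 3 (label not given), and none at score 4.
scores = [1] * 29 + [2] * 21 + [3] * 14

n = len(scores)  # 64 responses
correct = sum(s <= 2 for s in scores)  # score <= 2: at least correct
comprehensive = Counter(scores)[1]     # score == 1: fully comprehensive
print(f"correct (score <=2): {correct}/{n} = {correct / n:.0%}")                  # 50/64 = 78%
print(f"comprehensive (score 1): {comprehensive}/{n} = {comprehensive / n:.0%}")  # 29/64 = 45%
```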

Nehal Lakdawala, Leelakrishna Channa, Christian Gronbeck, Nikita Lakdawala, Gillian Weston, Brett Sloan, Hao Feng

JMIR Dermatol 2023;6:e50409