JMIR Medical Education
https://mededu.jmir.org/issue/feed
2023-01-06T11:15:24-05:00
JMIR Publications (editor@jmir.org) | Open Journal Systems
Technology, innovation, and openness in medical education in the information age.
This is an open-access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work, first published in JMIR Medical Education, is properly cited. The complete bibliographic information, a link to the original publication on https://mededu.jmir.org/, as well as this copyright and license information must be included.

https://mededu.jmir.org/2024/1/e54393/
Capability of GPT-4V(ision) in the Japanese National Medical Licensing Examination: Evaluation Study
2024-03-12T09:45:04-04:00
Takahiro Nakao, Soichiro Miki, Yuta Nakamura, Tomohiro Kikuchi, Yukihiro Nomura, Shouhei Hanaoka, Takeharu Yoshikawa, Osamu Abe
<strong>Background:</strong> Previous research applying large language models (LLMs) to medicine focused on text-based information. Recently, multimodal variants of LLMs have acquired the capability to recognize images. <strong>Objective:</strong> We aimed to evaluate the image recognition capability of generative pretrained transformer (GPT)-4V, a recent multimodal LLM developed by OpenAI, in the medical field by testing how visual information affects its performance in answering questions from the 117th Japanese National Medical Licensing Examination. <strong>Methods:</strong> We focused on 108 questions that had 1 or more images as part of the question and presented GPT-4V with the same questions under 2 conditions: (1) with both the question text and the associated images and (2) with the question text only. We then compared the difference in accuracy between the 2 conditions using the exact McNemar test. <strong>Results:</strong> Among the 108 questions with images, GPT-4V’s accuracy was 68% (73/108) when presented with images and 72% (78/108) when presented without images (<i>P</i>=.36). For the 2 question categories, clinical and general, the accuracies with and without images were 71% (70/98) versus 78% (76/98; <i>P</i>=.21) and 30% (3/10) versus 20% (2/10; <i>P</i>≥.99), respectively. <strong>Conclusions:</strong> The additional information from the images did not significantly improve the performance of GPT-4V in the Japanese National Medical Licensing Examination.

https://mededu.jmir.org/2024/1/e48393/
Sharing Digital Health Educational Resources in a One-Stop Shop Portal: Tutorial on the Catalog and Index of Digital Health Teaching Resources (CIDHR) Semantic Search Engine
2024-03-04T10:00:04-05:00
Julien Grosjean, Arriel Benis, Jean-Charles Dufour, Émeline Lejeune, Flavien Disson, Badisse Dahamna, Hélène Cieslik, Romain Léguillon, Matthieu Faure, Frank Dufour, Pascal Staccini, Stéfan Jacques Darmoni
<strong>Background:</strong> Access to reliable and accurate digital health web-based resources is crucial. However, the lack of dedicated search engines for non-English languages, such as French, is a significant obstacle in this field. Thus, we developed and implemented a multilingual, multiterminology semantic search engine called <i>Catalog and Index of Digital Health Teaching Resources</i> (CIDHR). CIDHR is freely accessible to everyone, with a focus on French-speaking resources.
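As a concrete companion to the GPT-4V study summarized above, here is a minimal sketch of the paired analysis it describes: per-question correctness under the 2 conditions, compared with an exact McNemar test. The outcome vectors and the use of statsmodels are illustrative assumptions, not the authors' code.

```python
# Minimal sketch of the exact McNemar comparison described in the GPT-4V
# study above. The per-question outcomes below are invented for illustration;
# statsmodels is assumed to be available.
from statsmodels.stats.contingency_tables import mcnemar

# Hypothetical per-question correctness, aligned so that index i refers to
# the same question under both conditions.
with_images = [True, False, True, True, False, True]
without_images = [True, True, True, False, False, True]

# Build the 2x2 table of paired outcomes; McNemar's test uses the
# off-diagonal (discordant) cells.
both = sum(a and b for a, b in zip(with_images, without_images))
only_with = sum(a and not b for a, b in zip(with_images, without_images))
only_without = sum(b and not a for a, b in zip(with_images, without_images))
neither = sum(not (a or b) for a, b in zip(with_images, without_images))
table = [[both, only_with], [only_without, neither]]

# exact=True requests the binomial (exact) version of the test, appropriate
# when the discordant-pair counts are small.
result = mcnemar(table, exact=True)
print(f"statistic={result.statistic}, P={result.pvalue:.2f}")
```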
CIDHR has been initiated to provide validated, high-quality content tailored to the specific needs of each user profile, whether students or professionals. <strong>Objective:</strong> The primary aim of this study in developing and implementing CIDHR is to improve knowledge sharing and dissemination in digital health and health informatics and to expand the health-related educational community, primarily French speaking but also in other languages. We intend to support the continuous development of initial (ie, bachelor level), advanced (ie, master and doctoral levels), and continuing training (ie, professional and postgraduate levels) in digital health for the health and social work fields. The main objective is to describe the development and implementation of CIDHR. The hypothesis guiding this research is that using controlled vocabularies dedicated to medical informatics and digital health, such as the Medical Informatics Multilingual Ontology (MIMO) and the concepts structuring the French National Referential on Digital Health (FNRDH), to index digital health teaching and learning resources effectively increases the availability and accessibility of these resources to medical students and other health care professionals. <strong>Methods:</strong> First, resources are identified by medical librarians from websites and scientific sources that are preselected and validated by domain experts and surveyed every week. Then, based on MIMO and FNRDH, the educational resources are indexed for each related knowledge domain. The same resources are also tagged with the relevant academic and professional experience levels. Afterward, the indexed resources are shared with the digital health teaching and learning community. The last step consists of assessing CIDHR by obtaining informal feedback from users. <strong>Results:</strong> Resource identification and evaluation processes were executed by a dedicated team of medical librarians, aiming to collect and curate an extensive collection of digital health teaching and learning resources. The resources that successfully passed the evaluation process were promptly included in CIDHR. These resources were diligently indexed (with MIMO and FNRDH) and tagged by study field and degree level. By October 2023, a total of 371 indexed resources were available on a dedicated portal. <strong>Conclusions:</strong> CIDHR is a multilingual digital health education semantic search engine and platform that aims to increase the accessibility of educational resources to the broader health care–related community. It focuses on making resources “findable,” “accessible,” “interoperable,” and “reusable” by using a one-stop shop portal approach. CIDHR has, and will continue to have, an essential role in increasing digital health literacy.

https://mededu.jmir.org/2024/1/e54401/
Development of a Clinical Simulation Video to Evaluate Multiple Domains of Clinical Competence: Cross-Sectional Study
2024-02-29T09:45:24-05:00
Kiyoshi Shikino, Yuji Nishizaki, Sho Fukui, Daiki Yokokawa, Yu Yamamoto, Hiroyuki Kobayashi, Taro Shimizu, Yasuharu Tokuda
<strong>Background:</strong> Medical school graduates in Japan undergo a 2-year postgraduate residency program to acquire clinical knowledge and general medical skills. The General Medicine In-Training Examination (GM-ITE) assesses postgraduate residents’ clinical knowledge. A clinical simulation video (CSV) may assess learners’ interpersonal abilities.
<strong>Objective:</strong> This study aimed to evaluate the relationship between GM-ITE scores and resident physicians’ diagnostic skills by having them watch a CSV and to explore resident physicians’ perceptions of the CSV’s realism, educational value, and impact on their motivation to learn. <strong>Methods:</strong> The participants included 56 postgraduate medical residents who took the GM-ITE between January 21 and January 28, 2021; watched the CSV; and then provided a diagnosis. The CSV and GM-ITE scores were compared, and the validity of the simulations was examined using discrimination indices, wherein ≥0.20 indicated high discriminatory power and >0.40 indicated a very good measure of the subject’s qualifications. Additionally, we administered an anonymous questionnaire to ascertain participants’ views on the realism and educational value of the CSV and its impact on their motivation to learn. <strong>Results:</strong> Of the 56 participants, 6 (11%) provided the correct diagnosis, and all were from the second postgraduate year. All domains indicated high discriminatory power. The anonymous survey revealed that 12 (52%) participants found the CSV format more suitable than the conventional GM-ITE for assessing clinical competence, 18 (78%) affirmed the realism of the video simulation, and 17 (74%) indicated that the experience increased their motivation to learn. <strong>Conclusions:</strong> The findings indicated that CSV modules simulating real-world clinical examinations were successful in assessing examinees’ clinical competence across multiple domains. The study demonstrated that the CSV not only augmented the assessment of diagnostic skills but also positively impacted learners’ motivation, suggesting a multifaceted role for simulation in medical education.

https://mededu.jmir.org/2024/1/e51426/
Exploring the Feasibility of Using ChatGPT to Create Just-in-Time Adaptive Physical Activity mHealth Intervention Content: Case Study
2024-02-29T09:30:28-05:00
Amanda Willms, Sam Liu
<strong>Background:</strong> Achieving physical activity (PA) guidelines’ recommendation of 150 minutes of moderate-to-vigorous PA per week has been shown to reduce the risk of many chronic conditions. Despite the overwhelming evidence in this field, PA levels remain low globally. By creating engaging mobile health (mHealth) interventions through strategies such as just-in-time adaptive interventions (JITAIs), which are tailored to an individual’s dynamic state, there is potential to increase PA levels. However, generating personalized content can take a long time because of the many versions of content required by the personalization algorithms. ChatGPT presents a promising opportunity to rapidly produce tailored content; however, there is a lack of studies exploring its feasibility. <strong>Objective:</strong> This study aimed to (1) explore the feasibility of using ChatGPT to create content for a PA JITAI mobile app and (2) describe lessons learned and future recommendations for using ChatGPT in the development of mHealth JITAI content. <strong>Methods:</strong> During phase 1, we used Pathverse, a no-code app builder, and ChatGPT to develop a JITAI app to help parents support their child’s PA levels.
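The content-generation step in this study was performed interactively in ChatGPT; purely as a rough illustration, an equivalent programmatic call through the OpenAI Python client might look like the sketch below. The model name, prompt wording, and lesson topic are invented assumptions, and any output would still need expert review before use.

```python
# Hypothetical sketch of generating one JITAI lesson draft with the OpenAI
# Python client. The study used ChatGPT interactively; this programmatic
# variant, the model name, and the prompt are illustrative assumptions.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

prompt = (
    "Write a 150-word lesson for parents of children aged 8-12 on how to "
    "support their child's physical activity this week. Use an encouraging, "
    "plain-language tone and end with one concrete action item."
)

response = client.chat.completions.create(
    model="gpt-4",  # assumed; any chat-capable model could be substituted
    messages=[{"role": "user", "content": prompt}],
)

# The raw draft still needs expert revision and customization before it
# reaches an intervention participant.
print(response.choices[0].message.content)
```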
The intervention was developed based on the Multi-Process Action Control (M-PAC) framework, and the necessary behavior change techniques targeting the M-PAC constructs were implemented in the app design to help parents support their child’s PA. The acceptability of using ChatGPT for this purpose was then discussed to determine its feasibility. In phase 2, we summarized the lessons learned during the JITAI content development process with ChatGPT and generated recommendations to inform future similar use cases. <strong>Results:</strong> In phase 1, by using specific prompts, we efficiently generated content for 13 lessons on increasing parental support for children’s PA, following the M-PAC framework. Using ChatGPT to develop PA content for a JITAI was determined to be acceptable in this case study. In phase 2, we summarized our recommendations into the following 6 steps for using ChatGPT to create content for mHealth behavior interventions: (1) determine the target behavior, (2) ground the intervention in behavior change theory, (3) design the intervention structure, (4) input the intervention structure and behavior change constructs into ChatGPT, (5) revise the ChatGPT response, and (6) customize the response to be used in the intervention. <strong>Conclusions:</strong> ChatGPT offers a remarkable opportunity for rapid content creation in the context of an mHealth JITAI. Although our case study demonstrated that ChatGPT was acceptable, it is essential to approach the use of ChatGPT, and of other language models, with caution. Before delivering content to population groups, expert review is crucial to ensure accuracy and relevance. Future research and application of these guidelines are imperative as we deepen our understanding of ChatGPT and its interactions with human input.

https://mededu.jmir.org/2024/1/e57594/
Correction: How Does ChatGPT Perform on the United States Medical Licensing Examination (USMLE)? The Implications of Large Language Models for Medical Education and Knowledge Assessment
2024-02-27T13:00:04-05:00
Aidan Gilson, Conrad W Safranek, Thomas Huang, Vimig Socrates, Ling Chi, Richard Andrew Taylor, David Chartash

https://mededu.jmir.org/2024/1/e50156/
Measuring e-Professional Behavior of Doctors of Medicine and Dental Medicine on Social Networking Sites: Indexes Construction With Formative Indicators
2024-02-27T09:45:21-05:00
Marko Marelić, Ksenija Klasnić, Tea Vukušić Rukavina
<strong>Background:</strong> Previous studies have predominantly measured e-professionalism through perceptions or attitudes, yet no validated measure specifically targets the actual behaviors of health care professionals (HCPs) in this realm. This study addresses this gap by constructing a normative framework, drawing from 3 primary sources to define e-professional behavior across 6 domains. Four domains pertain to the dangers of social networking sites (SNSs), encompassing confidentiality, privacy, patient interaction, and equitable resource allocation. Meanwhile, 2 domains focus on the opportunities of SNSs, namely, the proactive dissemination of public health information and the maintenance of scientific integrity. <strong>Objective:</strong> This study aims to develop and validate 2 new measures assessing the e-professional behavior of doctors of medicine (MDs) and doctors of dental medicine (DMDs), focusing on both the dangers and the opportunities associated with SNSs.
<strong>Methods:</strong> The study used a purposive sample of MDs and DMDs in Croatia who were users of at least 1 SNS. Data collection took place in 2021 through an online survey. Validation of both indexes used a formative approach involving a 5-step methodology: content specification; definition of the indicators, with instructions for item coding and index construction; a collinearity check of the indicators using the variance inflation factor (VIF); an external validity test using a multiple indicators multiple causes (MIMIC) model; and an external validity test checking the relationships of the indexes with a scale of attitudes toward SNSs using Pearson correlation coefficients. <strong>Results:</strong> A total of 753 responses were included in the analysis. The first e-professionalism index, assessing the dangers associated with SNSs, comprises 14 items. During the collinearity check, all indicators displayed acceptable VIF values below 2.5. The MIMIC model showed good fit (χ<sup>2</sup><sub>13</sub>=9.4, <i>P</i>=.742; χ<sup>2</sup>/df=0.723; root-mean-square error of approximation <.001; goodness-of-fit index=0.998; comparative fit index=1.000). The external validity of the index is supported by a statistically significant negative correlation with the scale measuring attitudes toward SNSs (r=–0.225; <i>P</i><.001). Following the removal of 1 item, the second e-professionalism index, focusing on the opportunities associated with SNSs, comprises 5 items. During the collinearity check, all indicators exhibited acceptable VIF values below 2.5. Additionally, the MIMIC model demonstrated a good fit (χ<sup>2</sup><sub>4</sub>=2.5, <i>P</i>=.718; χ<sup>2</sup>/df=0.637; root-mean-square error of approximation <.001; goodness-of-fit index=0.999; comparative fit index=1.000). The external validity of the index is supported by a statistically significant positive correlation with the scale of attitudes toward SNSs (r=0.338; <i>P</i><.001). <strong>Conclusions:</strong> Following the validation process, the instrument designed for gauging the e-professional behavior of MDs and DMDs consists of 19 items, which form 2 distinct indexes: a 14-item e-professionalism index focusing on the dangers associated with SNSs and a 5-item e-professionalism index highlighting the opportunities offered by SNSs. These indexes serve as valid measures of the e-professional behavior of MDs and DMDs, with the potential for further refinement to encompass emerging forms of unprofessional behavior that may arise over time.

https://mededu.jmir.org/2024/1/e48989/
Using ChatGPT-Like Solutions to Bridge the Communication Gap Between Patients With Rheumatoid Arthritis and Health Care Professionals
2024-02-27T09:45:04-05:00
Chih-Wei Chen, Paul Walter, James Cheng-Chung Wei
The communication gap between patients and health care professionals has led to increased disputes and resource waste in the medical domain. The development of artificial intelligence and other technologies brings new possibilities for solving this problem. This viewpoint paper proposes a new relationship between patients and health care professionals, “shared decision-making,” which allows both sides to obtain a deeper understanding of the disease and reach a consensus during diagnosis and treatment. The paper then discusses the important impact of ChatGPT-like solutions on the treatment of rheumatoid arthritis with methotrexate, from both clinical and patient perspectives.
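As an aside on the methods of the index-validation study above, the collinearity check it reports (all VIFs below 2.5) can be illustrated with a short sketch. The simulated indicator data and the use of numpy and statsmodels are assumptions for illustration; the study's actual analysis also involved MIMIC models and correlation tests not shown here.

```python
# Illustrative variance inflation factor (VIF) check of the kind used to
# validate the e-professionalism indexes above. The 14 binary indicators for
# n=753 respondents are simulated; numpy and statsmodels are assumed.
import numpy as np
from statsmodels.stats.outliers_influence import variance_inflation_factor
from statsmodels.tools import add_constant

rng = np.random.default_rng(0)
indicators = rng.integers(0, 2, size=(753, 14)).astype(float)

# VIF is computed for each indicator against a design matrix that includes
# an intercept; column 0 is the constant, so the loop skips it.
X = add_constant(indicators)
for i in range(1, X.shape[1]):
    vif = variance_inflation_factor(X, i)
    flag = "  <-- above 2.5, collinearity concern" if vif > 2.5 else ""
    print(f"indicator {i:2d}: VIF = {vif:.2f}{flag}")
```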
For clinical professionals, ChatGPT-like solutions could provide support in disease diagnosis, treatment, and clinical trials, but attention should be paid to privacy, confidentiality, and regulatory norms. For patients, ChatGPT-like solutions allow easy access to massive amounts of information; however, that information should be carefully managed to ensure safe and effective care. To ensure the effective application of ChatGPT-like solutions in improving the relationship between patients and health care professionals, it is essential to establish a comprehensive database and to provide legal, ethical, and other support. Above all, ChatGPT-like solutions could benefit patients and health care professionals if they provide evidence-based solutions, ensure data protection, and evolve in collaboration with regulatory authorities.

https://mededu.jmir.org/2024/1/e52155/
Using AI Text-to-Image Generation to Create Novel Illustrations for Medical Education: Current Limitations as Illustrated by Hypothyroidism and Horner Syndrome
2024-02-22T10:00:28-05:00
Ajay Kumar, Pierce Burr, Tim Michael Young
Our research letter investigates the potential, as well as the current limitations, of widely available text-to-image tools in generating images for medical education. We focused on illustrations of important physical signs in the face that medics should know about, and for which confidentiality issues may be a particular concern when conventional patient photographs are used; we used facial images of hypothyroidism and Horner syndrome as examples.

https://mededu.jmir.org/2024/1/e51523/
Evaluating Large Language Models for the National Premedical Exam in India: Comparative Analysis of GPT-3.5, GPT-4, and Bard
2024-02-21T10:00:41-05:00
Faiza Farhat, Beenish Moalla Chaudhry, Mohammad Nadeem, Shahab Saquib Sohail, Dag Øivind Madsen
<strong>Background:</strong> Large language models (LLMs) have revolutionized natural language processing with their ability to generate human-like text through extensive training on large data sets. These models, including Generative Pre-trained Transformers (GPT)-3.5 (OpenAI), GPT-4 (OpenAI), and Bard (Google LLC), find applications beyond natural language processing, attracting interest from academia and industry. Students are actively leveraging LLMs to enhance learning experiences and to prepare for high-stakes exams, such as the National Eligibility cum Entrance Test (NEET) in India. <strong>Objective:</strong> This comparative analysis aims to evaluate the performance of GPT-3.5, GPT-4, and Bard in answering NEET-2023 questions. <strong>Methods:</strong> In this paper, we evaluated the performance of the 3 mainstream LLMs, namely GPT-3.5, GPT-4, and Google Bard, in answering questions related to the NEET-2023 exam. The questions of the NEET were provided to these artificial intelligence models, and the responses were recorded and compared against the correct answers from the official answer key. Consensus was used to evaluate the performance of all 3 models. <strong>Results:</strong> GPT-4 passed the entrance test with flying colors (300/700, 42.9%), showcasing exceptional performance. GPT-3.5 also managed to meet the qualifying criteria, but with a substantially lower score (145/700, 20.7%). Bard (115/700, 16.4%), however, failed to meet the qualifying criteria and did not pass the test. GPT-4 demonstrated consistent superiority over Bard and GPT-3.5 in all 3 subjects.
Specifically, GPT-4 achieved accuracy rates of 73% (29/40) in physics, 44% (16/36) in chemistry, and 51% (50/99) in biology. Conversely, GPT-3.5 attained accuracy rates of 45% (18/40) in physics, 33% (13/26) in chemistry, and 34% (34/99) in biology. The accuracy consensus metric showed that the matching responses between GPT-4 and Bard, as well as between GPT-4 and GPT-3.5, were more often correct, at 0.56 and 0.57, respectively, than the matching responses between Bard and GPT-3.5, which stood at 0.42. When all 3 models were considered together, their matching responses reached the highest accuracy consensus of 0.59. <strong>Conclusions:</strong> The study’s findings provide valuable insights into the performance of GPT-3.5, GPT-4, and Bard in answering NEET-2023 questions. GPT-4 emerged as the most accurate model, highlighting its potential for educational applications. Cross-checking responses across models may result in confusion, as the compared models (as pairs or as a trio) tend to agree on only a little over half of the correct responses. Using GPT-4 as one of the compared models will result in a higher accuracy consensus. The results underscore the suitability of LLMs for high-stakes exams and their positive impact on education. Additionally, the study establishes a benchmark for evaluating and enhancing LLMs’ performance in educational tasks, promoting responsible and informed use of these models in diverse learning environments.

https://mededu.jmir.org/2024/1/e48507/
Occupational Therapy Students’ Evidence-Based Practice Skills as Reported in a Mobile App: Cross-Sectional Study
2024-02-21T10:00:25-05:00
Susanne G Johnson, Birgitte Espehaug, Lillebeth Larun, Donna Ciliska, Nina Rydland Olsen
<strong>Background:</strong> Evidence-based practice (EBP) is an important aspect of the health care education curriculum. EBP involves following the 5 EBP steps: ask, assess, appraise, apply, and audit. These 5 steps reflect the suggested core competencies covered in teaching and learning programs to support future health care professionals in applying EBP. When implementing EBP teaching, it is relevant to assess outcomes by documenting students’ performance and skills, which can be done using mobile devices. <strong>Objective:</strong> The aim of this study was to assess occupational therapy students’ EBP skills as reported in a mobile app. <strong>Methods:</strong> We applied a cross-sectional design. Descriptive statistics were used to present frequencies, percentages, means, and ranges of the data on EBP skills found in the EBPsteps app. Associations between students’ ability to formulate the Population, Intervention, Comparison, and Outcome/Population, Interest, and Context (PICO/PICo) elements and their ability to identify relevant research evidence were analyzed with the chi-square test. <strong>Results:</strong> Of 4 cohorts with 150 students, 119 (79.3%) students used the app and produced 240 critically appraised topics (CATs) in the app. The EBP steps “ask,” “assess,” and “appraise” were often correctly performed. The clinical question was formulated correctly in 53.3% (128/240) of the CATs, and students identified research evidence in 81.2% (195/240) of the CATs. Critical appraisal checklists were used in 81.2% (195/240) of the CATs, and most of these checklists were assessed as relevant for the type of research evidence identified (165/195, 84.6%).
The steps least frequently reported correctly were “apply” and “audit.” In 39.6% (95/240) of the CATs, it was reported that research evidence had been applied, but only 61% (58/95) of these CATs described how the research was applied to clinical practice. Evaluation of practice changes was reported in 38.8% (93/240) of the CATs; however, details about the practice changes were lacking in all of these CATs. A positive association was found between correctly reporting the “population” and “interventions/interest” elements of the PICO/PICo and identifying research evidence (<i>P</i><.001). <strong>Conclusions:</strong> We assessed the students’ EBP skills based on how they documented following the EBP steps in the EBPsteps app, and our results showed variation in how well the students mastered the steps. “Apply” and “audit” were the most difficult EBP steps for the students to perform, a finding with implications for further development of both the app and educational instruction in EBP. The EBPsteps app is a new and relevant app for students to learn and practice EBP, and it can be used to assess students’ EBP skills objectively.
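Finally, a minimal sketch of the association analysis reported in the occupational therapy study above: a chi-square test of independence between correct PICO/PICo formulation and identification of research evidence. The 2×2 counts below are invented and scipy is assumed; the study's own data will differ.

```python
# Minimal sketch of a chi-square test of independence, as described in the
# EBPsteps study above. The counts are invented for illustration; scipy is
# assumed to be available.
from scipy.stats import chi2_contingency

# Rows: PICO/PICo elements formulated correctly vs incorrectly.
# Columns: relevant research evidence identified vs not identified.
table = [
    [110, 18],  # hypothetical counts, not the study's data
    [85, 27],
]

chi2, p, dof, expected = chi2_contingency(table)
print(f"chi-square={chi2:.2f}, df={dof}, P={p:.3f}")
```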