
Technology, innovation, and openness in medical education in the information age.

Latest Submissions Open for Peer Review

JMIR has been a leader in applying openness, participation, collaboration, and other "2.0" ideas to scholarly publishing, and since December 2009 has offered open peer review of articles, allowing JMIR users to sign themselves up as peer reviewers for specific articles currently under consideration by the Journal (in addition to author- and editor-selected reviewers). Note that this is not a complete list of submissions, as authors can opt out. The list below shows recently submitted articles for which the submitting authors have not opted out of open peer review and for which the editor has not yet made a decision. (Note that this feature is for reviewing specific articles - if you just want to sign up as a reviewer and wait for the editor to contact you when articles match your interests, please sign up as a reviewer using your profile.)
To assign yourself to an article as reviewer, you must have a user account on this site (if you don't have one, register for a free account here) and be logged in (please verify that the email address in your profile is correct). Add yourself as a peer reviewer to any article by clicking the 'Peer-review Me!' link under each article. Full instructions on how to complete your review will be sent to you via email shortly afterwards. Do not sign up as a peer reviewer if you have any conflicts of interest (note that we will treat any attempt by authors to sign up as reviewers under a false identity as scientific misconduct and reserve the right to promptly reject the article and inform the host institution).
The standard turnaround time for reviews is currently 2 weeks, and the general aim is to give constructive feedback to the authors and/or to prevent publication of uninteresting or fatally flawed articles. Reviewers will be acknowledged by name if the article is published, but remain anonymous if the article is declined.

The abstracts on this page describe unpublished studies - please do not cite them (yet). If you wish to cite them or would like to see them published, express your opinion in the form of a peer review!

Tip: Include the RSS feed of the JMIR submissions on this page on your iGoogle homepage, blog, or desktop RSS reader to stay informed about current submissions!

JMIR Submissions under Open Peer Review

If you follow us on Twitter, we will also announce new submissions under open peer-review there.


Titles/Abstracts of Articles Currently Open for Review

  • Background: The rapid integration of AI in scholarly writing presents transformative potential for medical research, yet comprehensive data on healthcare professionals’ engagement with these tools remain scarce. Objective: This study aimed to assess awareness, attitudes, utilization, and training needs regarding artificial intelligence (AI) in medical scholarly writing among healthcare professionals, alongside evaluating current applications and future developmental potential. Methods: An online questionnaire was distributed to healthcare professionals across various medical institutions, yielding 782 responses. The questionnaire explored familiarity with AI writing technologies, attitudes and preferences toward their use, actual usage behaviors, and specific training needs, along with factors influencing these aspects. Results: The results showed that although 71.74% of respondents reported awareness of AI writing technologies, only 23.53% indicated a thorough understanding. Healthcare professionals predominantly held neutral or supportive attitudes toward AI writing (63.94% and 51.53%, respectively), with over 60% expressing willingness to adopt it, primarily motivated by expected improvements in writing efficiency (87.53%) and enhanced article structuring (63.94%). However, actual usage remained limited, with only 6.14% having utilized AI writing tools. The major barriers identified included insufficient skills and knowledge (64.64%), high costs (69.82%), and concerns about the quality of content generated (80.56%). Additionally, only 2.43% of respondents had received training related to AI writing tools, yet 61.77% indicated substantial interest in such training, emphasizing areas such as basic operations, adherence to writing standards, and quality control measures. Conclusions: AI writing technologies exhibit significant potential within the medical academic community, and healthcare professionals generally regard their application positively. Nonetheless, limited awareness, low practical utilization rates, and perceived technological limitations currently restrict broader adoption. Moving forward, enhancing tool functionalities, providing extensive practical examples, offering free trial periods, and establishing systematic training programs should be considered critical strategies. These initiatives are expected to enhance proficiency in AI-assisted writing among healthcare professionals, thereby fostering innovation and improving efficiency within medical research. Clinical Trial: Approved by Sun Yat-sen Memorial Hospital Ethics Committee (SYSKY-2024-074-01). Electronic informed consent was obtained from all participants.

  • Enhancing Exposure Therapy Training through Virtual Reality Simulation: A Randomized Pilot Trial

    Date Submitted: Jul 3, 2025
    Open Peer Review Period: Jul 7, 2025 - Sep 1, 2025

    Background: Despite robust empirical support, exposure-based CBT remains one of the least utilized evidence-based practices (EBPs) for anxiety disorders in typical practice settings. Research suggests providers’ negative beliefs about the risk of negative events during exposure delivery are a major predictor of its underutilization. Studies have demonstrated that incorporating experiential learning such as role-playing into conventional didactic training can reduce therapists’ negative beliefs. However, these methods face limitations in terms of accessibility, standardization, and fidelity to real-life experiences. Emerging evidence suggests virtual reality (VR) simulations may be an effective and scalable alternative for improving skills and attitudes pertinent to mental health treatment. Objective: This study examines the initial efficacy of a novel VR simulation-based exposure training program (SET-VR™) based on (1) perceptions of usability, and (2) degree of change in therapist learning targets (i.e., knowledge, self-efficacy, attitudes). Clinician participants were randomly assigned to a low-immersion desktop version or a high-immersion head-mounted display (HMD) version of the SET-VR™ program to explore the influence of immersion on key outcomes. Methods: Clinician participants (N=41) were recruited from a variety of practice settings. Before randomization, both groups received conventional (4-hour) didactic training for exposure therapy. Next, groups were assigned to immersion modality (desktop or HMD) and began delivering exposures to a virtually simulated patient. Participants practiced titrating exposure intensity (increase, decrease, continue) based on real-time visual and auditory cues from the virtual patient. Participants completed three rounds of exposure delivery to a simulated patient and reviewed their decisions with feedback at the end of each round. Exposure knowledge, exposure self-efficacy, and beliefs about exposures were measured at baseline, post-didactic, and post-VR. Participants also rated the acceptability, usability, and real-world authenticity of VR exposure training. Results: Both groups (desktop, HMD) showed significant improvement in exposure knowledge (p<.01; p<.01), self-efficacy (p<.01; p<.01), and beliefs about exposure (p<.01; p<.01) between baseline and didactic training. There were no significant differences between the low and high immersion groups on any measure at baseline or after didactics. Both groups demonstrated significant improvement in exposure self-efficacy (p<.01; p<.01) and beliefs (p<.001; p=.012) from post-didactic to post-exposure delivery. Neither showed improved knowledge from post-didactic to post-exposure delivery (p>.05; p>.05). Both groups gave highly positive ratings for the acceptability, usability, and authenticity of the simulated training experience. Taken together, results indicate that VR training significantly improved therapists’ self-efficacy and beliefs about exposures beyond gains from didactic training alone. Conclusions: VR exposure therapy training is both well-received and effective in addressing clinician-level barriers to optimal exposure delivery. Supplementing conventional didactic training with experiential learning via VR sessions may be a promising next step in optimizing the standardization, scalability, and effectiveness of exposure training. Clinical Trial: Clinicaltrials.gov Identifier: NCT06706245

  • Background: Game-based learning has emerged as an effective strategy for enhancing knowledge and engagement in healthcare education. However, existing games have not been specifically designed to support cognitive improvements for diverse learning styles in oral microbiology and immunology. Objective: This study aimed to develop and evaluate an educational card game designed to support diverse learning styles in oral microbiology and immunology, using a duel-style format. Methods: A mixed-methods study was conducted with 40 third-year dental students, half of whom were assigned to the first group, starting as the host, while those in the other group began as the microbe. Participants alternated between the microbe and host roles during gameplay. Active engagement through playing as the microbe facilitated knowledge acquisition and recall. On the other hand, the host role aimed to promote decision-making and the application of knowledge. Quantitative data were collected using pre- and post-knowledge assessments and satisfaction questionnaires. Qualitative insights were obtained through semi-structured interviews exploring learning experiences when playing as the microbe compared to the host. Results: Students demonstrated significant improvements in knowledge scores across the three assessments (P<.01), with no difference between groups (P>.05). They also perceived the game positively in all three aspects (usefulness, ease of use, and enjoyment). Qualitative findings revealed that role variation supported both inductive and deductive learning processes. Participants valued the combination of pedagogical and entertaining components, which contributed to motivation and engagement during the game. A conceptual framework demonstrated key emerging themes relevant to the game design and implementation, including learner profile, learning setting, game design, learning process, and learning outcomes. Conclusions: The card game effectively enhanced knowledge acquisition, strategic thinking, and student engagement in oral microbiology and immunology. Role-switching between the host and microbe facilitated multiple learning pathways, accommodating diverse learning styles. Integrating such educational card games in dental education may bridge theoretical understanding and clinical reasoning. Further research is recommended to investigate long-term retention and broader practicality.

  • This study aims to explore how artificial intelligence-generated content (AIGC) impacts the critical thinking skills of medical students through a systematic review. It also aims to develop a framework for coping strategies. The study focuses on AIGC's use in clinical diagnosis, evidence-based medicine, ethical decision-making, and scientific research, while examining challenges and ways to enhance critical thinking. The study followed the PRISMA 2020 guidelines, searching English literature from November 2022 to June 2025 in PubMed using keywords like "AIGC," "medical students," and "critical thinking." Two reviewers evaluated and analyzed relevant studies qualitatively. Additionally, the included studies predominantly emphasize short-term effects and lack follow-up evaluations of the long-term impacts of AIGC. AIGC in medical education has both benefits and drawbacks. It provides rich learning resources and tools, speeding up knowledge acquisition. However, overreliance on AIGC may reduce critical thinking skills. Strategies like tailored AI tools, virtual patients, and evaluating AI limitations can help maintain and improve critical thinking.

  • Background: Induction training for junior doctors in otolaryngology (ENT) must address a wide range of prior experience. Artificial intelligence (AI) avatars offer a novel approach to deliver educational content. This study evaluated whether an AI avatar-delivered ENT induction course could improve trainee confidence in key ENT clinical skills. Objective: To evaluate the feasibility, acceptability, and educational impact of an AI avatar-delivered induction course on junior doctors’ self-reported confidence in key ENT clinical skills. Methods: A modular online ENT induction course was developed using AI-generated avatar instructors (video-based, non-interactive) via the HeyGen platform. The course content covered otoscopic examination, endoscopic anatomy and pathology of the upper aerodigestive tract, management of ENT emergencies, triaging referrals, and acute airway management. Thirty junior doctors (Foundation Year 2, general practice trainees, core surgical trainees, and clinical fellows) at a tertiary hospital ENT department completed the course. Participants rated their confidence in seven ENT skills before and after the course on a 10-point Likert scale (1 = not confident, 10 = extremely confident). A post-course survey collected feedback on the AI tutors’ understandability, willingness to use AI-based learning in the future, and comparisons of the learning experience and content retention versus traditional methods. Paired t-tests were used to analyze changes in confidence. No objective skill assessment was performed. Results: All 30 participants completed both pre- and post-course assessments. Mean self-confidence scores improved significantly in all seven ENT skill domains after the course (mean increases ranging from +2.5 to +4.3 points on the 10-point scale; p<0.001 for each). The largest gains were in identifying normal endoscopic anatomy and in triaging ENT referrals. The AI avatar tutors were generally well understood (mean clarity rating 7.8/10). A majority of trainees (57%, 17/30) expressed willingness to take further AI-delivered courses, with 30% unsure and 13% unwilling. However, most participants (66.7%) reported no difference in their overall learning experience compared to traditional instructor-led videos, and 20% felt the AI format was inferior to traditional methods (only 13.3% reported an enhanced learning experience). Similarly, 70% perceived no impact of the AI tutors on their ability to retain material (13.3% reported enhanced retention, 16.7% reported worse retention). Conclusions: An AI avatar-delivered induction course substantially increased junior doctors’ self-reported confidence across a range of essential ENT skills. The intervention was generally well received and accepted by trainees. Nevertheless, despite these confidence gains, most participants did not perceive the AI avatars to improve their learning experience relative to conventional teaching, highlighting an important gap between confidence and perceived educational value. AI avatar tutors show promise as scalable tools in surgical education to supplement training, but further refinement (such as increasing interactivity) and evaluation (including objective performance measures) are warranted to optimize their effectiveness. Clinical Trial: n/a

  • PsicoSimGPT: Evaluating the Use of Generative AI for Training in Psychopathological Interviewing

    Date Submitted: Jun 10, 2025
    Open Peer Review Period: Jul 1, 2025 - Aug 26, 2025

    Background: Clinical reasoning is crucial in psychology education, yet traditional training methods provide limited practical experience. Virtual patients (VPs), enhanced by generative artificial intelligence (GAI), may effectively bridge this gap, offering realistic simulations that promote diagnostic and reasoning skills in a controlled environment. Objective: To evaluate the impact of GAI-powered conversational virtual patients on active learning, student satisfaction, participation levels, and overall educational experience in an undergraduate psychopathology course. Methods: The study involved 160 second-year psychology undergraduates at Miguel Hernández University, who engaged in structured text-based interviews with virtual patients generated by ChatGPT (gpt-4o model). Each student participated in one to six sessions, resulting in 1,832 recorded interactions. AI temperature settings (0.1, 0.5, 0.9) were systematically varied to examine their effect on interactions and perceptions. Sentiment analysis was conducted using Python's "pysentimiento" library, and quantitative data were analyzed with R software. Results: Participants rated the platform highly, with median ratings close to 10 across different conditions. Statistical analysis revealed no significant correlation between age (p = 0.42) or number of questions asked (p = 0.42) and user ratings. A moderate negative correlation was found between AI errors and ratings (r = –0.31, p < 0.001). Temperature settings significantly influenced ratings (Kruskal-Wallis test, p = 0.031), with higher ratings at the 0.9 temperature compared to 0.1 (Dunn's test, p = 0.037). Sentiment analysis showed predominantly negative sentiment in AI responses (median negativity = 0.8903), reflecting clinical realism. Conclusions: GAI-powered conversational VPs significantly enhance clinical training in psychopathology, providing realistic, engaging simulations that improve student satisfaction and clinical reasoning skills. Optimizing AI temperature settings can further enhance educational effectiveness, highlighting the value of carefully tailored simulation parameters.
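
    For readers unfamiliar with the sentiment tooling named in this abstract, the following is a minimal, hypothetical sketch of how a per-response negativity score could be computed with Python's "pysentimiento" library; the response strings and the median summary are illustrative assumptions, not the authors' code or data.

```python
# Minimal illustrative sketch, not the authors' pipeline.
# pysentimiento's sentiment analyzer returns NEG/NEU/POS probabilities per text.
from statistics import median
from pysentimiento import create_analyzer

analyzer = create_analyzer(task="sentiment", lang="es")  # Spanish-language model

# Hypothetical virtual-patient responses; real session transcripts would be used instead.
ai_responses = [
    "Últimamente me cuesta dormir y me siento sin energía.",
    "No tengo ganas de ver a nadie, todo me parece inútil.",
]

negativity = [analyzer.predict(text).probas["NEG"] for text in ai_responses]
print("Median negativity:", median(negativity))
```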

  • Enhancing Clinical Competencies through Peer Role-Play: A Study on Oncology Graduate Student Training

    Date Submitted: Jun 27, 2025
    Open Peer Review Period: Jun 27, 2025 - Aug 22, 2025

    Background: Clinical competency is essential for oncology students to deliver high-quality patient care. However, traditional teaching methods may not fully support the development of critical skills such as communication, empathy, and clinical judgment. Peer role-play has emerged as a promising approach to bridge these gaps by enhancing interpersonal and diagnostic competencies within clinical settings. Objective: This study aims to evaluate the effectiveness of peer role-play in developing clinical competencies among oncology graduate students during their clinical rotation. Methods: This study involved 70 first-year oncology graduate students from Guangzhou Medical University Cancer Hospital in a three-month clinical rotation within the Department of Oncology from January 2022 to December 2023. Participants were randomly assigned to either a peer role-play group (n = 35) or a traditional teaching group (n = 35), ensuring balanced gender and baseline competencies. The role-play group engaged in a structured curriculum that included case presentation, classroom instruction, and weekly role-play sessions, with debriefing and feedback sessions following each role-play. The traditional group adhered to a standard curriculum without role-play exercises. Assessments included a baseline Oncology Theory Exam, Mini-Clinical Evaluation Exercise (Mini-CEX) for clinical competency evaluation, and a satisfaction survey for the role-play group. Results: Baseline theory exam scores were comparable between the two groups (p > 0.05). However, the peer role-play group demonstrated significant improvements in doctor-patient communication, medical history taking, clinical judgment, and overall clinical competence compared to the traditional teaching group (p < 0.05). Furthermore, students in the role-play group reported high levels of satisfaction, citing scenario realism, communication practice opportunities, and feedback quality as key benefits. Conclusions: The study indicates that peer role-play is an effective educational approach for developing clinical competencies in oncology graduate students, particularly in communication, empathy, and clinical reasoning. Role-play provides an engaging and practical learning experience, making it a valuable addition to clinical training programs aimed at enhancing patient-centered care skills in students.

  • Background: Although large language models (LLMs) have demonstrated promising diagnostic performance, it is uncertain whether their use improves the diagnostic reasoning of medical students. Objective: To investigate the impact of an LLM on medical students’ diagnostic performance in rheumatology compared with traditional resources. Methods: This randomized controlled trial was conducted from January 7 to March 30, 2025, and recruited medical students from the University of Marburg, Germany. Participants provided a main diagnosis with corresponding diagnostic confidence and up to four additional differential diagnoses for three rheumatic vignettes. Participants were randomized to either use the LLM in addition to traditional diagnostic resources or traditional resources only. The primary outcome was the proportion of cases with a correct top diagnosis. Secondary outcomes included the proportion of cases with a correct diagnosis among top 5 suggestions, a cumulative diagnostic score, diagnostic confidence and case completion time. Diagnostic suggestions were rated by blinded expert consensus. Results: A total of 68 participants (mean [SD] age, 24.8 [2.6]) were randomized. Participants using the LLM identified the correct top diagnosis significantly more often than those in the control group (77.5% vs 32.4%), corresponding to an adjusted odds ratio of 7.0 (95% CI: [3.8, 14.4], P<.001) and also outperformed the LLM alone (77.5% vs 71.6%). Mean cumulative diagnostic scores were significantly higher in the LLM group (mean [SD], 12.3 [12.3]) compared with the control group (6.7 [3.2]; Welch t₆₀.₂₂ = 8.1; P<.001). Diagnostic confidence was greater in the LLM group (mean 7.0 [SD 1.3]) than in the control group (mean 6.1 [SD 1.2]; P<.001). Case completion time was significantly longer in the LLM group (mean 505 seconds [SD 131]) compared to the control group (mean 287 seconds [SD 106]; P<.001). Conclusions: In this randomized clinical trial, medical students using an LLM achieved significantly higher diagnostic accuracy than those using conventional resources. Students assisted by the LLM also outperformed the model alone, highlighting the potential of human-AI collaboration. These findings suggest that LLMs may help improve clinical reasoning in complex fields such as rheumatology. Clinical Trial: ClinicalTrials.gov Identifier: NCT06748170

  • Comparison of ChatGPT and DeepSeek on a Standardized Audiologist Qualification Examination in Chinese

    Date Submitted: Jun 23, 2025
    Open Peer Review Period: Jun 23, 2025 - Aug 18, 2025

    Background: Generative AI (GenAI), exemplified by ChatGPT and DeepSeek, is rapidly advancing and reshaping human-computer interaction with its growing reasoning capabilities and broad applications across fields like medicine and education. Objective: This study aimed to evaluate the performance of two generative artificial intelligence (GenAI) models (ChatGPT-4-turbo, and DeepSeek-R1) on a Standardized Audiologist Qualification Examination in Chinese, and to explore their potential applicability in audiology education and clinical training. Methods: The 2024 Taiwan Audiologist Qualification Examination (TAQE), comprising 300 multiple-choice questions across six subjects [i.e., (1) Basic Hearing Science, (2) Behavioral Audiology, (3) Electrophysiological Audiology, (4) Principles and Practice of Hearing Devices, (5) Health and Rehabilitation of the Auditory and Balance Systems, and (6) Hearing and Speech Communication Disorders (including Professional Ethics)], was used to assess the performance of the two GenAI models. The complete answering process and reasoning paths of the models were recorded, and performance was analyzed by overall accuracy, subject-specific scores, and question-type scores. Statistical comparisons were performed using the Wilcoxon signed-rank test. Results: ChatGPT and DeepSeek achieved overall accuracies of 80% and 79%, respectively, which are higher than the passing criterion of the TAQE (i.e., 60% correct). The accuracies for the six subject areas were 88%, 70%, 86%, 76%, 82%, and 80% for ChatGPT and 82%, 72%, 78%, 80%, 80%, and 84% for DeepSeek. No significant differences were found in the overall accuracies or performance in any of the subject areas between the two models (all p > 0.05). ChatGPT scored highest in Basic Hearing Science (88%), while DeepSeek performed the best in Hearing and Speech Communication Disorders (84%). Both models scored lowest in Behavioral Audiology (ChatGPT: 70%; DeepSeek: 72%). Question-type analysis revealed that both models performed well on the reverse logic questions (ChatGPT: 83.2%; DeepSeek: 84.2%), but only moderately on the complex multiple-choice questions (ChatGPT: 52.9%; DeepSeek: 64.7%). However, both models performed poorly on the graph-based questions (ChatGPT: 18.2%; DeepSeek: 36.4%). Conclusions: Both GenAI models demonstrated solid professional knowledge and reasoning ability, meeting the basic requirements of audiologists. However, they showed limitations in graph-based and complex clinical reasoning. Future research should explore their performance in open-ended, real-world clinical scenarios to assess practical applicability and limitations.
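
    As a point of reference for the statistical comparison described in this abstract, a minimal sketch of a Wilcoxon signed-rank test pairing the two models' subject-area accuracies might look like the following; the accuracy values are taken from the abstract, but the analysis code itself is an assumption, not the authors'.

```python
# Minimal illustrative sketch of the paired, non-parametric model comparison.
from scipy.stats import wilcoxon

# Subject-area accuracies reported in the abstract (six TAQE subjects)
chatgpt_acc  = [0.88, 0.70, 0.86, 0.76, 0.82, 0.80]
deepseek_acc = [0.82, 0.72, 0.78, 0.80, 0.80, 0.84]

# Paired comparison of the two models across the same six subjects
stat, p_value = wilcoxon(chatgpt_acc, deepseek_acc)
print(f"Wilcoxon statistic = {stat:.1f}, p = {p_value:.3f}")
```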

  • Background: Information gathering is the foundational skill of clinical reasoning. However, residents and attending physicians have no objective insights into the competencies of residents in developing skills in information gathering in the electronic health record (EHR). The EHR audit logs, time-stamped records of user activities, can provide a wealth of information about how residents gather information about patients at the time of admission and throughout daily rounding. Objective: In this study, our goals were to: 1. Understand and delineate attending physician expectations of residents’ EHR-based information gathering activities at different stages of residency, 2. Develop a system, referred to as the Trainee Digital Growth Chart (a.k.a. the Growth Chart), using the EHR audit logs to audit and feed back information gathering performance to residents and their attending physicians, and 3. Pilot the Growth Chart among pediatric residents on pediatric hospital medicine (PHM) rotations to understand whether audit and feedback data on EHR-based information gathering is helpful in supporting resident learning and assessment. Methods: We convened a focus group of PHM attending physicians to establish information gathering benchmarks for residents at each stage of their training. Residents and attendings were involved in the co-design of an information gathering performance electronic dashboard called the Trainee Digital Growth Chart. This dashboard was piloted in an observational cohort study among PHM residents and attending physicians during the 2023-24 academic year. Results: Considerable variability was observed as focus group attendings established training-stage-specific benchmarks. During the pilot, residents and attendings logged into the Growth Chart to observe performance at moderate to high rates. However, despite their involvement in its co-design, most participants did not find great value in the Growth Chart. Nevertheless, as an intervention, viewing prior Growth Chart information gathering performance had a positive impact on future information gathering performance among first year residents on daily rounds when that performance was also discussed with an attending physician. Conclusions: Information gathering is at the foundation of clinical reasoning. However, no competency-based benchmarks for information gathering in the EHR exist. Opportunities exist to leverage the EHR audit logs to feed performance information back to trainees, thereby influencing future information gathering behaviors. This is particularly powerful when done early in training before habits are formed, and when done in conjunction with verbal review with an attending physician. Such tools must find their way into routine clinical workflows and be capable of providing real-time or near-real-time feedback before perceived educational value will be realized. Nevertheless, these approaches have broad potential to scale across specialties and allied health disciplines.

  • Background: Most medical schools do not require anesthesiology as part of their clerkship curricula, limiting student exposure to the specialty. Objective: This study aims to investigate whether the California Anesthesiology Medical Student Symposium (CAMSS), a one-day conference composed of anesthesiology lectures and workshops led by residency program leaders, can increase student knowledge or interest in anesthesiology. Methods: The Annual CAMSS of 2022 was organized at the University of California, Irvine School of Medicine by medical students and residency program leaders. An online survey was distributed to all registered students three days prior to the conference and immediately afterwards. Student exposure, knowledge, and interest in anesthesiology were evaluated using Likert scales. Pre-conference versus post-conference results were analyzed using two-sample t-tests with a p-value < 0.05 considered as statistically significant. Results: The pre-conference survey was emailed to all 96 students who registered for the conference, 68 of whom completed the survey (response rate 70.8%). The post-conference survey was emailed to all 83 students who attended the conference, 51 of whom completed the survey (response rate 61.4%). On a Likert scale of 1-10, post-conference survey responses revealed a statistically significant increase in self-perceived knowledge of anesthesiology compared to pre-conference surveys (mean 6.44, SD 1.79 vs. mean 4.71, SD 2.07 respectively; p < 0.001). Conclusions: A one-day anesthesiology-focused conference can increase medical students’ self-perceived knowledge of the specialty’s multifaceted role in the hospital setting. Clinical Trial: This prospective cohort observational study was approved by University of California, Los Angeles Medical Institutional Review Board (IRB) # 21-001825.

  • Background: Large language models (LLMs) such as ChatGPT have shown promise in medical education assessments, but the comparative effects of prompt engineering across optimized variants and relative performance against medical students remain unclear. Objective: To systematically evaluate the impact of prompt engineering on five ChatGPT variants (GPT-3.5, GPT-4.0, GPT-4o, GPT-4o1mini, GPT-4o1) and benchmark their performance against fourth-year medical students in midterm and final examinations. Methods: A 100-item examination dataset covering multiple-choice, short-answer, clinical case analysis, and image-based questions was administered to each model under no-prompt and prompt-engineered conditions over five independent runs. Student cohort scores (n=143) were collected for comparison. Responses were scored using standardized rubrics, converted to percentages, and analyzed in SPSS Statistics 29 with paired t-tests and Cohen’s d (p<0.05). Results: Baseline midterm scores ranged from 59.2% (GPT-3.5) to 94.1% (GPT-4o1); final scores from 55.0% to 92.4%. Fourth-year students averaged 89.4% (midterm) and 80.2% (final). Prompt engineering significantly improved GPT-3.5 (+10.6%, p<0.001) and GPT-4.0 (+3.2%, p=0.002) but yielded negligible gains for optimized variants (p=0.066–0.94). Optimized models matched or exceeded student performance on both exams. Conclusions: Prompt engineering enhances early-generation model performance, whereas advanced variants inherently achieve near-ceiling accuracy, surpassing medical students. As LLMs mature, emphasis should shift from prompt design to model selection, multimodal integration, and critical use of AI as a learning companion. Clinical Trial: IRB #CSMU-2024-075
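
    The paired comparisons described in this abstract (paired t-tests with Cohen's d, computed in SPSS by the authors) can be illustrated with a short Python equivalent; the score arrays below are hypothetical placeholders, not the study data.

```python
# Minimal illustrative sketch: paired t-test plus Cohen's d for paired samples.
import numpy as np
from scipy.stats import ttest_rel

# Hypothetical percentage scores for one model across five independent runs
no_prompt = np.array([59.2, 58.4, 60.1, 57.9, 59.6])
prompted  = np.array([69.8, 68.9, 70.7, 69.1, 70.3])

t_stat, p_value = ttest_rel(prompted, no_prompt)

diff = prompted - no_prompt
cohens_d = diff.mean() / diff.std(ddof=1)  # mean difference / SD of differences

print(f"t = {t_stat:.2f}, p = {p_value:.4f}, Cohen's d = {cohens_d:.2f}")
```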

  • Background: Learner autonomy is vital in voluntary online education, but its specific impact on learning outcomes, particularly in acquiring practical medical skills like cardiac auscultation, remains underexplored. Objective: This study aimed to quantify the role of learner autonomy in determining the effectiveness of voluntary online cardiac auscultation training. Methods: We conducted a prospective, self-controlled, single-center study, enrolling 122 doctors and 77 medical students through WeChat. Participants attended four 2-hour online interactive sessions using authentic heart sound recordings, supplemented by imaging modalities and clinical case studies. Learner autonomy was quantitatively assessed via pre- and post-training tests, frequency of responses to random in-class questions, and detailed tracking of post-class review activities. Data were analyzed using multivariate linear regression and receiver operating characteristic (ROC) curve analysis. Results: Of the 199 registrants, 73% participated, and only 23% completed all sessions. Auscultation test scores improved significantly from 40 (20-50) to 70 (50-83) (P<.001). Full attendance (β=0.602, P<.001) and active classroom engagement (β=0.695, P<.001) significantly predicted higher final scores. Intrinsic motivation correlated positively with full attendance (P=0.045). ROC analysis demonstrated that outstanding learners engaged significantly more in post-class review activities. Conclusions: Learner autonomy, manifested as full participation, active engagement, and intrinsic motivation, is crucial for successful outcomes in voluntary online cardiac auscultation courses. Educators should strategically foster learner autonomy by clearly defining learner prerequisites, explicitly communicating course requirements during recruitment, integrating interactive and community-building elements into live sessions, and actively encouraging structured post-class review activities.
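
    The ROC analysis referred to in this abstract can be sketched as follows; the review-activity values and "outstanding learner" labels are hypothetical illustrations, not the study data.

```python
# Minimal illustrative sketch: does post-class review activity discriminate
# outstanding learners? (ROC curve and AUC on hypothetical data.)
import numpy as np
from sklearn.metrics import roc_auc_score, roc_curve

review_minutes = np.array([0, 5, 10, 15, 40, 60, 75, 90, 120, 150])
outstanding    = np.array([0, 0, 0,  0,  1,  1,  0,  1,  1,   1])

auc = roc_auc_score(outstanding, review_minutes)
fpr, tpr, thresholds = roc_curve(outstanding, review_minutes)

# Youden's J statistic suggests a simple engagement cut-off
best = int(np.argmax(tpr - fpr))
print(f"AUC = {auc:.2f}; suggested cut-off ≈ {thresholds[best]:.0f} minutes of review")
```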

  • Trends in the Japanese National Medical Licensing Examination: A Cross-sectional Study

    Date Submitted: May 28, 2025
    Open Peer Review Period: May 29, 2025 - Jul 24, 2025

    Background: The Japanese National Medical Licensing Examination (NMLE) is mandatory for all medical graduates to become licensed physicians in Japan. Given the cultural emphasis on summative assessment, the NMLE has had a significant impact on Japanese medical education. Although the NMLE Content Guidelines have been revised approximately every five years over the last two decades, there is an absence of objective literature analyzing how the actual exam itself has evolved. Objective: To provide a holistic view of the trends of the actual exam over time, this study used a combined rule-based and data-driven approach. We primarily focused on classifying the questions according to the perspectives outlined in the NMLE Content Guidelines, while complementing this approach with a natural language processing technique called topic modeling to identify latent topics. Methods: Publicly available NMLE data from 2001 to 2024 were collected. Six exam iterations (2,880 questions) were manually classified from three perspectives (Level, Content, and Taxonomy) based on pre-established rules derived from the guidelines. Temporal trends within each classification were evaluated using the Cochran-Armitage test. Additionally, topic modeling was conducted for all 24 exam iterations (11,540 questions) using the BERTopic framework. The temporal trends of each topic were traced using linear regression models of topic frequencies to identify topics growing in prominence. Results: In Level classification, the proportion of questions addressing common or emergent diseases increased from 60% to 76% (p < 0.001). In Content classification, the proportion of questions assessing knowledge of pathophysiology decreased from 52% to 33% (p < 0.001), whereas the proportion assessing practical knowledge of primary emergency care increased from 20% to 29% (p < 0.001). In Taxonomy classification, the proportion of questions that could be answered solely through simple recall of knowledge decreased from 51% to 30% (p < 0.001), while the proportion assessing advanced analytical skills, such as interpreting and evaluating the meaning of each answer choice according to the given context, increased from 4% to 19% (p < 0.001). Topic modeling identified 25 distinct topics, and 10 topics exhibited an increasing trend. Non-organ-specific topics with notable increases included “Comprehensive Clinical Questions,” “Accountability in Medical Practice and Patients’ Rights,” “Care, Daily Living Support, and Community Healthcare,” and “Infection Control and Safety Management in Basic Clinical Procedures.” Conclusions: This study identified significant shifts in the Japanese NMLE over the past two decades, suggesting that Japanese undergraduate medical education is evolving to place greater importance on practical problem-solving abilities than on rote memorization. This study also identified latent topics that showed an increase, possibly reflecting underlying social conditions. Clinical Trial: NA
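
    To make the topic-modeling step in this abstract concrete, a minimal sketch of a BERTopic fit followed by a per-topic linear trend over exam years might look like this; the placeholder corpus, years, and parameters are assumptions for illustration and do not reproduce the authors' actual pipeline.

```python
# Minimal illustrative sketch: fit BERTopic, then regress each topic's yearly
# relative frequency on exam year to flag topics growing in prominence.
import pandas as pd
from bertopic import BERTopic
from scipy.stats import linregress

# Hypothetical stand-in corpus; in practice, the 11,540 NMLE questions and
# their exam years (2001-2024) would be loaded here.
questions = [f"placeholder exam question text {i}" for i in range(480)]
years     = [2001 + (i % 24) for i in range(480)]

topic_model = BERTopic(language="multilingual")  # Japanese-language questions
topics, _ = topic_model.fit_transform(questions)

# Relative frequency of each topic per exam year, then a linear trend over time
df = pd.DataFrame({"topic": topics, "year": years})
freq = df.groupby("year")["topic"].value_counts(normalize=True).unstack(fill_value=0)

for topic_id in freq.columns:
    slope, _, _, p_value, _ = linregress(freq.index, freq[topic_id])
    if slope > 0 and p_value < 0.05:
        print(f"Topic {topic_id}: increasing trend (slope={slope:+.4f}, p={p_value:.3f})")
```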

  • Background: Performing a radial artery puncture is often stressful for medical students due to the risk of causing significant pain. Objective: This study evaluated whether a structured training programme—combining theoretical instruction, simulation-based practice, and debriefing—could influence students’ procedural confidence, decision-making, and patient experience during their first clinical arterial puncture. Methods: Third-year medical students who had never performed an arterial puncture were assigned to one of two groups: a structured training group (G1) or a control group receiving informal or no specific training (G2). After performing their first arterial puncture under supervision, students completed a questionnaire assessing apprehension, satisfaction, and confidence. The decision to use local anaesthesia, puncture success, and patient-rated pain and apprehension were also recorded. Results: Self-reported apprehension and confidence were similar between groups. However, G1 students were significantly less likely to use local anaesthesia compared to G2 students (35% vs. 76%; p = 0.0033), suggesting greater procedural confidence. First-attempt success rates were comparable (G1: 23%; G2: 48%; p = 0.18). Patient pain ratings were lower when anaesthesia was used, but the difference was not statistically significant. Conclusions: Structured training influenced students’ behaviour during their first arterial puncture, reducing reliance on anaesthesia despite similar levels of self-reported apprehension. These findings support the behavioural impact of structured procedural education and call for future research using validated assessment tools and long-term follow-up. Clinical Trial: This study was conducted in hospitals affiliated with the Faculté de Santé Sorbonne Université, one of the medical schools in the Paris area. It was approved by the institutional review board of the Institut National de la Santé et de la Recherche Médicale (CEEI, reference IRB00003888).

  • Background: Clinical reasoning is a key skill of the medical profession. In many virtual patient environments, students enter the diagnoses, and all students receive the same feedback with an explanation of why a certain diagnosis is considered correct. Results of meta-analyses highlight the benefits of feeding information back to students based on their individual answers. Such adaptive feedback, however, is demanding in time and resources. Objective: We propose computer-supported adaptive feedback as an interactive, resource-optimised and scalable alternative. Methods: In the current study we compared static expert feedback against computer-supported adaptive feedback in two learning modes, individual and collaborative. Overall, 105 students completed a pre- and post-test consisting of 10 multiple-choice items and 12 key-feature items. Between the two tests, they diagnosed 8 virtual patients with either adaptive or static feedback, in either the collaborative or the individual learning mode. Results: Results indicate that students who received computer-supported adaptive feedback outperformed students who received static feedback in the posttest, independent of the learning mode. Students who worked in the collaborative learning mode had higher diagnostic accuracy in the learning phase, but not in the posttest, independent of the feedback given. Conclusions: Considering the novelty of the system itself and of the presentation of the adaptive feedback to the students, the results are promising. Future development and implementation of artificial intelligence in the generation of answers may further support the learning of medical students. Until then, an NLP-based system, such as the one presented in this study, seems to be a viable solution for providing a large number of students with elaborated adaptive feedback. Clinical Trial: 17-250