Technology, innovation, and openness in medical education in the information age.

Latest Submissions Open for Peer Review

JMIR has been a leader in applying openness, participation, collaboration, and other "2.0" ideas to scholarly publishing, and since December 2009 it has offered open peer review of articles, allowing JMIR users to sign themselves up as peer reviewers for specific articles currently under consideration by the journal (in addition to author- and editor-selected reviewers). Note that this is not a complete list of submissions, as authors can opt out. The list below shows recently submitted articles for which the submitting authors have not opted out of open peer review and for which the editor has not yet made a decision. (Note that this feature is for reviewing specific articles; if you just want to sign up as a reviewer and wait for the editor to contact you if articles match your interests, please sign up as a reviewer using your profile.)
To assign yourself to an article as a reviewer, you must have a user account on this site (if you don't have one, register for a free account here) and be logged in (please verify that the email address in your profile is correct). Add yourself as a peer reviewer to any article by clicking the 'Peer-review Me!' link under each article. Full instructions on how to complete your review will be emailed to you shortly afterward. Do not sign up as a peer reviewer if you have any conflicts of interest (note that we will treat any attempt by authors to sign up as reviewers under a false identity as scientific misconduct and reserve the right to promptly reject the article and inform the host institution).
The standard turnaround time for reviews is currently 2 weeks. The general aim is to give constructive feedback to the authors and/or to prevent publication of uninteresting or fatally flawed articles. Reviewers will be acknowledged by name if the article is published but will remain anonymous if the article is declined.

The abstracts on this page are from unpublished studies; please do not cite them (yet). If you wish to cite them or wish to see them published, write your opinion in the form of a peer review!

Tip: Include the RSS feed of the JMIR submissions on this page on your iGoogle homepage, blog, or desktop RSS reader to stay informed about current submissions!
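
For readers who prefer a script to a feed reader, the snippet below is a minimal Python sketch of polling an RSS feed with the feedparser package; the feed URL shown is a placeholder, so substitute the actual submissions feed linked on this page.

    # Minimal sketch: poll an RSS feed and list its entries.
    # Requires the third-party "feedparser" package (pip install feedparser).
    import feedparser

    FEED_URL = "https://www.jmir.org/feed/openreview"  # placeholder URL, replace with the feed linked on this page

    feed = feedparser.parse(FEED_URL)
    for entry in feed.entries[:10]:
        # Each entry corresponds to one submission currently open for peer review
        print(entry.get("title", "(untitled)"), "-", entry.get("link", ""))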

JMIR Submissions under Open Peer Review



If you follow us on Twitter, we will also announce new submissions under open peer review there.


Titles/Abstracts of Articles Currently Open for Review

  • Background: Generative artificial intelligence (GenAI) has rapidly expanded in higher education and clinical practice. Large language models such as ChatGPT are widely adopted by health profession students for learning and writing tasks. However, little is known about how these tools are mobilized during clinical placements, a critical stage of training where students face high cognitive demands and increasing responsibility for patient care. Objective: This study aimed to map self-reported uses of GenAI during clinical placements, assess perceived benefits and risks, and identify training and governance needs. Methods: We conducted a cross-sectional online survey at Université Grenoble Alpes (France) from June to September 2025. Eligible participants were students in medicine, pharmacy, nursing, midwifery, or physiotherapy who were currently in, or had completed within the past 18 months, a clinical placement. The 61-item questionnaire included closed and open-ended questions. A composite maturity score classified respondents as Minimal, Limited, Moderate, or High. Descriptive statistics and trend tests were used for analysis. Results: Among 388 respondents (79% female; 56% nursing, 18% medicine, 17% pharmacy, 6% midwifery, 3% physiotherapy), 53% reported using GenAI during placements. Uptake was lowest in midwifery (26%) and rose markedly with maturity (9% Minimal vs 76% High; P<.001). Students mainly used GenAI for information retrieval (78%), bibliographic search (75%), and translation/rephrasing (71%). Clinical-facing tasks such as case simulation (55%), drafting patient documents (38%), or preparing patient communication (38%) were less frequent, with fewer than 15% reporting weekly use. Most students avoided entering patient identifiers, but 23% acknowledged at least one disclosure, and 47% reported sharing anonymized medical data. Benefits were most often perceived for documentation support (81%) and information access (69%). Risks included dependence (91%), erosion of skills (85%), and confidentiality breaches (87%). Students highlighted strong needs for ethics/regulation training (78%), best-practice guidance (78%), profession-specific coaching (74%), and human–AI collaboration (73%). Conclusions: GenAI is already embedded in the daily practices of health profession students during placements, primarily as a tool for documentation and information management. While students recognize its utility, they also express concerns about dependence, skills, and confidentiality. These findings underscore the urgent need for structured curricula and governance frameworks to support responsible and patient-centered integration of GenAI into clinical education.
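
The uptake gradient reported above (9% of Minimal-maturity respondents vs 76% of High-maturity respondents, P<.001) is the kind of result a test for trend in proportions produces. The following Python sketch implements a Cochran-Armitage trend test from first principles; the group counts are hypothetical placeholders chosen only to mimic the reported pattern, not the study's data.

    # Cochran-Armitage test for trend in proportions across ordered groups.
    # Counts below are hypothetical, not the submission's data.
    import math

    def cochran_armitage(users, totals, scores=None):
        """Two-sided Cochran-Armitage trend test; returns (z, p)."""
        k = len(totals)
        scores = scores or list(range(k))           # ordinal scores 0..k-1
        N, R = sum(totals), sum(users)
        p_bar = R / N                               # pooled proportion of users
        t_stat = sum(s * (r - n * p_bar) for s, r, n in zip(scores, users, totals))
        var = p_bar * (1 - p_bar) * (
            sum(s * s * n for s, n in zip(scores, totals))
            - sum(s * n for s, n in zip(scores, totals)) ** 2 / N
        )
        z = t_stat / math.sqrt(var)
        return z, math.erfc(abs(z) / math.sqrt(2))  # two-sided normal p-value

    # Hypothetical GenAI users per maturity group (Minimal, Limited, Moderate, High)
    users  = [3, 32, 95, 80]
    totals = [33, 80, 170, 105]
    z, p = cochran_armitage(users, totals)
    print(f"z = {z:.2f}, P = {p:.2g}")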

  • AI Patients for Training CBT Therapists to Develop Clinical Competence: Pilot Study of CBT Trainer

    Date Submitted: Sep 22, 2025
    Open Peer Review Period: Sep 28, 2025 - Nov 23, 2025

    Background: Training challenges in Cognitive Behavioral Therapy (CBT) include limited supervised practice with diverse cases, inconsistent feedback, resource-intensive supervision, and difficulties standardising competence assessment. Objective: This study evaluated the acceptability and feasibility of CBT Trainer, a mobile application for training CBT practitioners that uses standardized AI patient interactions and evaluates therapist responses against competence frameworks, enabling structured skill development grounded in experiential learning and deliberate practice. Methods: This mixed-methods pilot study employed a two-stage approach. Stage 1 involved usability testing with four participants. Stage 2 included 59 participants from psychological practitioner training programs (a Low Intensity CBT Interventions Programme and a Doctorate in Clinical Psychology) who engaged with CBT Trainer over one month. Measures of impact included the System Usability Scale (SUS), platform engagement, a post-study questionnaire on perceived learning outcomes, and qualitative feedback. Results: CBT Trainer performed well on all pre-specified outcome targets. Participants spent an average of 95.24 minutes (SD=134.58) in role-plays and completed an average of 4.15 role-play sessions (SD=3.55), with 49.69 interactions per session. Platform usability was rated as excellent (mean SUS=82.20, SD=12.93). Self-reported competence improvement was highest in assessment skills (96.7%), followed by information gathering (66.7%). Key advantages over traditional methods included immediate feedback (83.3%) and convenience (73.3%). Conclusions: This pilot study provides evidence that an AI-based patient simulation shows promise as a supplementary training tool for CBT therapists, particularly regarding accessibility and immediate feedback. Future research should employ randomized controlled designs with objective competence assessments. Clinical Trial: The study protocol was pre-registered with the Open Science Framework (https://osf.io/mskb7).
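
The usability figure quoted above (mean SUS=82.20) follows the standard System Usability Scale scoring rule: ten 1-5 Likert items, odd items scored as (response - 1), even items as (5 - response), and the sum multiplied by 2.5 to give a 0-100 scale. A minimal Python sketch with a hypothetical respondent:

    # Standard SUS scoring; the example responses are hypothetical, not study data.
    def sus_score(responses):
        """responses: ten integers (1-5), item 1 first; returns a 0-100 SUS score."""
        assert len(responses) == 10 and all(1 <= r <= 5 for r in responses)
        odd = sum(r - 1 for r in responses[0::2])   # positively worded items 1,3,5,7,9
        even = sum(5 - r for r in responses[1::2])  # negatively worded items 2,4,6,8,10
        return (odd + even) * 2.5

    print(sus_score([5, 2, 4, 1, 5, 2, 4, 2, 5, 1]))  # -> 87.5 for this hypothetical respondent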

  • Background: Artificial intelligence (AI) refers to modern systems that use self-learning algorithms to perform tasks and generate responses that mimic human intelligence (1). Despite its early beginnings, the surge in its popularity and success came in 2022 when OpenAI launched ChatGPT. Within just 5 days, the platform gained one million users, making it the second-fastest application in history to reach this milestone (2,3). In the medical domain, AI has proved itself a valuable tool for both education and healthcare. For students, AI assists in editing texts, generating medical case scenarios, summarizing complex topics, and simplifying difficult information, which saves time and enhances efficiency (4,5). In healthcare, AI has proved itself a valuable tool for medical practitioners: it has passed the USMLE exams and demonstrated its efficiency in disease diagnosis across multiple specialties. Its performance is particularly notable in specialties that rely on image analysis, such as radiology and pathology (6–10). However, these advancements are not without doubts and concerns. The rapid growth of AI has sparked debates over ethical issues, job displacement in healthcare, and other challenges (11,12). The integration of AI tools in medical education is crucial, as this new and fast-growing technology will shape the future of healthcare despite its absence from traditional medical curricula. The World Medical Association has endorsed AI integration into medical education; however, its implementation is still insufficient worldwide (13,14). This gap presents a notable challenge in low-resource areas, where limited access to computers, reliable electricity, and internet connectivity further hinders adoption (15). Syria has endured a devastating war since 2011, resulting in millions of refugees both inside and outside the country (16). The conflict has had huge impacts on all aspects of life, including healthcare and medical education; over half of medical facilities were destroyed or damaged during the war (17–19). Moreover, the concurrent economic collapse and infrastructure devastation have created severe poverty and shortages in basic necessities, including food and electricity (20). These factors have significantly affected medical education and access to knowledge. One notable consequence is that most medical students now intend to emigrate after graduation, with Germany emerging as the primary destination for Syrian doctors (21–23). This study aims to evaluate Syrian medical students' experiences with artificial intelligence tools and their perceptions regarding AI's role in medical education and healthcare provision. We aim to examine how war-related consequences, including limited computer access due to economic constraints and high emigration intentions, influence experiences and perceptions of AI. Additionally, we aim to explore differences across academic years to assess whether experiences and perceptions are evolving positively over time. These findings will provide insights into the readiness of Syria's future doctors to utilize AI tools, despite having trained in an environment unsupportive of modern technologies. Objective: This study highlights the increasing reliance on AI tools among medical students and graduates for academic and clinical purposes.
The highest usage was reported in study preparation, writing tasks, and clinical simulations. Significant differences in AI usage were observed based on academic level, gender, access to technology, and research experience. While perceptions were largely positive, concerns remained around ethical use, potential job displacement, and diminished human interaction in medicine. These findings underscore the importance of developing institutional policies to guide the ethical and effective integration of AI in medical education. Given these outcomes, we believe that this manuscript is suitable for publication in JMIR Medical Education. Methods: Study Design and Setting: This was a cross-sectional, descriptive survey conducted in April and May 2025 among medical students in Syria. The primary aim was to explore the experiences of medical students and recent graduates with artificial intelligence (AI) tools, as well as their perceptions of AI's role in medical education and clinical practice. Participants and Sampling: A total of 400 participants were enrolled in the study, with 100 participants per academic year across four consecutive academic years. The sample included clinical-year medical students (fourth and fifth year of medicine) and graduates and prospective graduates. Participants were recruited through convenience sampling via student networks and social media platforms. Exclusion criteria were pre-clinical students (first to third year) from faculties of human medicine, as well as postgraduate specialty trainees (residents). Inclusion criteria were current enrollment in clinical medical training or graduation within the past two years, as well as willingness to voluntarily complete the questionnaire. Participation was anonymous and uncompensated. Data Collection Instrument: Data were collected using a structured, self-administered online questionnaire. The original questionnaire was developed in English and professionally translated into Arabic by experts in medical education and linguistics. The choice of question formats and content was guided by the researchers' observations of the Syrian community context and the socioeconomic challenges resulting from the ongoing Syrian crisis. The final Arabic version of the questionnaire was reviewed and validated by domain experts prior to distribution. It consisted of four main sections: (1) demographic and academic information, including age, gender, academic year, computer ownership, language learning background, and prior research experience; (2) experience with AI tools, assessing the use of tools such as ChatGPT and digital anatomy platforms across multiple contexts such as studying, exam preparation, clinical decision-making, and research writing; (3) perceptions of AI, evaluated using a 5-point Likert scale (from strongly disagree to strongly agree), addressing AI's role in learning, medical ethics, patient care, and its future impact on the field of medicine; and (4) factors influencing AI adoption, including clinical exposure, language learning goals (e.g., German), emigration intentions, and access to digital technology. The survey incorporated items derived from prior literature and was pilot-tested with a small sample of medical students to ensure clarity and contextual relevance. Data Analysis: Quantitative data were analyzed using IBM SPSS Statistics (version 27).
Descriptive statistics (frequencies, means, percentages) were used to summarize participant characteristics and AI usage behaviors. Inferential analyses included chi-square tests to assess associations between categorical variables (e.g., gender and AI usage patterns), Mann–Whitney U tests to compare differences in AI perceptions between subgroups, and binary logistic regression models to determine predictors of positive perceptions and AI-related behaviors. Predictor variables included gender, clinical status, prior AI experience, and research background. Adjusted odds ratios (aORs) and 95% confidence intervals (CIs) were reported. A p-value < 0.05 was considered statistically significant. Ethical Considerations: Ethical approval for biomedical research was obtained from the Ethical Approval for Biomedical Researchers (EABR) committee under approval number 1612. All participants provided informed digital consent before participating. Responses were collected anonymously, and all data were stored securely to ensure confidentiality and privacy. Results: Demographic characteristics: A total of 400 medical students and recent graduates participated in the study. The sample was nearly gender-balanced (51.5% male, 48.5% female), with most participants aged between 20 and 24 years (84%). Half of the respondents were clinical-year students, and the other half were either graduates or prospective graduates. A majority (61.3%) resided in urban areas. Regarding income, 43% reported a good income level, while 38% reported a moderate income. Notably, 72.5% of participants expressed a desire to pursue postgraduate specialization abroad. In terms of technology and language exposure, 68.8% reported having access to a personal computer, 39.8% had studied the German language (51% of whom used AI tools to support their learning), and 28.2% had prior research experience. For English proficiency, 59.3% rated themselves as good and 35.5% as excellent. Knowledge about AI came primarily from social media (73.5%), followed by reading research (15%) and discussions with IT-savvy friends (9.75%). Prior experience in using AI: Academic and clinical tasks for which ChatGPT was used in notable proportions were as follows: 57% used AI to assist with studying or exam preparation, 40% to complete written assignments, 35.5% to suggest research topics or questions, 33.8% to write research papers, 23.3% to help with writing case reports, 21.8% to generate self-assessment questions, and 11.8% to help write patient notes. Use of AI technologies during medical school was also reported: 72.3% of participants used digital anatomy tools, 42.8% used computational pathology tools, and 22.5% used AI-generated cases for clinical simulation. Using AI during residency: In a more comprehensive set of questions, participants reported interest in using AI for broader educational and clinical purposes during residency: 85.5% showed interest in using AI to help answer medical questions, 82.8% to explore new medical topics or research, 80.8% to assist with studying or preparation, 75% to help write research papers, 65.3% to help write case reports, 52.8% to help write patient notes, and 52% to assist in clinical decision-making.
Perceptions of artificial intelligence (AI) for career, education, and patient care: Descriptive analysis of participant responses revealed generally favorable perceptions of AI in medical education and practice. Items were rated using a five-point Likert scale, and positive perceptions were defined as the combined percentage of "Agree" and "Strongly Agree" responses. (1) AI in learning and career development: 76% of participants agreed that ChatGPT could improve their learning during residency, 47.3% agreed that AI tools effectively met their needs during medical school, 48% preferred using ChatGPT over traditional resources such as Google or medical references, 70.8% agreed that the answers provided by ChatGPT need to be verified, 75.3% looked forward to using more advanced versions of ChatGPT or AI in their future careers, just 28.8% reported that their peers had always used ChatGPT ethically, 71.8% supported the implementation of institutional policies regulating AI use by trainees, just 25.5% agreed that AI would create more career opportunities, 11% stated that AI had influenced their specialty choice, and 33.3% agreed that AI would limit job opportunities. (2) AI in patient care: 54.6% expressed concern about AI's ethical impact on healthcare, yet 42.1% agreed that AI would improve diagnostic accuracy, 58.8% believed AI would enhance patient care, 64.3% expected AI to significantly impact the healthcare system overall, 57.6% were concerned that AI might reduce the humanistic aspect of medicine, 43.1% agreed that AI would help reduce medical errors, and 62.5% expressed concern that AI might reduce patient trust in physicians. Although some concerns were evident, particularly regarding ethics and the humanistic dimension of care, overall a majority of students expressed optimism toward the integration of AI in both educational and clinical domains. Prior experience in using AI by demographic characteristics: A series of chi-square tests was conducted to assess associations between students' demographic characteristics (academic level, gender, prior research experience, computer availability, and German language exposure) and their use of AI technologies (ChatGPT) in various educational contexts. Significant associations were identified between academic level and multiple AI use cases. Clinical-year students were more likely than graduates and prospective graduates to use AI for studying or exam preparation (p < .001), generating self-assessment questions (p = .021), suggesting research topics (p < .001), writing research papers (p = .015), completing written assignments (p < .001), creating simulated clinical cases (p = .031), and helping to write patient notes (p = .02). Gender was significantly associated with the use of AI for writing research papers (p = .004), suggesting research topics or questions (p = .039), and using digital anatomy (p = .023), with male students reporting higher usage. Students with prior research experience were significantly more likely to use AI for suggesting research topics (p = .039), writing research papers (p < .001), writing case reports (p = .002), and using digital anatomy platforms (p < .001).
Having access to a personal computer was also positively associated with using AI to help write research papers (p = .005) and to support learning on digital anatomy platforms (p = .001). Although learning the German language was not statistically associated with most AI uses, students who were studying German reported slightly higher use of AI for language learning, were more likely to use AI to help complete written assignments (p = .017), and used AI to help write research papers more than students who were not studying German (p = .004). This trend reflects a culturally motivated behavior, particularly among Syrian students, who often aspire to pursue postgraduate training in Germany due to the ongoing crisis; these students appeared to use AI to support German language acquisition, highlighting AI's potential role in educational mobility and international preparation. Perceptions of AI for career and education by demographic characteristics: A series of Mann–Whitney U tests was conducted to compare perceptions of AI for career and education among medical students based on academic level, gender, prior research experience, access to a personal computer, and German language exposure. Graduates and prospective graduates expressed significantly more positive perceptions regarding ChatGPT's usefulness for learning during residency (p = 0.026), suggesting AI is seen as a valuable educational aid during clinical rotations, and they agreed more strongly than clinical-year students that medical schools and residency programs should develop policies about the use of ChatGPT and AI by trainees (p = 0.017). In contrast, AI was rated as more effective in meeting needs during medical school by clinical-year students than by graduates and prospective graduates (p = 0.001). Male students rated AI significantly higher on supporting residency learning (p = 0.007), preference for ChatGPT over other search engines (p = 0.003), and enthusiasm toward future versions of AI (p = 0.009). AI's potential had a greater influence on the residency specialty choice of students without personal computers (p = 0.036), whereas students with personal computers were less likely than those without to believe that their peers had used ChatGPT ethically (p = 0.007). Students who had learned German rated ChatGPT as more helpful for learning during residency (p = 0.005), which may reflect AI's perceived value in language acquisition for migration-preparing students; they also showed more enthusiasm toward future versions of AI (p = 0.003). Those with prior research experience were less confident in ChatGPT's accuracy (p < 0.001), more supportive of medical schools establishing policies for AI use (p = 0.005), and had more positive perceptions regarding ChatGPT's usefulness for learning during residency (p = 0.034), whereas those without research experience perceived their peers as using ChatGPT more ethically (p = 0.014) and reported that their residency choice had been influenced more by AI's potential (p = 0.038). Perceptions of AI for patient care by demographic characteristics: A series of Mann–Whitney U tests was conducted to compare perceptions of AI for patient care and professional practice among medical students based on academic level, gender, prior research experience, access to a personal computer, and German language exposure.
Gender again played a role: male students had significantly higher agreement that AI will improve diagnostic accuracy (p < 0.001), improve patient care (p < 0.001), have a major impact on healthcare (p < 0.001), and reduce medical errors (p = 0.005), whereas female students worried more about the ethical impact of AI on healthcare (p = 0.019). Graduates and prospective graduates believed more strongly that AI will have a major impact on healthcare during residency (p = 0.033). German language learners showed significantly higher agreement that AI will improve patient care (p = 0.007), have a major impact on healthcare (p = 0.016), and enable them to make more accurate diagnoses (p = 0.001). Prior research experience was associated with more concern that AI will reduce patient trust in physicians (p = 0.028). Across separate binary logistic regression models predicting AI perceptions, two factors consistently emerged as key drivers. Prior AI usage experience significantly increased the odds of a favorable perception in multiple domains: each additional AI-use task raised the likelihood that students would agree ChatGPT improves learning during residency (adjusted OR=1.43, 95% CI 1.15–1.79, p=.002), that it was effective in meeting medical-school needs (aOR=1.60, 95% CI 1.34–1.92, p<.001), and that it would help reduce medical errors (aOR=1.14, 95% CI 1.01–1.28, p=.040). Clinical-year status was consistently associated with more reserved views: clinical-year students were less likely to believe ChatGPT enhances learning during residency (aOR=0.26, 95% CI 0.11–0.64, p=.003), less likely to anticipate using future versions (aOR=0.18, 95% CI 0.07–0.49, p<.001), and less likely to expect AI to have a major impact on healthcare (aOR=0.24, 95% CI 0.09–0.65, p=.005). Gender differences also appeared: female students had three times the odds of insisting on verifying ChatGPT's answers (aOR=3.12, 95% CI 1.36–7.15, p=.007) and were twice as likely to worry about AI's ethical impact on healthcare (aOR=2.06, 95% CI 1.19–3.54, p=.010). Prior research experience did not significantly predict any positive perception once other factors were controlled. Conclusions: This study reveals a predominantly optimistic outlook among Syrian medical students and residents regarding the role of AI in education and clinical care. High levels of prior usage suggest both accessibility and growing technological literacy, though notable concerns, particularly ethical and relational ones, underscore the need for guided integration. Differences across gender, training stage, and socioeconomic proxies reflect the nuanced landscape of AI perception in medical education. While enthusiasm is widespread, cautious appraisal of AI's limitations and its potential to disrupt traditional human-centered care remains essential. As such, integrating AI into medical education must be done strategically, emphasizing critical appraisal, ethical awareness, and reinforcement of compassionate clinical reasoning. These findings contribute meaningfully to the regional and global discourse on responsible AI adoption in health professions education.
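
To make the adjusted odds ratios quoted above concrete, here is a minimal Python sketch of the same kind of binary logistic regression (favorable perception regressed on gender, clinical status, prior AI experience, and research background) using statsmodels; the simulated data frame and coefficients are illustrative assumptions, not the submission's data.

    # Binary logistic regression with adjusted odds ratios (aORs) and 95% CIs.
    # All data here are simulated placeholders.
    import numpy as np
    import pandas as pd
    import statsmodels.formula.api as smf

    rng = np.random.default_rng(0)
    n = 400
    df = pd.DataFrame({
        "male": rng.integers(0, 2, n),
        "clinical_year": rng.integers(0, 2, n),
        "research_exp": rng.integers(0, 2, n),
        "ai_tasks": rng.integers(0, 8, n),          # count of distinct AI-use tasks
    })
    # Simulate a favorable-perception outcome that depends mildly on AI experience
    lin = -0.5 + 0.3 * df["ai_tasks"] - 0.6 * df["clinical_year"]
    df["favorable"] = rng.binomial(1, 1 / (1 + np.exp(-lin)))

    res = smf.logit("favorable ~ ai_tasks + clinical_year + male + research_exp", data=df).fit(disp=False)
    aor = np.exp(res.params).rename("aOR")          # adjusted odds ratios
    ci = np.exp(res.conf_int()).rename(columns={0: "2.5%", 1: "97.5%"})
    print(pd.concat([aor, ci], axis=1).round(2))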

  • A digital clinician training module significantly improved attitudes, beliefs, and self-perceived competence towards working with trans and gender diverse (TGD) people with eating disorders (EDs) among a cohort of medical students (n=87). Future clinician training resources can adopt this cost-effective and accessible learning model and build upon existing introductory attitude- and awareness-based programs by focusing on tackling a specific health disparity among LGBTQIA+ people.

  • Background: Large language model (LLM)-driven virtual patients (VPs) are increasingly used to simulate history taking. However, there is currently no straightforward methodological approach to effectively identify students’ clinical reasoning activities during these interactions, which limits the ability to provide personalised feedback. Objective: This study aims to develop a structured coding scheme to characterise medical students’ behaviours during interactions with LLM-driven VPs. Methods: Second-year medical students (N=210) completed text-based history-taking sessions across five simulated chest pain cases, yielding 1,030 dialogues. Dialogues from Cases 1–4 were analysed using systematic text condensation (STC) to develop a coding scheme inductively. Two raters independently coded a subset of dialogues, and inter-coder reliability was assessed using Cohen’s kappa. The established scheme was then applied to the dialogues from Case 5, and Pearson correlation coefficients (r) were used to assess associations between code frequencies and external performance outcomes: diagnostic accuracy, history-taking checklist scores, clinical knowledge test scores, and post-encounter form (PEF) scores. Results: The STC analysis produced a 12-code scheme comprising four clinical reasoning codes (Pathophysiologic Question, Relevant Response, Summarising & Integrating, Logical Organisation), six information-gathering codes, and two communication codes. Inter-coder reliability was high for all dimensions: clinical reasoning (κ=1.00), information gathering (κ=0.95–0.98), and communication (κ=1.00). In Case 5, Summarising & Integrating was most predictive, correlating with diagnostic accuracy (χ2=6.019, P=.014), checklist scores (r=0.208, P=.003), knowledge test scores (r=0.225, P=.002), and PEF scores (r=0.191, P=.009). Logical Organisation (LO) also correlated with diagnostic accuracy (χ2=0.188, P=.008), checklist (r=0.592, P<.001), and knowledge test scores (r=0.170, P=.013). Pathophysiologic Question showed weaker but significant associations with checklist and knowledge tests (r=0.177, P=.013 and r=0.145, P=.042). Only two information-gathering codes demonstrated weak-to-moderate associations with checklist and knowledge test scores, while only one communication code showed a weak association with knowledge tests. Conclusions: This study developed a theory-informed coding scheme that reliably distinguishes information-gathering and reasoning behaviours in history-taking with virtual patients. Enabling the identification of diverse behaviours provides a foundation for formative assessment and personalised feedback, offering a scalable approach to support the development of clinical reasoning in medical students. Clinical Trial: NO
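
The two measures underpinning the results above, Cohen's kappa for inter-coder agreement and Pearson's r between code frequencies and performance scores, can be computed as in the short Python sketch below; the coded labels and scores are hypothetical examples, not the study's dialogues.

    # Cohen's kappa for two raters and Pearson's r for a code-frequency/score association.
    # All values below are hypothetical.
    from sklearn.metrics import cohen_kappa_score
    from scipy.stats import pearsonr

    rater_a = ["SI", "IG", "SI", "LO", "IG", "COM", "SI", "LO"]   # rater 1's codes per turn
    rater_b = ["SI", "IG", "SI", "LO", "IG", "COM", "IG", "LO"]   # rater 2's codes per turn
    kappa = cohen_kappa_score(rater_a, rater_b)

    si_freq   = [0, 1, 1, 2, 3, 0, 2, 4, 1, 3]            # per-student "Summarising & Integrating" counts
    checklist = [10, 12, 13, 15, 18, 9, 14, 19, 11, 17]   # per-student checklist scores
    r, p = pearsonr(si_freq, checklist)

    print(f"kappa = {kappa:.2f}; r = {r:.2f}, P = {p:.3f}")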

  • Background: Medical students increasingly rely on online tools for clinical information retrieval. While traditional platforms like Google, PubMed, and UpToDate are commonly used, the rise of large language models (LLMs) such as ChatGPT offers new potential in this area. Objective: This study compared ChatGPT with conventional online resources in terms of accuracy, speed, and user satisfaction when retrieving medical information among sixth-year medical students. Methods: We conducted a quasi-experimental, posttest-only study with 64 sixth-year medical students at a Thai teaching hospital. Students were initially randomly assigned to use either ChatGPT or traditional online medical resources (Google, PubMed, and UpToDate), with final groups of 35 and 29 participants respectively due to protocol deviations. Participants completed a 10-question open-ended test on internal medicine topics. Responses were scored using expert-validated rubrics with multiple key elements per question, totaling 10 points each. We also recorded completion time and assessed satisfaction via a 15-item Likert questionnaire. Between-group comparisons used independent t-tests. Results: The ChatGPT group achieved significantly higher accuracy scores (mean 63.79, SD 9.71) than the control group (mean 47.22, SD 8.82, P<.001), reflecting a 35.1% improvement. They also completed the task more quickly (mean 58.59 vs 75.76 minutes, P=.003), with a 22.7% time reduction. Satisfaction scores were higher overall in the ChatGPT group (mean 62.06, SD 7.14 vs 56.72, SD 10.23; P=.018), particularly in the search process domain (P<.001). Performance varied by cognitive level, with application-type questions showing the largest advantage (20.63 percentage points), followed by remembering-level questions (16.14 percentage points) and analysis-level questions (8.54 percentage points). No significant associations were found between ChatGPT use and demographics, prior experience, or AI attitudes. Conclusions: ChatGPT significantly outperformed traditional online resources in accuracy, speed, and user satisfaction for clinical information retrieval among medical students. These findings suggest its potential as a supportive tool in medical education. However, future integration should include training in critical appraisal and responsible use to ensure safe and effective application in learning environments.
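
The headline comparison above (accuracy 63.79 vs 47.22, a 35.1% relative improvement, P<.001) rests on an independent-samples t-test; the Python sketch below reproduces that computation on simulated score vectors with roughly the reported means and SDs, which are stand-ins rather than the study's data.

    # Independent-samples t-test plus relative improvement, on simulated scores.
    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(1)
    chatgpt = rng.normal(63.79, 9.71, 35)   # simulated ChatGPT-group accuracy scores
    control = rng.normal(47.22, 8.82, 29)   # simulated control-group accuracy scores

    t, p = stats.ttest_ind(chatgpt, control)                # independent t-test
    improvement = (chatgpt.mean() - control.mean()) / control.mean() * 100
    print(f"t = {t:.2f}, P = {p:.4f}, relative improvement = {improvement:.1f}%")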

  • Background: There is a need to modernise clinical guideline dissemination - making it more accessible and engaging for healthcare professionals - while medical education increasingly favours participatory, applied learning. Concise Medical Information Cines (CoMICs) are brief, peer-reviewed videos created by medical students that distil complex guidelines into learner-friendly visuals. Objective: This project evaluates the use of the CoMICs model to co-create audiovisual guidelines with medical students and assess its impact on fostering early interest in endocrinology and academic medicine. Methods: A four-part CoMICs series on glucocorticoid-induced adrenal insufficiency was developed through ten iterative steps, engaging clinicians and medical students. Under endocrinologist and guideline-author mentorship, students performed literature reviews, drafted scripts, designed visuals, and participated in peer review. Semi-structured interviews with authors, reviewers, and student collaborators assessed the CoMICs’ clarity, usability, trustworthiness, and educational value. Reflexive thematic analysis then identified key themes. Results: CoMICs improved guideline accessibility, comprehension, and global adaptability while the collaborative process promoted interdisciplinary learning and underscored the efficacy of audiovisual tools for complex content. Student collaborators reported greater confidence in interpreting and communicating clinical guidance, renewed interest in endocrinology, and a deeper appreciation of its academic dimensions. Conclusions: Co-creating audiovisual resources like CoMICs enhances guideline reach and impact while serving as an effective educational tool for medical students. Early student involvement can foster curiosity, encourage academic career pathways, and reshape engagement with evidence-based medicine. Future research should assess effects on long-term academic interests and speciality choice. Clinical Trial: N/A

  • Background: Modern medical education requires efficient tools for knowledge acquisition, and large language models (LLMs) appear promising; however, they face challenges such as factual inaccuracies (“hallucinations”) and a lack of evidence transparency, particularly in critical fields such as medicine. Retrieval augmented generation (RAG) systems address these limitations by grounding LLM responses in external knowledge sources, which offers a potential solution for developing reliable educational tools. Objective: This study aimed to examine the applicability of a citable RAG system in medical education, specifically to address the challenges of LLMs regarding hallucinations and unclear evidence, and to investigate its potential applications. Methods: We designed and implemented a RAG system in Python, using LangGraph for system construction. The system generated responses by converting input questions into search queries, retrieving relevant information from academic databases and web sources through dedicated search agents, and then integrating this information to generate comprehensive, citable answers. GPT-4o was used for both query generation and report generation within the RAG system. We used a data set of 103 medical questions for evaluation. We compared the RAG system's answers with those generated by standalone GPT-4o. Evaluation was conducted quantitatively by LLMs (GPT-4 and Gemini Flash 2.0) using the CLEAR reliability metric (Completeness, Lack of false information, Evidence, Appropriateness, Relevance). We also performed a subjective evaluation with 40 medical and nursing students, who used a Likert scale based on the CLEAR criteria for a subset of five randomly selected questions. Additionally, we assessed the consistency between generated text and its references using the SourceCheckup tool, which evaluated Citation URL Validity, Statement-level Support, and Response-level Support. Results: In LLM evaluations, the RAG system consistently outperformed standalone GPT-4o in "Evidence." However, student evaluations showed a different trend, with GPT-4o receiving significantly higher scores in "Completeness," "Appropriateness," and "Relevance," while RAG excelled in "Evidence." For reference evaluation, all cited URLs were valid. However, Statement-level Support was 0.516 (95% CI: 0.460, 0.571), and Response-level Support was 0.228 (95% CI: 0.158, 0.317), indicating that not all claims or full responses were directly supported by the cited sources. Conclusions: The RAG system effectively addressed the challenges of hallucination and unclear evidence in LLMs for medical education, consistently improving the evidence base of responses. However, discrepancies between LLM and human evaluations highlighted the need for further improvements in the overall structure and natural language flow of responses for practical educational implementation. Future work should focus on enhancing the consistency between referenced information and generated output, and on improving the overall coherence and clarity of the response structure, with Self-RAG emerging as a promising approach for self-verification and improved learning support.
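
As a rough illustration of the retrieve-then-generate flow the abstract describes (question, search query, retrieval from sources, citable report), here is a schematic Python sketch. It is not the authors' LangGraph/GPT-4o pipeline: the corpus and URLs are invented, retrieval is naive keyword overlap, and the report-generation step is only a placeholder where an LLM call would sit.

    # Schematic retrieve-then-cite flow; corpus, URLs, and the "generator" are placeholders.
    CORPUS = [
        {"id": 1, "source": "https://example.org/ards-review",       # hypothetical URL
         "text": "Low tidal volume ventilation improves survival in ARDS."},
        {"id": 2, "source": "https://example.org/sepsis-guideline",   # hypothetical URL
         "text": "Early antibiotics and fluid resuscitation are recommended in sepsis."},
    ]

    def retrieve(query, corpus, k=2):
        """Rank documents by naive keyword overlap with the query."""
        q_terms = set(query.lower().split())
        scored = [(len(q_terms & set(d["text"].lower().split())), d) for d in corpus]
        return [d for score, d in sorted(scored, key=lambda x: -x[0]) if score > 0][:k]

    def generate_report(question, documents):
        """Placeholder for the LLM step: concatenates sources with citation markers."""
        body = " ".join(f"{d['text']} [{d['id']}]" for d in documents)
        refs = "\n".join(f"[{d['id']}] {d['source']}" for d in documents)
        return f"Q: {question}\nA: {body}\n\nReferences:\n{refs}"

    question = "How should ventilation be managed in ARDS?"
    print(generate_report(question, retrieve(question, CORPUS)))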

  • Interactive e-learning module on liver ischemia: enhancing medical education on antioxidant defense mechanisms

    Date Submitted: Aug 21, 2025
    Open Peer Review Period: Aug 22, 2025 - Oct 17, 2025

    Background: Liver ischemia-reperfusion injury is one of the critical biochemical and clinical issues in hepatology; however, teaching it is difficult because of the complexity of oxidative stress and antioxidant defense mechanisms. Traditional lecture-based instruction strategies often cannot guarantee long-term retention or translation of knowledge to the clinic, creating a disconnect between theory and on-the-job training. Closing this gap is necessary since ischemia-related complications are among the main causes of morbidity and mortality after liver surgery and transplantation. Objective: The purpose of the study was to design an interactive e-learning session for improving medical students' understanding of liver ischemia and antioxidant defense and to compare it with lecture-based teaching. Methods: A pilot study was carried out at Azerbaijan Medical University with sixty third-year medical students randomly divided into a control group (lecture-based) and an experimental group (interactive e-learning). Knowledge acquisition and retention were measured using pre-tests, immediate post-tests, and follow-up tests after two and four weeks. Engagement and clinical relevance were determined by means of validated surveys. The statistical analysis was conducted with paired t-tests and ANOVA. Results: The e-learning group exhibited a 74.3% relative increase in knowledge scores (42.4 ± 3.3 to 78.8 ± 3.3, p < 0.01), with retention unchanged at four weeks (70.0 ± 2.9). Conceptual comprehension was enhanced by 93.7% (p < 0.01), and student engagement increased by 64.6% (p < 0.01). Application to clinical cases more than doubled (+110.2%, p < 0.01). The control group showed only minor improvements, with retention of 49.5 ± 4.9 after four weeks. Complementary biochemical analyses in ischemia models indicated significant time-related damage to protein metabolism, especially with prolonged ischemia. Conclusions: The interactive module was significantly better than lecture-based teaching in promoting knowledge acquisition, retention, and utilization in clinical practice. The results indicate the potential of the module as a scalable and effective pedagogical innovation in the medical learning environment, supporting more robust clinical preparation. Clinical Trial: no registration number
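
The within-group gains reported above (e.g., 42.4 ± 3.3 pre-test to 78.8 ± 3.3 post-test) are the kind of change a paired t-test evaluates; the Python sketch below runs one on simulated pre/post scores with roughly those means, which are placeholders rather than the study's measurements.

    # Paired t-test on simulated pre-test vs post-test scores for one group.
    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(2)
    pre  = rng.normal(42.4, 3.3, 30)            # simulated pre-test scores
    post = pre + rng.normal(36.4, 4.0, 30)      # simulated gains giving roughly a 78.8 post-test mean

    t, p = stats.ttest_rel(pre, post)           # paired t-test on the same students
    print(f"pre = {pre.mean():.1f}, post = {post.mean():.1f}, t = {t:.2f}, P = {p:.4g}")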

  • A Novel Method for Detecting Ambiguity in Medical Exams Using Large Language Models

    Date Submitted: Aug 20, 2025
    Open Peer Review Period: Aug 20, 2025 - Oct 15, 2025

    Background: Large Language Models (LLMs) have emerged as promising tools in medical education due to their ability to understand, generate and reason with natural language. Their ability to simulate expert reasoning extends beyond answering questions, enabling them to support quality control in assessment design. In this study, we evaluated the utility of LLMs in identifying ambiguous or poorly constructed exam items in critical care academic assessments. Objective: We developed automated ambiguity and quality scores to objectively assess individual questions and entire exam components. Methods: We analyzed 264 questions from academic exams conducted over three academic years (2023 to 2025) at the Medical School of Université Côte d’Azur. Questions were drawn from four docimological formats: Progressive Clinical Cases (PCC), Mini-PCC, Key Feature Problems (KFP) and Isolated Questions Sequence (IQS). Each item was submitted to four LLMs (ChatGPT, Gemini Pro, Le Chat and DeepSeek) without prompt engineering. Performance was evaluated using the official correction key. We applied four binary diagnostic tags based on model agreement and self-reported ambiguity: ambiguity, low performance, incoherence and subjective ambiguity. These tags generated a composite ambiguity score and contributed to a weighted quality score for each exam component. Results: LLMs performed comparably to students, with statistically significant superior performance on the Mini-PCC and IQS formats. IQS items had the highest ambiguity scores. Tag patterns revealed frequent issues with ambiguity and inconsistency. Quality scores varied across academic years. IQS predominantly showed moderate ambiguity (score 2), with occasional instances of strong signals. There was no significant difference in quality based on author specialty or seniority. Conclusions: LLMs can serve as objective tools to proactively detect ambiguous exam questions and estimate the overall quality of an exam. Integrating these tools into the assessment design process can reduce the need for post-exam corrections and improve fairness and clarity in medical evaluations.
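
To illustrate how four binary diagnostic tags can be rolled up into a per-item ambiguity score and an exam-level quality score as described above, here is a small Python sketch; the equal weighting and the example items are assumptions made for illustration, not the authors' exact scheme.

    # Composite ambiguity score (count of raised flags) and a weighted exam quality score.
    # Weights and example items are illustrative assumptions.
    from dataclasses import dataclass

    @dataclass
    class ItemTags:
        ambiguity: bool             # models disagree on the keyed answer
        low_performance: bool       # models score poorly against the correction key
        incoherence: bool           # a model contradicts itself across runs
        subjective_ambiguity: bool  # a model itself flags the item as ambiguous

    def ambiguity_score(t: ItemTags) -> int:
        """0-4 composite score: the number of raised flags."""
        return sum([t.ambiguity, t.low_performance, t.incoherence, t.subjective_ambiguity])

    def exam_quality(items, max_flags=4):
        """0-100 quality score: higher when fewer and weaker flags are raised."""
        penalties = [ambiguity_score(t) / max_flags for t in items]
        return 100 * (1 - sum(penalties) / len(items))

    exam = [ItemTags(False, False, False, False),
            ItemTags(True, False, False, True),
            ItemTags(True, True, True, False)]
    print([ambiguity_score(t) for t in exam], f"quality = {exam_quality(exam):.1f}")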

  • Background: Assessment of technical aptitude, cognitive abilities and personality characteristics is important in selecting candidates for surgical training. Currently, the selection of surgical training candidates is based on ineffective methods which have been shown to be poorly correlated with later performance. Objective: The present study examined the validity of two novel assessment methods developed for this purpose: a virtual reality (VR) technical aptitude test and a game-based assessment of cognitive abilities and personality characteristics. Methods: This study focused on relationships with other variables, one of the sources of validity evidence. The study had three phases. In Phase 1, we evaluated convergent and discriminant evidence of validity by assessing the correlation between interns’ performance in the two novel selection tests and in four traditional tests for assessing dexterity, visuospatial ability, intelligence, and personality. In Phase 2, we evaluated evidence for test–criterion relationship by assessing the correlation between residents’ performance in the two selection tests and their residency performance evaluations. In this phase we also assessed evidence for the fairness of the tests across genders. In Phase 3, we evaluated evidence for relationship with training level by administering the technical aptitude test to a sample of expert surgeons and comparing their performance with that of the residents and interns from the previous phases. Results: Interns’ scores on the novel selection tests were correlated with scores on the relevant traditional tests, providing convergent and discriminant evidence (Phase 1). Residents’ scores on the novel tests were significantly correlated with relevant performance criteria (Phase 2). In addition, no evidence for gender bias in the tests was found. Finally, based on data collected in all three phases, we found evidence for expert–novice differences, such that the technical aptitude test scores were correlated with surgical experience. Conclusions: The findings provide validity evidence supporting use of the novel VR and gamification tests for assessment of technical aptitude, cognitive abilities and personality characteristics in selecting candidates for surgical training. The evidence suggests that use of the tests may improve the selection of surgical residents.