TY  - JOUR
AU  - Spallek, Sophia
AU  - Birrell, Louise
AU  - Kershaw, Stephanie
AU  - Devine, Emma Krogh
AU  - Thornton, Louise
PY  - 2023
DA  - 2023/11/30
TI  - Can We Use ChatGPT for Mental Health and Substance Use Education? Examining Its Quality and Potential Harms
JO  - JMIR Med Educ
SP  - e51243
VL  - 9
KW  - artificial intelligence
KW  - generative artificial intelligence
KW  - large language models
KW  - ChatGPT
KW  - medical education
KW  - health education
KW  - patient education handout
KW  - preventive health services
KW  - educational intervention
KW  - mental health
KW  - substance use
AB  - Background: The use of generative artificial intelligence, more specifically large language models (LLMs), is proliferating, and as such, it is vital to consider both the value and potential harms of its use in medical education. Their efficiency in a variety of writing styles makes LLMs, such as ChatGPT, attractive for tailoring educational materials. However, this technology can feature biases and misinformation, which can be particularly harmful in medical education settings, such as mental health and substance use education. This viewpoint investigates whether ChatGPT is sufficient for 2 common health education functions in the field of mental health and substance use: (1) answering users’ direct queries and (2) aiding in the development of quality consumer educational health materials. Objective: This viewpoint includes a case study to provide insight into the accessibility, biases, and quality of ChatGPT’s query responses and educational health materials. We aim to provide guidance for the general public and health educators wishing to utilize LLMs. Methods: We collected real-world queries from 2 large-scale mental health and substance use portals and engineered a variety of prompts to use on GPT-4 Pro with the Bing BETA internet browsing plug-in. The outputs were evaluated with tools from the Sydney Health Literacy Lab to determine accessibility, with adherence to Mindframe communication guidelines to identify biases, and with author assessments of quality, including tailoring to audiences, duty-of-care disclaimers, and evidence-based internet references. Results: GPT-4’s outputs had good face validity but, upon detailed analysis, were substandard in comparison to expert-developed materials. Without engineered prompting, the reading level, adherence to communication guidelines, and use of evidence-based websites were poor. Therefore, all outputs still required cautious human editing and oversight. Conclusions: GPT-4 is currently not reliable enough for direct consumer queries, but educators and researchers can use it for creating educational materials with caution. Materials created with LLMs should disclose the use of generative artificial intelligence and be evaluated for their efficacy with the target audience.
SN  - 2369-3762
UR  - https://mededu.jmir.org/2023/1/e51243
UR  - https://doi.org/10.2196/51243
UR  - http://www.ncbi.nlm.nih.gov/pubmed/38032714
DO  - 10.2196/51243
ID  - info:doi/10.2196/51243
ER  -