TY  - JOUR
AU  - Wang, Ying-Mei
AU  - Shen, Hung-Wei
AU  - Chen, Tzeng-Ji
AU  - Chiang, Shu-Chiung
AU  - Lin, Ting-Guan
PY  - 2025
DA  - 2025/1/17
TI  - Performance of ChatGPT-3.5 and ChatGPT-4 in the Taiwan National Pharmacist Licensing Examination: Comparative Evaluation Study
JO  - JMIR Med Educ
SP  - e56850
VL  - 11
KW  - artificial intelligence
KW  - ChatGPT
KW  - chat generative pre-trained transformer
KW  - GPT-4
KW  - medical education
KW  - educational measurement
KW  - pharmacy licensure
KW  - Taiwan
KW  - Taiwan national pharmacist licensing examination
KW  - learning model
KW  - AI
KW  - Chatbot
KW  - pharmacist
KW  - evaluation and comparison study
KW  - pharmacy
KW  - statistical analyses
KW  - medical databases
KW  - medical decision-making
KW  - generative AI
KW  - machine learning
AB  - Background: OpenAI released versions ChatGPT-3.5 and GPT-4 between 2022 and 2023. GPT-3.5 has demonstrated proficiency in various examinations, particularly the United States Medical Licensing Examination. However, GPT-4 has more advanced capabilities. Objective: This study aims to examine the efficacy of GPT-3.5 and GPT-4 within the Taiwan National Pharmacist Licensing Examination and to ascertain their utility and potential application in clinical pharmacy and education. Methods: The pharmacist examination in Taiwan consists of 2 stages: basic subjects and clinical subjects. In this study, exam questions were manually fed into the GPT-3.5 and GPT-4 models, and their responses were recorded; graphic-based questions were excluded. This study encompassed three steps: (1) determining the answering accuracy of GPT-3.5 and GPT-4, (2) categorizing question types and observing differences in model performance across these categories, and (3) comparing model performance on calculation and situational questions. Microsoft Excel and R software were used for statistical analyses. Results: GPT-4 achieved an accuracy rate of 72.9%, overshadowing GPT-3.5, which achieved 59.1% (P<.001). In the basic subjects category, GPT-4 significantly outperformed GPT-3.5 (73.4% vs 53.2%; P<.001). However, in clinical subjects, only minor differences in accuracy were observed. Specifically, GPT-4 outperformed GPT-3.5 in the calculation and situational questions. Conclusions: This study demonstrates that GPT-4 outperforms GPT-3.5 in the Taiwan National Pharmacist Licensing Examination, particularly in basic subjects. While GPT-4 shows potential for use in clinical practice and pharmacy education, its limitations warrant caution. Future research should focus on refining prompts, improving model stability, integrating medical databases, and designing questions that better assess student competence and minimize guessing.
SN  - 2369-3762
UR  - https://mededu.jmir.org/2025/1/e56850
UR  - https://doi.org/10.2196/56850
DO  - 10.2196/56850
ID  - info:doi/10.2196/56850
ER  - 