Search Results (1 to 2 of 2 Results)


Performance of ChatGPT-4o and Four Open-Source Large Language Models in Generating Diagnoses Based on China’s Rare Disease Catalog: Comparative Study

The scalable DeepSeek-R1 architecture (1.5B-671B parameters) advances medical LLM development [22-25]. While its chain-of-thought (CoT) reasoning succeeds in general cognitive tasks, clinical validation for rare disease diagnosis, which requires specialized reasoning patterns, remains lacking. This 3-phase investigation first evaluates ChatGPT-4o’s diagnostic accuracy using clinical manifestations from China’s rare disease catalog.

Wei Zhong, YiFan Liu, Yan Liu, Kai Yang, HuiMin Gao, HuiHui Yan, WenJing Hao, YouSheng Yan, ChengHong Yin

J Med Internet Res 2025;27:e69929

Authors’ Reply: Citation Accuracy Challenges Posed by Large Language Models

Concerns over the generation of hallucinated citations by large language models (LLMs), such as OpenAI’s ChatGPT, Google’s Gemini, and Hangzhou’s DeepSeek, warrant exploring advanced and novel methodologies to ensure citation accuracy and overall output integrity [3]. LLMs have demonstrated a propensity to generate well-formatted yet fictitious references, a limitation largely attributed to restricted access to subscription-based databases and their reliance on probabilistic text generation [4].

Mohamad-Hani Temsah, Ayman Al-Eyadhy, Amr Jamal, Khalid Alhasan, Khalid H Malki

JMIR Med Educ 2025;11:e73698