Beyond Benchmarks: Evaluating Generalist Medical Artificial Intelligence With Psychometrics

Martínez-Plumed et al [26] have already shown that item response theory can be adapted to the analysis of AI experiments, offering insights at the instance level. To mitigate the issue of data contamination, new benchmark items with predictable item parameters could easily be developed based on automatic item generation [27]. In short, we expect more instrumental roles to be played by psychometric techniques in the evaluation of GMAI.
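
To make the instance-level idea concrete, the following is a minimal sketch of a standard two-parameter logistic (2PL) item response model applied to an AI system's right/wrong answers on benchmark items. It is not the procedure used in [26]; the function names, the item parameters, and the grid-search estimator are all assumptions chosen for illustration.

```python
import math


def p_correct(theta, a, b):
    """2PL item response function.

    theta: ability of the respondent (here, an AI system)
    a: item discrimination, b: item difficulty (both hypothetical values)
    """
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


def estimate_ability(responses, items):
    """Maximum-likelihood ability estimate via a simple grid search.

    responses: list of 0/1 outcomes, one per item
    items: list of (a, b) tuples with known item parameters
    """
    grid = [i / 100.0 for i in range(-400, 401)]  # theta in [-4, 4]
    best_theta, best_ll = grid[0], float("-inf")
    for theta in grid:
        ll = 0.0
        for r, (a, b) in zip(responses, items):
            p = p_correct(theta, a, b)
            ll += math.log(p) if r else math.log(1.0 - p)
        if ll > best_ll:
            best_theta, best_ll = theta, ll
    return best_theta


# Hypothetical items: easy, medium, hard, all with discrimination 1.0
items = [(1.0, -1.0), (1.0, 0.0), (1.0, 1.0)]
strong = estimate_ability([1, 1, 0], items)  # two of three correct
weak = estimate_ability([1, 0, 0], items)    # one of three correct
```

Under these assumptions, a system that answers more items correctly receives a higher ability estimate, and each item's contribution depends on its own difficulty and discrimination rather than counting equally toward an aggregate score.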

Luning Sun, Christopher Gibbons, José Hernández-Orallo, Xiting Wang, Liming Jiang, David Stillwell, Fang Luo, Xing Xie

J Med Internet Res 2025;27:e70901