Evaluating the Accuracy of a New Artificial Intelligence Based Symptom Checker. A Clinical Study - Carnegie Mellon University in Qatar.

admin

August 16, 2025 - 19:25

August 18, 2025 - 22:58

Evaluating the Accuracy of a New Artificial Intelligence Based Symptom Checker: A Clinical Vignette Study
Mohammad Hammoud*, Shahd Douglas, Mohamad Darmach, Sara Alawneh, Swapnendu Sanyal, and Youssef Kanbour
Rimads QSTP-LLC, Qatar Science and Technology Park, Doha, Qatar

ABSTRACT
Objectives To evaluate the accuracy of a new Artificial Intelligence (AI) based symptom checker and compare it with that of popular symptom checkers and experienced primary care physicians.

Design Vignette study.

Setting 400 gold-standard primary care vignettes.

Intervention/Comparator We propose a 4-stage comprehensive experimentation methodology that capitalizes on the standard clinical vignette approach to evaluate 6 symptom checkers. To this end, we developed and peer-reviewed 400 vignettes, each approved by at least 5 out of 7 independent and experienced general practitioners. To establish a frame of reference and interpret the results of symptom checkers accordingly, we further compared the best-performing symptom checker against 3 primary care physicians with an average experience of 16.6 years.

Primary Outcome Measures We thoroughly studied the diagnostic accuracies of symptom checkers and physicians from 7 standard performance angles, including (a) M1 as a measure of a symptom checker’s or a physician’s ability to return a vignette’s main diagnosis at the top of their differential list, (b) F1-measure as a trade-off score between sensitivity and precision, and (c) NDCG as a measure of a differential list’s ranking quality, among others.
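The three measures named above can be illustrated with a minimal Python sketch. The abstract does not give the exact formulas used in the study, so the definitions here are assumptions: M1 is treated as a top-1 hit, F1 as the harmonic mean of precision and sensitivity (recall) over the returned differential, and NDCG uses graded relevance derived from the position of each diagnosis in the gold-standard list. The function names and the example vignette are hypothetical.

```python
import math


def m1(differential, main_diagnosis):
    """Return 1.0 if the main diagnosis is ranked first, else 0.0."""
    return 1.0 if differential and differential[0] == main_diagnosis else 0.0


def precision_recall_f1(differential, gold):
    """Precision, recall (sensitivity), and F1 of a differential list.

    Assumes each diagnosis appears at most once in each list.
    """
    gold_set = set(gold)
    hits = sum(1 for dx in differential if dx in gold_set)
    precision = hits / len(differential) if differential else 0.0
    recall = hits / len(gold_set) if gold_set else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)


def ndcg(differential, gold):
    """NDCG with graded relevance taken from the gold ordering (assumption).

    The gold item at index i gets relevance len(gold) - i; any diagnosis
    that is not in the gold list gets relevance 0.
    """
    relevance = {dx: len(gold) - i for i, dx in enumerate(gold)}

    def dcg(ranked):
        # enumerate is 0-based, so pos + 2 equals the 1-based position plus 1.
        return sum(relevance.get(dx, 0) / math.log2(pos + 2)
                   for pos, dx in enumerate(ranked))

    ideal = dcg(gold)
    return dcg(differential) / ideal if ideal else 0.0


# Hypothetical vignette: gold differential ordered from the main diagnosis down.
gold = ["migraine", "tension headache", "cluster headache"]
returned = ["tension headache", "migraine", "sinusitis"]

top1 = m1(returned, gold[0])
precision, recall, f1 = precision_recall_f1(returned, gold)
ranking_quality = ndcg(returned, gold)
```

In this made-up case the main diagnosis is returned in second place, so M1 is 0 even though two of the three gold diagnoses are found. That is why a ranking-sensitive measure such as NDCG is reported alongside M1 and F1.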

Results The new AI-based symptom checker, Avey, significantly outperformed 5 popular symptom checkers (Ada, WebMD, K-Health, Buoy, and Babylon) by averages of 24.5%, 175.5%, 142.8%, 159.6%, and 2968.1% using M1; 8.7%, 88.9%, 66.4%, 88.9%, and 2084% using F1-measure; and 21.2%, 93.4%, 113.3%, 136.4%, and 3091.6% using NDCG, respectively. Physicians, in contrast, slightly outperformed Avey by an average of 1.2% using F1-measure, while Avey exceeded them by averages of 10.2% and 25.1% using M1 and NDCG, respectively.
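The abstract does not state how these percentage margins were computed. One common reading, assumed here, is relative improvement over the comparator's score. The sketch below uses a hypothetical helper name and made-up scores, not the study's data.

```python
def relative_gain_percent(score, baseline):
    """Relative improvement of `score` over `baseline`, in percent.

    Assumed formula: (score - baseline) / baseline * 100.
    """
    if baseline == 0:
        raise ValueError("baseline must be non-zero for a relative gain")
    return (score - baseline) / baseline * 100.0


# Made-up average scores for two tools on the same metric (not from the study).
gain = relative_gain_percent(0.6, 0.5)
```

Under this reading, a margin above 100% means the better tool's average score is more than twice the comparator's, which is consistent with very large margins when the comparator's score is close to zero.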

Conclusions Avey outperformed current symptom checkers and compared favorably with physicians.