PS48 The use of artificial intelligence in dermatology systematic reviews: a comparative analysis of Elicit against human reviewers
Jui Vyas, Jeffrey Johns, Emily Forrest, Mari Ann Hilliar, Sam Salek, Andrew Y Finlay
Abstract
Systematic reviews have an invaluable role in supporting evidence-based medicine, but traditional methods are time-consuming and labour-intensive. Artificial intelligence (AI) tools that can be used for systematic reviews, such as Elicit, are rapidly evolving but need to be assessed for accuracy, reliability and practical benefit. This study was designed to compare traditional human searching, selection of peer-reviewed journal articles for inclusion in a systematic review, and data extraction with guided searching and extraction using Elicit. A previously published systematic review of 24 articles describing the Dermatology Life Quality Index as a primary outcome in randomized clinical trials was used as the comparator. For each article, eight extraction ‘criteria’ were compared, including disease, country and participants. Elicit performed poorly in searching and identifying articles satisfying the inclusion criteria. The search was dependent on the search terms or ‘natural language’ query supplied, and the AI did not have a ‘formal understanding’ of medical terms with specific meanings. By contrast, in data extraction only 13 ‘criteria’ results (6.3%) across all 24 studies were incorrectly extracted by Elicit. In 5 of 24 studies (21%), Elicit failed to correctly detect the location where the study was performed, defaulting to the country of the author’s institution in 2 of 24. Elicit gave incorrect information for randomization (3 of 24), blinding (3 of 24) and retention of participants (2 of 24), resulting in incorrect risk-of-bias (Jadad) scores in 6 of 24 cases (25%). Study disease was the only ‘criterion’ for which the information was 100% correct. One error in human extraction (number of participants) was also detected. Although using Elicit currently saves little time, it can fulfil the role of another ‘independent’ reviewer.
Future improvements in the way the language model can be guided to extract information, and ‘better learning’ of the ‘specific meaning’ of medical terms, are likely to improve the utility of AI in systematic reviews.