AI-enabled systematic review platforms: how well do they perform?

Posted on March 3, 2026 by UHMLG

Charis Wong, Honorary Senior Clinical Lecturer, Centre for Clinical Brain Sciences, University of Edinburgh and Consultant Neurologist, Queen Margaret Hospital

Abstract

Objective

Traditional systematic reviews are time and resource intensive, limiting their application. There has been a recent emergence of AI-enabled systematic review platforms promising faster and more comprehensive reviews. However, independent validation of these platforms is needed to assess their suitability and potential to augment or replace human reviewer efforts.

Methods

We evaluated the performance of three platforms – Elicit.AI, Nested Knowledge and Scispace – across different systematic review stages. We seek to replicate three recently published systematic reviews and compare the performance of each platform against original published findings. We pre-registered our protocol on OSF (doi:10.17605/OSF.IO/8NF6H).

Results

As of October 2025, we completed citation screening across all platforms for the first review (https://doi.org/10.12688/wellcomeopenres.21302.1). The platforms vary in usage and reporting of AI methods, degrees of automation and required human supervision, transparency, customisability, and data sources. Retrieval limits were 500 studies per review for Elicit and 10,000 for Nested Knowledge.

Elicit.AI uses LLMs to screen citations against criteria defined by users. Nested Knowledge offers greater customisation but requires more human input and oversight. Of the 15 studies included in the original review, Elicit and Scispace both included 6 studies (sensitivity 40%). We identified all 15 studies using Nested Knowledge with a dual-pass, dual-reviewer set-up; we enabled their AI-based Robot Screener for abstract screening after 477 human dual-screened decisions (11 advanced) on our corpus of 1889 citations. Robot Screener yielded a sensitivity of 72.7% and precision of 42.1%.
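For readers less familiar with the screening metrics quoted above, the following is a minimal Python sketch of how sensitivity and precision are conventionally calculated. It is an illustration, not the authors' analysis code: the function names and the constant are assumptions introduced here. Only the counts stated in the abstract (6 of 15 and 15 of 15) are used; the abstract does not report the counts behind the Robot Screener figures, so none are shown for them.

```python
def sensitivity(true_positives: int, false_negatives: int) -> float:
    """Share of the truly relevant studies that the screener included."""
    relevant = true_positives + false_negatives
    if relevant == 0:
        raise ValueError("no relevant studies to evaluate against")
    return true_positives / relevant


def precision(true_positives: int, false_positives: int) -> float:
    """Share of the studies the screener included that were truly relevant."""
    included = true_positives + false_positives
    if included == 0:
        raise ValueError("the screener included no studies")
    return true_positives / included


# Reference standard from the abstract: 15 studies in the original review.
ORIGINAL_INCLUDED = 15  # hypothetical constant name, value from the abstract

# Elicit and Scispace each included 6 of the 15, so 9 were missed.
six_of_fifteen = sensitivity(6, ORIGINAL_INCLUDED - 6)

# The Nested Knowledge workflow identified all 15, so none were missed.
all_fifteen = sensitivity(15, 0)

print(f"6 of 15:  {six_of_fifteen:.1%}")   # 40.0%
print(f"15 of 15: {all_fifteen:.1%}")      # 100.0%
```

Sensitivity depends only on how many relevant studies were found or missed, whereas precision also reflects how many irrelevant citations were passed forward for human review, which is why the two are reported together.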

We terminated our evaluation of Scispace on the basis of futility. The exported data files and systematic review report were corrupted, hindering evaluation. Scispace reported retrieving 121 studies, 65 of which were unique, but there were five duplicate citations within that set.

Conclusion

Preliminary findings highlight significant variability in the performance of AI-enabled systematic review platforms. Low sensitivity and technical limitations in some platforms may compromise reliability. Further analyses of later review stages and replication of two additional systematic reviews using Elicit.AI and Nested Knowledge are underway and expected to be completed by Q1 2026.

https://www.linkedin.com/in/charis-wong-bb4686192/

Biography

Charis is a consultant neurologist and Honorary Senior Clinical Lecturer at the University of Edinburgh with an interest in how to improve the use of evidence in informing research decisions, such as how drugs are selected for clinical trials and further evaluation. For her PhD, she developed methods to synthesise and report different types of data to inform drug selection for MND-SMART, a clinical trial in motor neuron disease, with the CAMARADES group and the Anne Rowling Clinic in Edinburgh. She is working with others to expand this to other neurological conditions such as Alzheimer’s disease, progressive multiple sclerosis, vascular causes of dementia, and intracerebral haemorrhage. She is interested in evaluating Digital Evidence Synthesis Tools and the use of AI for various systematic review tasks.

Spring Forum 2026: Search Lightning! This review is automatic, systematic, hydromatic!

The 2026 UHMLG Spring Forum will once again bring together a fantastic line-up of speakers from around the UK, across two half-days in April (22nd & 23rd). There are a total of six talks on the overall theme of systematic reviews – some of which do, of course, discuss the use of AI in systematic reviews… but the focus is very much on the practical implementation, not on generalities or background.

Our core audience is UK / Republic of Ireland health and medical librarians from the Higher Education and NHS / health sectors, but we welcome delegates from any area of librarianship, and from anywhere in the world.

More information / Book your place

For full information about the forum and to book a place, please visit our Spring Forum 2026 page. This year we are offering an institutional ticket, which allows unlimited access for all colleagues.
