A meta-analysis is a statistical analysis that combines the results of multiple studies of the same disease or treatment to determine whether a treatment is effective. For instance, to determine whether treatment X cures disease or condition Y, we can either (1) conduct a clinical trial or (2) conduct a meta-analysis of existing trials. Although clinical trials provide gold-standard evidence, they are time-consuming and expensive, so meta-analyses offer a faster and cheaper alternative.
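To make the idea of "combining results" concrete, here is a minimal sketch of the inverse-variance fixed-effect method. This is one common way a meta-analysis pools trials, not necessarily the method used in this project. The function name and the three study estimates (log risk ratios with their standard errors) are hypothetical.

```python
import math


def fixed_effect_pool(effects, std_errors):
    """Pool per-study effect sizes with inverse-variance weights (fixed-effect model).

    Illustrative sketch only: assumes each study reports an effect estimate
    (e.g. a log risk ratio) and its standard error.
    """
    if not effects or len(effects) != len(std_errors):
        raise ValueError("need at least one study and one standard error per effect")
    # Each study is weighted by 1 / SE^2, so more precise studies count more.
    weights = [1.0 / se ** 2 for se in std_errors]
    pooled = sum(w * y for w, y in zip(weights, effects)) / sum(weights)
    pooled_se = math.sqrt(1.0 / sum(weights))
    # 95% confidence interval under a normal approximation.
    ci = (pooled - 1.96 * pooled_se, pooled + 1.96 * pooled_se)
    return pooled, pooled_se, ci


# Hypothetical log risk ratios and standard errors from three trials.
pooled, se, ci = fixed_effect_pool([-0.2, -0.1, -0.3], [0.1, 0.2, 0.1])
print(f"pooled={pooled:.3f} se={se:.3f} ci=({ci[0]:.3f}, {ci[1]:.3f})")
```

With these made-up inputs, the pooled log risk ratio is about -0.233 and its confidence interval lies entirely below zero. Under a fixed-effect model, that would favour the treatment. A random-effects model is often preferred when trials are heterogeneous.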
Meta-analysis involves extracting data from published clinical trial reports and combining the results to determine whether the treatment is effective. However, the number of research publications is growing rapidly, with new articles published every day. In this research, we propose a system that automatically extracts this information from research publications, reducing the time needed to read articles and extract the relevant data.
We created a corpus of abstracts of breast cancer randomized controlled trials (RCTs) retrieved from the PubMed database. We annotated the core components of clinical trials, i.e., Participants (P), Intervention (I), Control (C), and Outcome (O), commonly known as the PICO elements. For each component, we annotated the text snippets that describe it.
The abstracts were annotated using the BRAT web annotation tool. Fig. 2 shows a sample abstract annotated in BRAT. The annotated data can be freely downloaded here. For detailed information on the annotation guidelines and entities, please read our paper.
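BRAT stores annotations in a standoff .ann file alongside each document's .txt file. Each text-bound annotation line has the form: ID, a tab, the entity label followed by start and end character offsets, another tab, and the covered text. The sketch below reads such lines into Python objects. The label names Participants, Intervention and Outcome in the example are placeholders. The corpus's actual entity types are defined in the annotation guidelines.

```python
from dataclasses import dataclass


@dataclass
class Entity:
    ann_id: str
    label: str
    spans: list[tuple[int, int]]  # (start, end) character offsets into the .txt file
    text: str


def parse_brat_ann(ann_text):
    """Read text-bound ("T") annotations from the contents of a BRAT .ann file."""
    entities = []
    for line in ann_text.splitlines():
        # Only text-bound annotations start with "T"; relations, events,
        # attributes and notes (R, E, A, #, ...) are skipped in this sketch.
        if not line.startswith("T"):
            continue
        ann_id, type_and_offsets, text = line.split("\t", 2)
        label, offsets = type_and_offsets.split(" ", 1)
        # Discontinuous annotations list several "start end" fragments separated by ";".
        spans = []
        for fragment in offsets.split(";"):
            start, end = fragment.split()
            spans.append((int(start), int(end)))
        entities.append(Entity(ann_id, label, spans, text))
    return entities


# Hypothetical abstract sentence and .ann content (labels are placeholders).
doc = "Women with early breast cancer received tamoxifen."
ann = (
    "T1\tParticipants 0 30\tWomen with early breast cancer\n"
    "T2\tIntervention 40 49\ttamoxifen\n"
    "#1\tAnnotatorNotes T1\texample note\n"
)
for ent in parse_brat_ann(ann):
    print(ent.ann_id, ent.label, ent.spans, [doc[s:e] for s, e in ent.spans])
```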
We frame information extraction as a named entity recognition (NER) task. We train three BERT-based NER models: BioBERT, BlueBERT, and Longformer, using the standard BERT-based token classification models provided by Hugging Face. Because neural network models produce different results when initialized with different random seeds, previous research suggests training with several seeds and averaging the results. The code for token classification using BERT models can be found here.
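Framing extraction as NER means each token receives a label such as B-X (beginning of an entity of type X), I-X (inside it), or O (outside any entity). The sketch below converts character-offset spans into BIO labels over whitespace tokens. It also summarizes per-seed scores with a mean and standard deviation. Whitespace tokenization, the entity labels, and the per-seed F1 values are illustrative assumptions. BERT-based models split words into subword tokens, so a real pipeline would align labels to those tokens, for example using the offsets a Hugging Face fast tokenizer returns with `return_offsets_mapping=True`.

```python
import re
import statistics


def bio_tags(text, entities):
    """Assign B-/I-/O labels to whitespace tokens.

    `entities` is a list of (label, start, end) character spans. Whitespace
    tokenization is a simplifying assumption; BERT models use subword tokens.
    """
    tokens, labels = [], []
    for match in re.finditer(r"\S+", text):
        start, end = match.span()
        tag = "O"
        for label, ent_start, ent_end in entities:
            # A token belongs to an entity if the two character ranges overlap.
            if start < ent_end and end > ent_start:
                # The first overlapping token begins the entity; later ones are inside it.
                tag = ("B-" if start <= ent_start else "I-") + label
                break
        tokens.append(match.group())
        labels.append(tag)
    return tokens, labels


def summarize_seeds(scores_by_seed):
    """Mean and sample standard deviation of a metric across random seeds (needs >= 2)."""
    scores = list(scores_by_seed.values())
    return statistics.mean(scores), statistics.stdev(scores)


text = "Women with early breast cancer received tamoxifen."
spans = [("Participants", 0, 30), ("Intervention", 40, 49)]  # placeholder labels
for tok, tag in zip(*bio_tags(text, spans)):
    print(tok, tag)

# Hypothetical F1 scores of one model trained with three different seeds.
mean_f1, sd_f1 = summarize_seeds({13: 0.80, 42: 0.82, 7: 0.84})
print(f"F1 = {mean_f1:.3f} +/- {sd_f1:.3f}")
```

Reporting the spread alongside the mean shows whether differences between models are larger than the variation caused by the seed alone.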
Please read our paper for more details on this project.
Also check out our NER demo website and meta-analysis results visualization system.