This project outlines our participation in the n2c2 2022 shared task (Track 1) on Contextualized Medication Event Extraction (CMED). The track consists of three subtasks.
Here I discuss our approach to subtask 2, event classification.
The data is not publicly available for privacy and confidentiality reasons, but it can be requested from the task organizers.
The dataset contains clinical notes annotated with the BRAT web annotation tool. The organizers provided the BRAT annotation (.ann) files and the clinical notes as text files.
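To make the file format concrete, here is a minimal sketch of reading the text-bound annotations from a BRAT .ann file. In the BRAT standoff format, each text-bound line starts with `T` and holds three tab-separated fields: an ID, a `type start end` triple, and the annotated text. The helper name `parse_text_bound`, the `TextBound` tuple, and the labels in the sample string are assumptions made for illustration; they are not taken from the project's code or from the real data.

```python
from typing import List, NamedTuple


class TextBound(NamedTuple):
    """One text-bound (T) annotation from a BRAT .ann file."""
    ann_id: str
    label: str
    start: int
    end: int
    text: str


def parse_text_bound(ann_content: str) -> List[TextBound]:
    """Collect the T lines of a .ann file and skip every other line type."""
    spans = []
    for line in ann_content.splitlines():
        if not line.startswith("T"):
            continue  # attributes (A), events (E), relations (R), notes (#)
        ann_id, type_and_offsets, text = line.split("\t", 2)
        label, offsets = type_and_offsets.split(" ", 1)
        # Discontinuous spans are written as "s1 e1;s2 e2"; keep the outer bounds.
        fragments = [fragment.split() for fragment in offsets.split(";")]
        start = int(fragments[0][0])
        end = int(fragments[-1][1])
        spans.append(TextBound(ann_id, label, start, end, text))
    return spans


# Hypothetical sample content, not real CMED data.
sample = (
    "T1\tDisposition 12 19\taspirin\n"
    "A1\tCertainty T1 Certain\n"
    "T2\tNoDisposition 30 35;40 44\tlong name\n"
)
for span in parse_text_bound(sample):
    print(span)
```

The character offsets recovered here are what a later step needs in order to locate each medication mention inside the note text.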
Each clinical note contains multiple medication mentions, and we have to predict an event class for each one. During pre-processing, we split each note into multiple sequences (text chunks), one per medication mention, in the form [context] medication [context]. A window-based approach was used to determine the length of the context. After multiple experiments, a window size of 200 characters achieved the best performance, although in some cases this window may be too short to capture the relevant context. The code used for pre-processing the dataset can be found here.
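As a rough illustration of the windowing step (not the project's actual pre-processing code), the sketch below cuts a [context] medication [context] chunk out of a note. It assumes the 200-character window is applied on each side of the mention and that the window is simply clipped at the note boundaries; the text does not state either detail. The function name `extract_context` is hypothetical.

```python
def extract_context(note: str, start: int, end: int, window: int = 200) -> str:
    """Return the mention at note[start:end] plus `window` characters per side.

    The window is clipped at the beginning and end of the note, so chunks
    for mentions near a boundary are shorter.
    """
    left = max(0, start - window)
    right = min(len(note), end + window)
    return note[left:right]


# Toy note: 300 filler characters, the mention, 300 more filler characters.
note = "x" * 300 + "aspirin" + "y" * 300
chunk = extract_context(note, start=300, end=307)
print(len(chunk))  # 200 + 7 + 200 = 407
```

Working in characters rather than tokens keeps the step independent of any tokenizer, at the cost of sometimes cutting a word in half at the edge of the window.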
We approached this task as a sequence classification task and used the standard BERT model for sequence classification from Hugging Face. The best model was an ensemble of three BERT-based models: Clinical-Longformer, Bio_ClinicalBERT, and PubMedBERT. Because neural network models produce different results when initialized with different random seeds, previous research suggests training with several seeds and averaging the results. The code for the models can be found in this GitHub repository.
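The text does not say how the model outputs were combined, so the following is only one plausible sketch: convert each model's (or each seed's) logits to probabilities with a softmax, average the probabilities per class, and pick the class with the highest mean. The function names and the example logits are made up for illustration, and the real ensemble may have used a different rule such as majority voting.

```python
import math
from typing import List, Sequence, Tuple


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over one vector of logits."""
    peak = max(logits)
    exps = [math.exp(value - peak) for value in logits]
    total = sum(exps)
    return [value / total for value in exps]


def ensemble_predict(logits_per_model: Sequence[Sequence[float]]) -> Tuple[int, List[float]]:
    """Average class probabilities across models or seeds and take the argmax."""
    probs = [softmax(logits) for logits in logits_per_model]
    n_classes = len(probs[0])
    mean_probs = [
        sum(model_probs[c] for model_probs in probs) / len(probs)
        for c in range(n_classes)
    ]
    best_class = max(range(n_classes), key=mean_probs.__getitem__)
    return best_class, mean_probs


# Made-up logits from three models for one medication mention, three classes.
example_logits = [
    [2.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [3.0, 0.0, 0.0],
]
predicted, mean_probs = ensemble_predict(example_logits)
print(predicted, mean_probs)
```

The same function covers seed averaging: pass the logits from runs of a single architecture trained with different seeds instead of logits from different architectures.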