Labelling factual information in legal cases using fine-tuned BERT models

Bibliographic Details
Main Author: Wenestam, Arvid
Format: Others
Language: English
Published: Uppsala universitet, Statistiska institutionen 2021
Subjects:
Online Access:http://urn.kb.se/resolve?urn=urn:nbn:se:uu:diva-447230
Description
Summary: Labelling factual information at the token level in legal cases requires legal expertise and is time-consuming. This thesis proposes a transfer-learning and fine-tuning approach using pre-trained state-of-the-art BERT models to perform this labelling task. Investigations compare whether models pre-trained solely on a legal corpus outperform a BERT trained on a generic corpus, and examine the models' behaviour as the number of cases in the training sample varies. This work shows that the models' metric scores are stable and on par when using 40-60 professionally annotated cases as opposed to the full sample of 100 cases. Also, the generic-trained BERT model is a strong baseline, and a BERT pre-trained solely on a legal corpus is not crucial for this task.
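
As a rough illustration of the approach the abstract describes (not the thesis's actual code), the sketch below fine-tunes a pre-trained BERT for token-level labelling with the Hugging Face transformers library. The checkpoint name, the label set, and the example sentence are assumptions for illustration; the comparison in the thesis would amount to swapping the generic checkpoint for a legal-domain one.

```python
# Minimal sketch of token-level labelling with a fine-tuned BERT.
# Assumptions (not from the thesis): the checkpoint names, the
# hypothetical "FACT" tag set, and the toy training example.
import torch
from transformers import AutoTokenizer, AutoModelForTokenClassification

labels = ["O", "B-FACT", "I-FACT"]  # hypothetical token-level tag set

checkpoint = "bert-base-cased"  # generic BERT; a legal-domain checkpoint
                                # could be substituted for comparison
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForTokenClassification.from_pretrained(
    checkpoint, num_labels=len(labels)
)

# One annotated example: words with word-level gold labels.
words = ["The", "contract", "was", "signed", "in", "2019", "."]
word_labels = [0, 1, 2, 2, 2, 2, 0]  # indices into `labels`

enc = tokenizer(words, is_split_into_words=True, return_tensors="pt")

# Align word-level labels to sub-word tokens; -100 masks special
# tokens out of the loss.
aligned = [
    -100 if wid is None else word_labels[wid]
    for wid in enc.word_ids(batch_index=0)
]
enc["labels"] = torch.tensor([aligned])

# A single fine-tuning step; in practice this loops over the
# annotated cases for several epochs.
optimizer = torch.optim.AdamW(model.parameters(), lr=3e-5)
loss = model(**enc).loss
loss.backward()
optimizer.step()
print(f"loss: {loss.item():.4f}")
```

The label alignment step matters because BERT's WordPiece tokenizer splits words into sub-word units, so each word-level annotation must be propagated (or masked) across the sub-tokens it produces.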