Labelling factual information in legal cases using fine-tuned BERT models

Labelling factual information at the token level in legal cases requires legal expertise and is time-consuming. This thesis applies transfer learning, fine-tuning pre-trained state-of-the-art BERT models to perform this labelling task. Experiments compare whether...

Full description

Bibliographic Details
Main Author: Wenestam, Arvid
Format: Others
Language:English
Published: Uppsala universitet, Statistiska institutionen 2021
Subjects:
Online Access:http://urn.kb.se/resolve?urn=urn:nbn:se:uu:diva-447230
id ndltd-UPSALLA1-oai-DiVA.org-uu-447230
record_format oai_dc
spelling ndltd-UPSALLA1-oai-DiVA.org-uu-447230 2021-06-25T05:37:09Z
type Student thesis, info:eu-repo/semantics/bachelorThesis, text
format application/pdf
rights info:eu-repo/semantics/openAccess
collection NDLTD
language English
format Others
sources NDLTD
topic Machine learning
Natural language processing
Transfer learning
Neural Networks
Transformers
Legal AI
Probability Theory and Statistics
Sannolikhetsteori och statistik
Law and Society
Juridik och samhälle
spellingShingle Machine learning
Natural language processing
Transfer learning
Neural Networks
Transformers
Legal AI
Probability Theory and Statistics
Sannolikhetsteori och statistik
Law and Society
Juridik och samhälle
Wenestam, Arvid
Labelling factual information in legal cases using fine-tuned BERT models
description Labelling factual information at the token level in legal cases requires legal expertise and is time-consuming. This thesis applies transfer learning, fine-tuning pre-trained state-of-the-art BERT models to perform this labelling task. Experiments compare whether a model pre-trained solely on legal corpora outperforms a BERT trained on a generic corpus, and examine the model's behaviour as the number of cases in the training sample varies. The work shows that the models' metric scores are stable and on par when using 40-60 professionally annotated cases instead of the full sample of 100 cases. Furthermore, the generic-corpus BERT is a strong baseline, and pre-training solely on a legal corpus is not crucial for this task.
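The following is a minimal sketch of the kind of token-level labelling setup the abstract describes: fine-tuning a pre-trained BERT for token classification. It assumes the HuggingFace transformers and PyTorch libraries; the tag set, the toy sentence, and the generic bert-base-cased checkpoint are illustrative assumptions, not the thesis's actual data, label scheme, or configuration.

```python
# Sketch of token-level labelling with a fine-tuned BERT (illustrative assumptions only).
import torch
from transformers import AutoTokenizer, AutoModelForTokenClassification

# Hypothetical tag set: factual vs. non-factual tokens (not the thesis's real labels).
labels = ["O", "FACT"]
label2id = {l: i for i, l in enumerate(labels)}
id2label = {i: l for l, i in label2id.items()}

# Generic-corpus baseline; a legal-domain checkpoint could be loaded the same way
# to reproduce the legal-versus-generic comparison.
checkpoint = "bert-base-cased"
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForTokenClassification.from_pretrained(
    checkpoint, num_labels=len(labels), id2label=id2label, label2id=label2id
)

# One toy annotated sentence: word-level tags aligned to sub-word tokens.
# Special tokens and continuation pieces get -100, which the loss ignores.
words = ["The", "defendant", "signed", "the", "contract", "in", "March", "."]
word_tags = ["O", "FACT", "FACT", "O", "FACT", "O", "FACT", "O"]

encoding = tokenizer(words, is_split_into_words=True, return_tensors="pt")
aligned, previous_word = [], None
for word_id in encoding.word_ids(batch_index=0):
    if word_id is None or word_id == previous_word:
        aligned.append(-100)  # special token or sub-word continuation
    else:
        aligned.append(label2id[word_tags[word_id]])
    previous_word = word_id
encoding["labels"] = torch.tensor([aligned])

# Single fine-tuning step; in practice this runs over the full set of annotated cases.
optimizer = torch.optim.AdamW(model.parameters(), lr=5e-5)
model.train()
loss = model(**encoding).loss
loss.backward()
optimizer.step()
print(f"toy training loss: {loss.item():.4f}")
```

Varying how many annotated cases feed this loop, and swapping the checkpoint for a legal-domain one, corresponds to the two comparisons summarised in the abstract.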
author Wenestam, Arvid
author_facet Wenestam, Arvid
author_sort Wenestam, Arvid
title Labelling factual information in legal cases using fine-tuned BERT models
title_short Labelling factual information in legal cases using fine-tuned BERT models
title_full Labelling factual information in legal cases using fine-tuned BERT models
title_fullStr Labelling factual information in legal cases using fine-tuned BERT models
title_full_unstemmed Labelling factual information in legal cases using fine-tuned BERT models
title_sort labelling factual information in legal cases using fine-tuned bert models
publisher Uppsala universitet, Statistiska institutionen
publishDate 2021
url http://urn.kb.se/resolve?urn=urn:nbn:se:uu:diva-447230
work_keys_str_mv AT wenestamarvid labellingfactualinformationinlegalcasesusingfinetunedbertmodels
_version_ 1719412801072529408