Labelling factual information in legal cases using fine-tuned BERT models
Labelling factual information at the token level in legal cases requires legal expertise and is time-consuming. This thesis applies transfer learning, fine-tuning pre-trained state-of-the-art BERT models to perform this labelling task. It investigates whether...
Main Author: | Wenestam, Arvid |
---|---|
Format: | Others |
Language: | English |
Published: | Uppsala universitet, Statistiska institutionen, 2021 |
Subjects: | Machine learning; Natural language processing; Transfer learning; Neural Networks; Transformers; Legal AI; Probability Theory and Statistics; Law and Society |
Online Access: | http://urn.kb.se/resolve?urn=urn:nbn:se:uu:diva-447230 |
id | ndltd-UPSALLA1-oai-DiVA.org-uu-447230 |
record_format | oai_dc |
collection | NDLTD |
language | English |
format | Others |
sources | NDLTD |
topic | Machine learning; Natural language processing; Transfer learning; Neural Networks; Transformers; Legal AI; Probability Theory and Statistics; Law and Society |
description | Labelling factual information at the token level in legal cases requires legal expertise and is time-consuming. This thesis applies transfer learning, fine-tuning pre-trained state-of-the-art BERT models to perform this labelling task. It investigates whether a model pre-trained solely on a legal corpus outperforms a BERT model trained on a generic corpus, and how the models behave as the number of cases in the training sample varies. The models' metric scores are stable and on par when using 40-60 professionally annotated cases compared with the full sample of 100 cases. Moreover, the generically trained BERT model is a strong baseline; pre-training solely on a legal corpus is not crucial for this task. |
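The task described in the abstract is typically framed as token classification: BERT's tokenizer splits words into subword pieces, so the word-level annotations must be expanded onto those pieces before fine-tuning. A minimal sketch of that alignment step is below (pure Python; the `B-FACT`/`I-FACT`/`O` tag set and the toy tokenizer are illustrative assumptions, not taken from the thesis):

```python
# Align word-level labels to subword tokens, as required when
# fine-tuning a BERT-style model for token classification.
# The tag set (B-FACT / I-FACT / O) is hypothetical, not from the thesis.

def align_labels(words, word_labels, tokenize):
    """Expand word-level labels onto subword tokens.

    `tokenize` maps one word to its subword pieces. The first piece
    keeps the word's label; continuation pieces of a B- span get I-.
    """
    tokens, labels = [], []
    for word, label in zip(words, word_labels):
        pieces = tokenize(word)
        tokens.extend(pieces)
        labels.append(label)
        # Continuation pieces: B-X becomes I-X, other labels repeat.
        cont = "I-" + label[2:] if label.startswith("B-") else label
        labels.extend([cont] * (len(pieces) - 1))
    return tokens, labels

def toy_tokenize(word):
    # Toy WordPiece-style splitter for demonstration only:
    # chunks of 4 characters, continuations prefixed with "##".
    if len(word) <= 4:
        return [word]
    return [word[:4]] + ["##" + word[i:i + 4] for i in range(4, len(word), 4)]

tokens, labels = align_labels(
    ["The", "defendant", "paid"],
    ["O", "B-FACT", "I-FACT"],
    toy_tokenize,
)
# tokens → ["The", "defe", "##ndan", "##t", "paid"]
# labels → ["O", "B-FACT", "I-FACT", "I-FACT", "I-FACT"]
```

In practice the same alignment is usually done with a real subword tokenizer; the point here is only that each subword piece must receive a label consistent with its word-level span.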
author | Wenestam, Arvid |
title | Labelling factual information in legal cases using fine-tuned BERT models |
publisher | Uppsala universitet, Statistiska institutionen |
publishDate | 2021 |
url | http://urn.kb.se/resolve?urn=urn:nbn:se:uu:diva-447230 |