Extracting Transaction Information from Financial Press Releases
Main Author: | Sjöberg, Agaton |
---|---|
Alternative Title: | Extrahering av Transaktionsdata från Finansiella Pressmeddelanden (Swedish; in English, "Extraction of Transaction Data from Financial Press Releases") |
Type: | Student thesis (bachelor thesis) |
Format: | Others |
Language: | English |
Published: | Linköpings universitet, Artificiell intelligens och integrerade datorsystem (Linköping University, Artificial Intelligence and Integrated Computer Systems), 2021 |
Subjects: | Natural Language Processing; Information Extraction; Named Entity Recognition; Relation Extraction; Latent Structure Refinement; Financial Press Release; Insider Transaction; Language Technology (Computational Linguistics) / Språkteknologi (språkvetenskaplig databehandling) |
Identifier: | doi:21/039 |
Collection: | NDLTD |
Online Access: | http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-177688 (open access, PDF) |

Description:
Information Extraction (IE) has a wide range of use cases and often combines Named Entity Recognition (NER) with Relation Extraction (RE). One such use case is extracting transaction information from Norwegian insider-transaction press releases (PRs). In these PRs, a transaction consists of at most four entities: the name of the owner performing the transaction, the number of shares transferred, the transaction date, and the price of the shares bought or sold. The relations between the entities determine which entity belongs to which transaction and whether the shares were bought or sold. This report investigates how a pair of supervised NER and RE models can extract this information. Because the Norwegian PRs were unlabeled, two approaches to annotating the transaction entities and their relations were compared. Annotating only the entity occurrences that take part in a relation proved better than annotating all occurrences. The report also examines how many PRs the IE pipeline needs to reach a satisfactory result: training on about 400 PRs is sufficient for the results to converge at an F1-score of around 0.85. Finally, on the studied corpus, there is little difference in performance between a complex RE model and a simple rule-based approach.
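The following Python sketch makes the task concrete. It represents a transaction as the four optional entity slots named in the abstract plus a buy/sell direction. It then groups entities into transactions with a deliberately naive rule and scores the result with an F1 over whole transactions. Several parts are hypothetical and chosen only for illustration:

- the names `Entity`, `Transaction`, `group_by_nearest_owner`, and `tuple_f1`;
- the nearest-preceding-owner rule;
- the externally supplied buy/sell labels;
- the toy spans.

None of these is the thesis's annotation scheme, models, or rule-based baseline. Exact-match scoring of whole transactions is also only one of several ways the reported F1-score could be defined.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

# Hypothetical labels for the four transaction slots named in the abstract.
LABELS = ("OWNER", "SHARES", "DATE", "PRICE")


@dataclass(frozen=True)
class Entity:
    label: str  # one of LABELS, e.g. as produced by an NER model
    text: str   # surface string as it appears in the press release
    start: int  # character offset, used here only to order the entities


@dataclass(frozen=True)
class Transaction:
    # Every slot is optional: a transaction has *at most* four entities.
    owner: Optional[str] = None
    shares: Optional[str] = None
    date: Optional[str] = None
    price: Optional[str] = None
    direction: Optional[str] = None  # "buy" or "sell"


def group_by_nearest_owner(
    entities: List[Entity], direction_by_owner: Dict[str, str]
) -> List[Transaction]:
    """Hypothetical rule-based relation step: attach each SHARES, DATE, or PRICE
    entity to the closest preceding OWNER mention (not the thesis's rules)."""
    slots: Dict[str, Dict[str, str]] = {}  # owner text -> filled slots
    current: Optional[str] = None
    for ent in sorted(entities, key=lambda e: e.start):
        if ent.label == "OWNER":
            current = ent.text
            slots.setdefault(current, {})
        elif current is not None and ent.label in LABELS:
            # Keep the first mention of each slot; later mentions are ignored.
            slots[current].setdefault(ent.label.lower(), ent.text)
    # The buy/sell direction is assumed to come from elsewhere (hypothetical input).
    return [
        Transaction(owner=owner, direction=direction_by_owner.get(owner), **fields)
        for owner, fields in slots.items()
    ]


def tuple_f1(predicted: List[Transaction], gold: List[Transaction]) -> float:
    """F1 over whole transactions: a prediction is a true positive only if
    all slots and the direction exactly match a gold transaction."""
    pred_set, gold_set = set(predicted), set(gold)
    true_pos = len(pred_set & gold_set)
    if true_pos == 0:
        return 0.0
    precision = true_pos / len(pred_set)
    recall = true_pos / len(gold_set)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Made-up toy spans for illustration only; not data from the thesis.
    ents = [
        Entity("OWNER", "Kari Nordmann", 0),
        Entity("SHARES", "10,000", 30),
        Entity("PRICE", "NOK 45.20", 50),
        Entity("DATE", "3 May 2021", 70),
        Entity("OWNER", "Ola Nordmann", 100),
        Entity("SHARES", "2,500", 130),
    ]
    pred = group_by_nearest_owner(ents, {"Kari Nordmann": "buy", "Ola Nordmann": "sell"})
    gold = [
        Transaction(owner="Kari Nordmann", shares="10,000", date="3 May 2021",
                    price="NOK 45.20", direction="buy"),
        # The single date mention applies to both transactions in this toy PR.
        Transaction(owner="Ola Nordmann", shares="2,500", date="3 May 2021",
                    direction="sell"),
    ]
    print(tuple_f1(pred, gold))  # prints 0.5
```

In the toy example, the rule recovers the first transaction exactly. It misses the date for the second owner, however, because the date is mentioned only once and the rule attaches it to the nearest preceding owner. The script therefore prints 0.5.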