Extracting Transaction Information from Financial Press Releases

The use cases of Information Extraction (IE) are more or less endless, often consisting of a combination of Named Entity Recognition (NER) and Relation Extraction (RE). One use case of IE is the extraction of transaction information from Norwegian insider transaction Press Releases (PRs), where a tr...

Bibliographic Details
Main Author: Sjöberg, Agaton
Format: Others
Language: English
Published: Linköpings universitet, Artificiell intelligens och integrerade datorsystem 2021
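Subjects: Natural Language Processing; Information Extraction; Named Entity Recognition; Relation Extraction; Latent Structure Refinement; Financial Press Release; Insider Transaction; Language Technology (Computational Linguistics); Språkteknologi (språkvetenskaplig databehandling)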
Online Access: http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-177688
id ndltd-UPSALLA1-oai-DiVA.org-liu-177688
record_format oai_dc
spelling ndltd-UPSALLA1-oai-DiVA.org-liu-177688
2021-07-05T05:23:09Z
Extracting Transaction Information from Financial Press Releases
eng
Extrahering av Transaktionsdata från Finansiella Pressmeddelanden
Sjöberg, Agaton
Linköpings universitet, Artificiell intelligens och integrerade datorsystem
2021
Natural Language Processing
Information Extraction
Named Entity Recognition
Relation Extraction
Latent Structure Refinement
Financial Press Release
Insider Transaction
Language Technology (Computational Linguistics)
Språkteknologi (språkvetenskaplig databehandling)
Student thesis
info:eu-repo/semantics/bachelorThesis
text
http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-177688
doi:21/039
application/pdf
info:eu-repo/semantics/openAccess
collection NDLTD
language English
format Others
sources NDLTD
topic Natural Language Processing
Information Extraction
Named Entity Recognition
Relation Extraction
Latent Structure Refinement
Financial Press Release
Insider Transaction
Language Technology (Computational Linguistics)
Språkteknologi (språkvetenskaplig databehandling)
description The use cases of Information Extraction (IE) are more or less endless, often consisting of a combination of Named Entity Recognition (NER) and Relation Extraction (RE). One use case of IE is the extraction of transaction information from Norwegian insider transaction Press Releases (PRs), where a transaction consists of at most four entities: the name of the owner performing the transaction, the number of shares transferred, the transaction date, and the price of the shares bought or sold. The relationships between the entities define which entity belongs to which transaction, and whether shares were bought or sold. This report investigates how a pair of supervised NER and RE models extracts this information. Since these Norwegian PRs were not labeled, two different approaches to annotating the transaction entities and their associated relations were investigated, and it was found that it is better to annotate only entities that occur in a relation than to annotate all occurrences. Furthermore, the number of PRs needed to achieve a satisfactory result in the IE pipeline was investigated. The study shows that training with about 400 PRs is sufficient for the results to converge, at an F1-score of around 0.85. Finally, the report shows that there is not much difference between a complex RE model and a simple rule-based approach when applied to the studied corpus.
author Sjöberg, Agaton
title Extracting Transaction Information from Financial Press Releases
publisher Linköpings universitet, Artificiell intelligens och integrerade datorsystem
publishDate 2021
url http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-177688