Multilingual Zero-Shot and Few-Shot Causality Detection

Relations that hold between causes and their effects are fundamental for a wide range of different sectors. Automatically finding sentences that express such relations may for example be of great interest for the economy or political institutions. However, for many languages other than English, a la...

Full description

Bibliographic Details
Main Author: Reimann, Sebastian Michael
Format: Others
Language:English
Published: Uppsala universitet, Institutionen för lingvistik och filologi 2021
Subjects:
Online Access:http://urn.kb.se/resolve?urn=urn:nbn:se:uu:diva-446516
Description
Summary:Relations that hold between causes and their effects are fundamental for a wide range of different sectors. Automatically finding sentences that express such relations may for example be of great interest for the economy or political institutions. However, for many languages other than English, a lack of training resources for this task needs to be dealt with. In recent years, large, pretrained transformer-based model architectures have proven to be very effective for tasks involving cross-lingual transfer such as cross-lingual language inference, as well as multilingual named entity recognition, POS-tagging and dependency parsing, which may hint at similar potentials for causality detection. In this thesis, we define causality detection as a binary labelling problem and use cross-lingual transfer to alleviate data scarcity for German and Swedish by using three different classifiers that make either use of multilingual sentence embeddings obtained from a pretrained encoder or pretrained multilingual language models. The source languages in most of our experiments will be English, for Swedish we however also use a small German training set and a combination of English and German training data.  We try out zero-shot transfer as well as making use of limited amounts of target language data either as a development set or as additional training data in a few-shot setting. In the latter scenario, we explore the impact of varying sizes of training data. Moreover, the problem of data scarcity in our situation also makes it necessary to work with data from different annotation projects. We also explore how much this would impact our result. For German as a target language, our results in a zero-shot scenario expectedly fall short in comparison with monolingual experiments, but F1-macro scores between 60 and 65 in cases where annotation did not differ drastically still signal that it was possible to transfer at least some knowledge. When introducing only small amounts of target language data, already notable improvements were observed and with the full German training data of about 3,000 sentences combined with the most suitable English data set, the performance for German in some scenarios even almost matches the state of the art for monolingual experiments on English. The best zero-shot performance on the Swedish data was even outperforming the scores achieved for German. However, due to problems with the additional Swedish training data, we were not able to improve upon the zero-shot performance in a few-shot setting in a similar manner as it was the case for German.