Interactive Evidence Detection


Bibliographic Details
Main Author: Stahlhut, Chris
Format: Others
Language: en
Published: 2021
Online Access: https://tuprints.ulb.tu-darmstadt.de/19154/1/InteractiveEvidenceDetection.pdf
Stahlhut, Chris <http://tuprints.ulb.tu-darmstadt.de/view/person/Stahlhut=3AChris=3A=3A.html> (2021): Interactive Evidence Detection. (Publisher's Version) Darmstadt, Technische Universität, DOI: 10.26083/tuprints-00019154 <https://doi.org/10.26083/tuprints-00019154>, [Ph.D. Thesis]
id ndltd-tu-darmstadt.de-oai-tuprints.ulb.tu-darmstadt.de-19154
record_format oai_dc
collection NDLTD
language en
format Others
sources NDLTD
description Without evidence, research would be nearly impossible. Whereas a conjecture in mathematics must be proven logically before it is accepted as a theorem, most research disciplines depend on empirical evidence. In the natural sciences, researchers conduct experiments to create the most objective evidence for evaluating hypotheses. In the humanities and social sciences, most evidence is extracted from textual sources, such as news articles spanning several decades or transcribed interviews. However, not every document or interview contains statements that support or contradict a hypothesis, resulting in a time-intensive search that might be sped up with modern natural language processing techniques. Finding evidence, or evidence detection, is a fast-growing field that is currently gaining relevance because of the increased focus on detecting fake news. Some research focusses not only on evidence detection but also on linking evidence to hypotheses, or evidence linking. Other work aims at speeding up the decision process regarding whether a hypothesis is valid. Yet another line of research in evidence detection aims at finding evidence in medical abstracts. Although these approaches are promising, their applicability to research in the humanities and social sciences has not yet been evaluated. Most evidence detection and evidence linking models are also static in nature: usually, we first create a large dataset in which text snippets are labelled as evidence, and this dataset is then used to train and evaluate models that do not change after their initial training. Furthermore, most work assumes that all users interpret evidence in a similar way, so that a single evidence detection or evidence linking model can serve all users. 
This PhD project aims at evaluating whether modern natural language processing techniques can support researchers in the humanities and social sciences in finding evidence so that they can evaluate their hypotheses. We first investigated how real users search for evidence and link it to self-defined hypotheses. We found that there is no canonical user: some users define hypotheses first and then search for evidence; others search for evidence first and then define hypotheses. We also found that the interpretation of evidence varies between users. Similar hypotheses are supported by different pieces of evidence, and the same evidence can be used to support different hypotheses. This means that any evidence detection model must be specific to a single user. User-specific evidence detection models require a large amount of data, which is labour-intensive to create. Therefore, we investigated how much data is necessary before an interactively trained evidence detection model outperforms a well-generalising, state-of-the-art model. In our evaluation, we found that an evidence detection model that had first been trained on external data and then been fine-tuned interactively requires only a few training documents to yield better results than a state-of-the-art model trained only on the external data. Regarding the practical benefit of this research, we built an annotation (or coding) tool that allows users to label sentences as evidence and link these pieces of evidence to self-defined hypotheses. We evaluated this tool, named EDoHa (Evidence Detection fOr Hypothesis vAlidation), in two user studies: one with a group of students and one with colleagues from the research training group KRITIS. EDoHa and the data used to pre-train the evidence detection and evidence linking models are published under an open-source licence so that researchers outside the research training group can also benefit from them. 
This project contributes not only to evidence detection and natural language processing, but also to research methodologies in qualitative text-based research.
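The scheme described in the abstract (pre-train an evidence detection model on external labelled data, then fine-tune it interactively as a single user labels sentences) can be sketched roughly as follows. This is an illustrative assumption, not EDoHa's actual architecture: a tiny bag-of-words perceptron stands in for the real model, and all class names and example sentences here are invented.

```python
# Hypothetical sketch of pre-training followed by interactive fine-tuning.
# A bag-of-words perceptron stands in for a real evidence detection model.
from collections import defaultdict

class InteractiveEvidenceDetector:
    def __init__(self):
        self.weights = defaultdict(float)  # word -> weight
        self.bias = 0.0

    def _features(self, sentence):
        return sentence.lower().split()

    def predict(self, sentence):
        """True means the sentence is classified as evidence."""
        score = self.bias + sum(self.weights[w] for w in self._features(sentence))
        return score > 0

    def update(self, sentence, is_evidence):
        """One perceptron step; called once per user-labelled sentence."""
        if self.predict(sentence) != is_evidence:
            delta = 1.0 if is_evidence else -1.0
            for w in self._features(sentence):
                self.weights[w] += delta
            self.bias += delta

    def pretrain(self, labelled_sentences, epochs=5):
        """Initial training on an external, pre-labelled dataset."""
        for _ in range(epochs):
            for sentence, label in labelled_sentences:
                self.update(sentence, label)

# Invented stand-in for the external pre-training corpus.
external = [
    ("the measurements show a clear increase", True),
    ("we now turn to related work", False),
    ("interview participants reported higher costs", True),
    ("this chapter is structured as follows", False),
]
detector = InteractiveEvidenceDetector()
detector.pretrain(external)

# Interactive phase: the model is updated as the user labels their own documents.
detector.update("the archive records confirm the decline", True)
print(detector.predict("the measurements show a clear increase"))  # prints True
```

The point of the sketch is the split between `pretrain` (a fixed external dataset) and `update` (incremental, per-sentence corrections), which mirrors the abstract's finding that a pre-trained model fine-tuned interactively needs only a few user-labelled documents to become user-specific.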
author Stahlhut, Chris
spellingShingle Stahlhut, Chris
Interactive Evidence Detection
author_facet Stahlhut, Chris
author_sort Stahlhut, Chris
title Interactive Evidence Detection
title_short Interactive Evidence Detection
title_full Interactive Evidence Detection
title_fullStr Interactive Evidence Detection
title_full_unstemmed Interactive Evidence Detection
title_sort interactive evidence detection
publishDate 2021
url https://tuprints.ulb.tu-darmstadt.de/19154/1/InteractiveEvidenceDetection.pdf
work_keys_str_mv AT stahlhutchris interactiveevidencedetection
_version_ 1719417138170560512
spelling ndltd-tu-darmstadt.de-oai-tuprints.ulb.tu-darmstadt.de-19154 2021-07-17T05:14:24Z http://tuprints.ulb.tu-darmstadt.de/19154/ Interactive Evidence Detection Stahlhut, Chris 2021 Ph.D. Thesis NonPeerReviewed text CC BY 4.0 International - Creative Commons, Attribution https://tuprints.ulb.tu-darmstadt.de/19154/1/InteractiveEvidenceDetection.pdf https://doi.org/10.26083/tuprints-00019154 en info:eu-repo/semantics/doctoralThesis info:eu-repo/semantics/openAccess