Interactive Evidence Detection


Bibliographic Details
Main Author: Stahlhut, Chris
Format: Others
Language: en
Published: 2021
Online Access: https://tuprints.ulb.tu-darmstadt.de/19154/1/InteractiveEvidenceDetection.pdf
Stahlhut, Chris <http://tuprints.ulb.tu-darmstadt.de/view/person/Stahlhut=3AChris=3A=3A.html> (2021): Interactive Evidence Detection. (Publisher's Version) Darmstadt, Technische Universität, DOI: 10.26083/tuprints-00019154 <https://doi.org/10.26083/tuprints-00019154>, [Ph.D. Thesis]
id ndltd-tu-darmstadt.de-oai-tuprints.ulb.tu-darmstadt.de-19154
record_format oai_dc
collection NDLTD
language en
format Others
sources NDLTD
description Without evidence, research would be nearly impossible. Whereas a conjecture in mathematics must be proven logically before it is accepted as a theorem, most research disciplines depend on empirical evidence. In the natural sciences, researchers conduct experiments to create the most objective evidence for evaluating hypotheses. In the humanities and social sciences, most evidence is extracted from textual sources, such as news articles spanning several decades or transcribed interviews. However, not every document or interview contains statements that support or contradict a hypothesis, resulting in a time-intensive search that might be sped up with modern natural language processing techniques. Finding evidence, or evidence detection, is a fast-growing field that is currently gaining relevance because of the increased focus on detecting fake news. Some research focusses not only on evidence detection but also on linking evidence to hypotheses, or evidence linking. Other work aims at speeding up the decision process regarding whether a hypothesis is valid. Yet another line of research in evidence detection aims at finding evidence in medical abstracts. Although these approaches are promising, their applicability to research in the humanities and social sciences has not yet been evaluated. Most evidence detection and evidence linking models are also static in nature: usually, we first create a large dataset in which text snippets are labelled as evidence, and this dataset is then used to train and evaluate models that do not change after their initial training. Furthermore, most work assumes that all users interpret evidence in a similar way, so that a single evidence detection or evidence linking model can serve all users. 
This PhD project aims at evaluating whether modern natural language processing techniques can support researchers in the humanities and social sciences in finding evidence so that they can evaluate their hypotheses. We first investigated how real users search for evidence and link it to self-defined hypotheses. We found that there is no canonical user: some users define hypotheses first and then search for evidence; others search for evidence first and then define hypotheses. We also found that the interpretation of evidence varies between users. Similar hypotheses are supported by different pieces of evidence, and the same evidence can be used to support different hypotheses. This means that any evidence detection model must be specific to a single user. User-specific evidence detection models require a large amount of data, which is labour-intensive to create. Therefore, we investigated how much data is necessary before an interactively trained evidence detection model outperforms a well-generalising, state-of-the-art model. In our evaluation, we found that an evidence detection model that had first been trained on external data and then been fine-tuned interactively requires only a few training documents to yield better results than a state-of-the-art model trained only on the external data. Regarding the practical benefit of this research, we built an annotation (or coding) tool that allows users to label sentences as evidence and link these pieces of evidence to self-defined hypotheses. We evaluated this tool, named EDoHa (Evidence Detection fOr Hypothesis vAlidation), in two user studies: one with a group of students and one with colleagues from the research training group KRITIS. EDoHa and the data used to pre-train the evidence detection and evidence linking models are published under an open-source licence so that researchers outside the research training group can also benefit from them. 
This project contributes not only to evidence detection and natural language processing, but also to research methodologies in qualitative text-based research.
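The scheme described in the abstract (pre-train an evidence detection model on external labelled data, then fine-tune it interactively as a single user labels sentences) can be sketched roughly as follows. This is an illustrative assumption, not EDoHa's actual architecture: a tiny bag-of-words perceptron stands in for the real model, and all class names and example sentences here are invented.

```python
# Hypothetical sketch of pre-training followed by interactive fine-tuning.
# A bag-of-words perceptron stands in for a real evidence detection model.
from collections import defaultdict

class InteractiveEvidenceDetector:
    def __init__(self):
        self.weights = defaultdict(float)  # word -> weight
        self.bias = 0.0

    def _features(self, sentence):
        return sentence.lower().split()

    def predict(self, sentence):
        """True means the sentence is classified as evidence."""
        score = self.bias + sum(self.weights[w] for w in self._features(sentence))
        return score > 0

    def update(self, sentence, is_evidence):
        """One perceptron step; called once per user-labelled sentence."""
        if self.predict(sentence) != is_evidence:
            delta = 1.0 if is_evidence else -1.0
            for w in self._features(sentence):
                self.weights[w] += delta
            self.bias += delta

    def pretrain(self, labelled_sentences, epochs=5):
        """Initial training on an external, pre-labelled dataset."""
        for _ in range(epochs):
            for sentence, label in labelled_sentences:
                self.update(sentence, label)

# Invented stand-in for the external pre-training corpus.
external = [
    ("the measurements show a clear increase", True),
    ("we now turn to related work", False),
    ("interview participants reported higher costs", True),
    ("this chapter is structured as follows", False),
]
detector = InteractiveEvidenceDetector()
detector.pretrain(external)

# Interactive phase: the model is updated as the user labels their own documents.
detector.update("the archive records confirm the decline", True)
print(detector.predict("the measurements show a clear increase"))  # prints True
```

The point of the sketch is the split between `pretrain` (a fixed external dataset) and `update` (incremental, per-sentence corrections), which mirrors the abstract's finding that a pre-trained model fine-tuned interactively needs only a few user-labelled documents to become user-specific.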
author Stahlhut, Chris
spellingShingle Stahlhut, Chris
Interactive Evidence Detection
author_facet Stahlhut, Chris
author_sort Stahlhut, Chris
title Interactive Evidence Detection
title_short Interactive Evidence Detection
title_full Interactive Evidence Detection
title_fullStr Interactive Evidence Detection
title_full_unstemmed Interactive Evidence Detection
title_sort interactive evidence detection
publishDate 2021
url https://tuprints.ulb.tu-darmstadt.de/19154/1/InteractiveEvidenceDetection.pdf
work_keys_str_mv AT stahlhutchris interactiveevidencedetection
_version_ 1719417138170560512
spelling ndltd-tu-darmstadt.de-oai-tuprints.ulb.tu-darmstadt.de-19154 2021-07-17T05:14:24Z http://tuprints.ulb.tu-darmstadt.de/19154/ Interactive Evidence Detection Stahlhut, Chris 2021 Ph.D. Thesis NonPeerReviewed text CC BY 4.0 International - Creative Commons, Attribution https://tuprints.ulb.tu-darmstadt.de/19154/1/InteractiveEvidenceDetection.pdf https://doi.org/10.26083/tuprints-00019154 en info:eu-repo/semantics/doctoralThesis info:eu-repo/semantics/openAccess