Information flow identification in large email datasets

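Identifying information flow in emails is an important, yet challenging task. In this work we investigate several algorithms for identifying similar sentences in large email datasets, as well as an algorithm for reconstructing threads from unstructured emails. We present a detailed evaluation of each algorithm in terms of accuracy and time performance. We also investigate the usage of cloud computing in order to increase computational efficiency and make information discovery usable in real time.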


Bibliographic Details
Main Author: Akuney, Arseniy
Language: English
Published: University of British Columbia 2011
Online Access: http://hdl.handle.net/2429/39847
id ndltd-UBC-oai-circle.library.ubc.ca-2429-39847
record_format oai_dc
spelling ndltd-UBC-oai-circle.library.ubc.ca-2429-39847 === 2018-01-05T17:25:33Z === Information flow identification in large email datasets === Akuney, Arseniy === Identifying information flow in emails is an important, yet challenging task. In this work we investigate several algorithms for identifying similar sentences in large email datasets, as well as an algorithm for reconstructing threads from unstructured emails. We present a detailed evaluation of each algorithm in terms of accuracy and time performance. We also investigate the usage of cloud computing in order to increase computational efficiency and make information discovery usable in real time. === Science, Faculty of === Computer Science, Department of === Graduate === 2011-12-23T18:13:45Z === 2011-12-23T18:13:45Z === 2011 === 2012-05 === Text === Thesis/Dissertation === http://hdl.handle.net/2429/39847 === eng === Attribution-NonCommercial-NoDerivatives 4.0 International === http://creativecommons.org/licenses/by-nc-nd/4.0/ === University of British Columbia
collection NDLTD
language English
sources NDLTD
description Identifying information flow in emails is an important, yet challenging task. In this work we investigate several algorithms for identifying similar sentences in large email datasets, as well as an algorithm for reconstructing threads from unstructured emails. We present a detailed evaluation of each algorithm in terms of accuracy and time performance. We also investigate the usage of cloud computing in order to increase computational efficiency and make information discovery usable in real time. === Science, Faculty of === Computer Science, Department of === Graduate
author Akuney, Arseniy
title Information flow identification in large email datasets
publisher University of British Columbia
publishDate 2011
url http://hdl.handle.net/2429/39847