Authorship attribution in the e-mail domain: a study of the effect of size of author corpus and topic on accuracy of identification

Approved for public release; distribution is unlimited. We determined that it is possible to achieve authorship attribution in the e-mail domain when training on "personal" e-mails and testing on "work" e-mails and vice versa. These results are unique since they simulate two d...

Bibliographic Details
Main Author: Levy-Minzie, Kori.
Other Authors: Martell, Craig
Published: Monterey, California. Naval Postgraduate School 2012
Online Access: http://hdl.handle.net/10945/5780
id ndltd-nps.edu-oai-calhoun.nps.edu-10945-5780
record_format oai_dc
spelling ndltd-nps.edu-oai-calhoun.nps.edu-10945-5780 2015-08-06T16:02:44Z Authorship attribution in the e-mail domain: a study of the effect of size of author corpus and topic on accuracy of identification Levy-Minzie, Kori. Martell, Craig Young, Joel Naval Postgraduate School (U.S.). Computer Science Approved for public release; distribution is unlimited. We determined that it is possible to achieve authorship attribution in the e-mail domain when training on "personal" e-mails and testing on "work" e-mails and vice versa. These results are unique since they simulate two different e-mail addresses belonging to the same person where the topics of the e-mails from the two different addresses do not intersect. As we only used one classification technique, these results are preliminary and may serve as a baseline for future work in this area. The corpus of data was the entirety of the Enron corpus as well as a subsection of hand-annotated work and personal e-mails. We discovered that there is enough author signal in each class to identify an author in a sea of noise. We included suggestions for future work in the areas of expanding feature selection, increasing corpus size, and including more classification methods. Advancement in this area will contribute to increasing cyber security by identifying the senders of anonymous derogatory e-mails and reducing cyber bullying. 2012-03-14T17:46:42Z 2012-03-14T17:46:42Z 2011-03 Thesis http://hdl.handle.net/10945/5780 720351608 This publication is a work of the U.S. Government as defined in Title 17, United States Code, Section 101. As such, it is in the public domain, and under the provisions of Title 17, United States Code, Section 105, it may not be copyrighted. Monterey, California. Naval Postgraduate School
collection NDLTD
sources NDLTD
description Approved for public release; distribution is unlimited. We determined that it is possible to achieve authorship attribution in the e-mail domain when training on "personal" e-mails and testing on "work" e-mails and vice versa. These results are unique since they simulate two different e-mail addresses belonging to the same person where the topics of the e-mails from the two different addresses do not intersect. As we only used one classification technique, these results are preliminary and may serve as a baseline for future work in this area. The corpus of data was the entirety of the Enron corpus as well as a subsection of hand-annotated work and personal e-mails. We discovered that there is enough author signal in each class to identify an author in a sea of noise. We included suggestions for future work in the areas of expanding feature selection, increasing corpus size, and including more classification methods. Advancement in this area will contribute to increasing cyber security by identifying the senders of anonymous derogatory e-mails and reducing cyber bullying.
author2 Martell, Craig
author_facet Martell, Craig
Levy-Minzie, Kori.
author Levy-Minzie, Kori.
author_sort Levy-Minzie, Kori.
title Authorship attribution in the e-mail domain: a study of the effect of size of author corpus and topic on accuracy of identification
title_sort authorship attribution in the e-mail domain a study of the effect of size of author corpus and topic on accuracy of identification
publisher Monterey, California. Naval Postgraduate School
publishDate 2012
url http://hdl.handle.net/10945/5780