Authorship attribution in the e-mail domain: a study of the effect of size of author corpus and topic on accuracy of identification
Approved for public release; distribution is unlimited. We determined that it is possible to achieve authorship attribution in the e-mail domain when training on "personal" e-mails and testing on "work" e-mails, and vice versa. These results are unique because they simulate two d...
| Main Author: | Levy-Minzie, Kori. |
|---|---|
| Other Authors: | Martell, Craig |
| Published: | Monterey, California. Naval Postgraduate School, 2012 |
| Online Access: | http://hdl.handle.net/10945/5780 |
id |
ndltd-nps.edu-oai-calhoun.nps.edu-10945-5780 |
---|---|
record_format |
oai_dc |
spelling |
ndltd-nps.edu-oai-calhoun.nps.edu-10945-5780 2015-08-06T16:02:44Z Authorship attribution in the e-mail domain: a study of the effect of size of author corpus and topic on accuracy of identification Levy-Minzie, Kori; Martell, Craig; Young, Joel; Naval Postgraduate School (U.S.). Computer Science Approved for public release; distribution is unlimited. We determined that it is possible to achieve authorship attribution in the e-mail domain when training on "personal" e-mails and testing on "work" e-mails, and vice versa. These results are unique because they simulate two different e-mail addresses belonging to the same person, where the topics of the e-mails from the two addresses do not intersect. As we used only one classification technique, these results are preliminary and may serve as a baseline for future work in this area. The data consisted of the entire Enron corpus as well as a subset of hand-annotated work and personal e-mails. We discovered that there is enough author signal in each class to identify an author in a sea of noise. We included suggestions for future work in the areas of expanding feature selection, increasing corpus size, and including more classification methods. Advancement in this area will contribute to increasing cyber security by identifying the senders of anonymous derogatory e-mails and reducing cyber bullying. 2012-03-14T17:46:42Z 2012-03-14T17:46:42Z 2011-03 Thesis http://hdl.handle.net/10945/5780 720351608 This publication is a work of the U.S. Government as defined in Title 17, United States Code, Section 101. As such, it is in the public domain, and under the provisions of Title 17, United States Code, Section 105, it may not be copyrighted. Monterey, California. Naval Postgraduate School |
collection |
NDLTD |
sources |
NDLTD |
description |
Approved for public release; distribution is unlimited. We determined that it is possible to achieve authorship attribution in the e-mail domain when training on "personal" e-mails and testing on "work" e-mails, and vice versa. These results are unique because they simulate two different e-mail addresses belonging to the same person, where the topics of the e-mails from the two addresses do not intersect. As we used only one classification technique, these results are preliminary and may serve as a baseline for future work in this area. The data consisted of the entire Enron corpus as well as a subset of hand-annotated work and personal e-mails. We discovered that there is enough author signal in each class to identify an author in a sea of noise. We included suggestions for future work in the areas of expanding feature selection, increasing corpus size, and including more classification methods. Advancement in this area will contribute to increasing cyber security by identifying the senders of anonymous derogatory e-mails and reducing cyber bullying. |
author2 |
Martell, Craig |
author |
Levy-Minzie, Kori. |