Email Mining Classifier : The empirical study on combining the topic modelling with Random Forest classification
Filtering emails and replying to them automatically is of interest to many, but it is hard because of the complexity of language and because of dependencies on background information that is not present in the email itself. This paper investigates whether Latent Dirichlet Allocation (LDA) combined with Random F...
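Main Author: | Halmann, Marju |
---|---|
Format: | Others |
Language: | English |
Published: | Högskolan i Skövde, Institutionen för informationsteknologi, 2017 |
Subjects: | Email mining; Latent Dirichlet Allocation; Random Forest classification; Computer Sciences; Datavetenskap (datalogi) |
Online Access: | http://urn.kb.se/resolve?urn=urn:nbn:se:his:diva-14710 |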
id |
ndltd-UPSALLA1-oai-DiVA.org-his-14710 |
---|---|
record_format |
oai_dc |
spelling |
ndltd-UPSALLA1-oai-DiVA.org-his-14710; 2018-02-17T05:18:04Z; Email Mining Classifier : The empirical study on combining the topic modelling with Random Forest classification; eng; Halmann, Marju; Högskolan i Skövde, Institutionen för informationsteknologi; 2017; Email mining; Latent Dirichlet Allocation; Random Forest classification; Computer Sciences; Datavetenskap (datalogi); Student thesis; info:eu-repo/semantics/bachelorThesis; text; http://urn.kb.se/resolve?urn=urn:nbn:se:his:diva-14710; application/pdf; info:eu-repo/semantics/openAccess |
collection |
NDLTD |
language |
English |
format |
Others |
sources |
NDLTD |
topic |
Email mining; Latent Dirichlet Allocation; Random Forest classification; Computer Sciences; Datavetenskap (datalogi) |
description |
Filtering emails and replying to them automatically is of interest to many, but it is hard because of the complexity of language and because of dependencies on background information that is not present in the email itself. This paper investigates whether Latent Dirichlet Allocation (LDA) combined with a Random Forest classifier can be used for the more general email classification task and how it compares to other existing email classifiers. The comparison is based on a literature study and on empirical experimentation using two real-life datasets. First, a literature study is performed to gain insight into the accuracy of other available email classifiers. Second, the proposed model's accuracy is explored through experimentation. The literature study shows that the accuracy of more general email classifiers differs greatly across user sets. The proposed model's accuracy is within the reported accuracy range, but in its lower part, which indicates that the proposed model performs poorly compared to other classifiers. On average, the classifier's performance improves by 15 percentage points with additional information. This indicates that Latent Dirichlet Allocation (LDA) combined with a Random Forest classifier is promising; however, future studies are needed to explore the model and ways to further increase its accuracy. |
author |
Halmann, Marju |
title |
Email Mining Classifier : The empirical study on combining the topic modelling with Random Forest classification |
publisher |
Högskolan i Skövde, Institutionen för informationsteknologi |
publishDate |
2017 |
url |
http://urn.kb.se/resolve?urn=urn:nbn:se:his:diva-14710 |
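
The combination named in the description, LDA topic features fed into a Random Forest classifier, can be sketched roughly as follows. This is a minimal illustration using scikit-learn, not the thesis's actual implementation: the toy emails, the labels, the number of topics, and the helper name `build_classifier` are all assumptions made for the example, and the thesis's own datasets and settings are not reproduced here.

```python
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.pipeline import make_pipeline

# Hypothetical toy corpus; stands in for a real labelled email dataset.
emails = [
    "Meeting agenda for the project review on Monday",
    "Please send the project budget before the review meeting",
    "Your invoice is attached, payment is due next week",
    "Reminder: unpaid invoice, please arrange payment",
    "Project meeting moved to Tuesday, agenda unchanged",
    "Payment received, thank you for settling the invoice",
]
labels = ["project", "project", "billing", "billing", "project", "billing"]


def build_classifier(n_topics=2, seed=0):
    """Bag-of-words counts -> LDA topic proportions -> Random Forest."""
    return make_pipeline(
        CountVectorizer(stop_words="english"),
        # LDA turns each email into a vector of topic proportions.
        LatentDirichletAllocation(n_components=n_topics, random_state=seed),
        # The Random Forest is trained on those topic proportions.
        RandomForestClassifier(n_estimators=100, random_state=seed),
    )


model = build_classifier()
model.fit(emails, labels)

# Topic features as seen by the forest: one row per email, one column per topic.
topic_features = model[:-1].transform(emails)
predictions = model.predict(["Can we move the project meeting to Friday?"])
```

In this sketch the topic model acts purely as a dimensionality-reducing feature extractor, so the classifier sees a handful of topic proportions per email instead of the full vocabulary. The number of topics is a free parameter that would need tuning on real data.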