Machine Learning for the Automated Identification of Cyberbullying and Cyberharassment

Cyberbullying and cyberharassment are a growing issue that is straining the resources of human moderation teams. This is leading to an increase in suicide among the affected teens who are unable to get away from the harassment. By utilizing n-grams and support vector machines, this resear...

Bibliographic Details
Main Author:	Ducharme, Daniel N.
Language:	EN
Published:	University of Rhode Island 2017
Subjects:	Computer science
Online Access:	http://pqdtopen.proquest.com/#viewpdf?dispub=10259474

id	ndltd-PROQUEST-oai-pqdtoai.proquest.com-10259474
record_format	oai_dc
spelling	ndltd-PROQUEST-oai-pqdtoai.proquest.com-10259474 2017-04-27T16:10:33Z Machine Learning for the Automated Identification of Cyberbullying and Cyberharassment Ducharme, Daniel N. Computer science Cyberbullying and cyberharassment are a growing issue that is straining the resources of human moderation teams. This is leading to an increase in suicide among the affected teens who are unable to get away from the harassment. By utilizing n-grams and support vector machines, this research was able to classify YouTube comments with an overall accuracy of 81.8%. This increased to 83.9% when utilizing retraining that added the misclassified comments to the training set. To accomplish this, a 350-comment balanced training set, with 7% of the highest entropy length-3 n-grams, and a polynomial kernel with the C error factor of 1, a degree of 2, and a Coef0 of 1 were used in the LibSVM implementation of the support vector machine algorithm. The 350 comments were also trimmed with a k-nearest neighbor algorithm where k was set to 4% of the training set size. With the algorithm designed to be heavily multi-threaded and capable of being run across multiple servers, the system was able to achieve that accuracy while classifying 3 comments per second, running on consumer-grade hardware over Wi-Fi. University of Rhode Island 2017-04-21 00:00:00.0 thesis http://pqdtopen.proquest.com/#viewpdf?dispub=10259474 EN
collection	NDLTD
language	EN
sources	NDLTD
topic	Computer science
spellingShingle	Computer science Ducharme, Daniel N. Machine Learning for the Automated Identification of Cyberbullying and Cyberharassment
description	Cyberbullying and cyberharassment are a growing issue that is straining the resources of human moderation teams. This is leading to an increase in suicide among the affected teens who are unable to get away from the harassment. By utilizing n-grams and support vector machines, this research was able to classify YouTube comments with an overall accuracy of 81.8%. This increased to 83.9% when utilizing retraining that added the misclassified comments to the training set. To accomplish this, a 350-comment balanced training set, with 7% of the highest entropy length-3 n-grams, and a polynomial kernel with the C error factor of 1, a degree of 2, and a Coef0 of 1 were used in the LibSVM implementation of the support vector machine algorithm. The 350 comments were also trimmed with a k-nearest neighbor algorithm where k was set to 4% of the training set size. With the algorithm designed to be heavily multi-threaded and capable of being run across multiple servers, the system was able to achieve that accuracy while classifying 3 comments per second, running on consumer-grade hardware over Wi-Fi.
author	Ducharme, Daniel N.
author_facet	Ducharme, Daniel N.
author_sort	Ducharme, Daniel N.
title	Machine Learning for the Automated Identification of Cyberbullying and Cyberharassment
title_short	Machine Learning for the Automated Identification of Cyberbullying and Cyberharassment
title_full	Machine Learning for the Automated Identification of Cyberbullying and Cyberharassment
title_fullStr	Machine Learning for the Automated Identification of Cyberbullying and Cyberharassment
title_full_unstemmed	Machine Learning for the Automated Identification of Cyberbullying and Cyberharassment
title_sort	machine learning for the automated identification of cyberbullying and cyberharassment
publisher	University of Rhode Island
publishDate	2017
url	http://pqdtopen.proquest.com/#viewpdf?dispub=10259474
work_keys_str_mv	AT ducharmedanieln machinelearningfortheautomatedidentificationofcyberbullyingandcyberharassment
_version_	1718444358319996928
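
Illustrative note on the description field: the abstract names a polynomial kernel with degree 2 and Coef0 1 over length-3 n-grams, with k set to 4% of a 350-comment training set. The following is only a minimal sketch of that arithmetic, not the thesis code. It assumes character-level n-grams, raw counts as features, and gamma = 1 (none of which the record states), and the helper names `char_ngrams` and `poly_kernel` are hypothetical. The kernel form `(gamma * <a, b> + coef0) ** degree` is the one LibSVM documents for its polynomial kernel.

```python
from collections import Counter


def char_ngrams(text, n=3):
    """Count overlapping character n-grams (character-level is an assumption)."""
    text = text.lower()
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def poly_kernel(a, b, gamma=1.0, coef0=1.0, degree=2):
    """Polynomial kernel in LibSVM's form: (gamma * <a, b> + coef0) ** degree.

    gamma=1.0 is an assumption; the abstract gives only degree and Coef0.
    """
    # Dot product over sparse count vectors; Counter returns 0 for missing keys.
    dot = sum(count * b[gram] for gram, count in a.items())
    return (gamma * dot + coef0) ** degree


# Two made-up comments, used only to exercise the functions.
x = char_ngrams("you are great")
y = char_ngrams("you are awful")
similarity = poly_kernel(x, y)

# k for the k-nearest-neighbor trimming step: 4% of the 350-comment training set.
training_set_size = 350
k_neighbors = training_set_size * 4 // 100

print(similarity, k_neighbors)
```

The two sample comments share six 3-grams, so the kernel value is (6 + 1)^2 = 49, and 4% of 350 gives k = 14. The entropy-based selection of the top 7% of n-grams and the retraining loop mentioned in the abstract are not sketched here.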

Similar Items