Multi-class and Multi-label Classification of Darkweb Data

abstract: In this research, I address a multi-class, multi-label classification problem, where the goal is to automatically assign one or more labels (tags) to discussion topics seen on the deepweb. I observed a natural hierarchy in our dataset, and I used different techniques to ensure hierarchical integ...
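
The abstract refers to enforcing a hierarchical integrity constraint on the predicted tag list but does not describe how this is done. One common way to satisfy such a constraint is to add every ancestor of each predicted tag, so that no tag appears without its parent. The following is a minimal sketch of that idea only, not the thesis's method; the tag names, the PARENT mapping, and both function names are hypothetical and chosen for illustration.

```python
from typing import Dict, Iterable, List, Optional

# Hypothetical tag hierarchy (child -> parent). These tag names are
# illustrative assumptions and are not taken from the thesis dataset.
PARENT: Dict[str, Optional[str]] = {
    "security": None,
    "malware": "security",
    "ransomware": "malware",
    "marketplace": None,
    "carding": "marketplace",
}


def violates_hierarchy(tags: Iterable[str],
                       parent: Dict[str, Optional[str]]) -> bool:
    """Return True if some tag is present while its parent is missing."""
    tag_set = set(tags)
    return any(
        parent.get(tag) is not None and parent[tag] not in tag_set
        for tag in tag_set
    )


def close_under_ancestors(predicted: Iterable[str],
                          parent: Dict[str, Optional[str]]) -> List[str]:
    """Return the predicted tags plus every ancestor of each tag, sorted."""
    closed = set()
    for tag in predicted:
        node: Optional[str] = tag
        # Walk up the hierarchy until a root or an already-added tag is hit.
        while node is not None and node not in closed:
            closed.add(node)
            node = parent.get(node)
    return sorted(closed)


if __name__ == "__main__":
    raw_prediction = ["ransomware"]
    print(violates_hierarchy(raw_prediction, PARENT))
    print(close_under_ancestors(raw_prediction, PARENT))
```

Adding ancestors favors recall; the opposite choice, dropping any tag whose parent was not predicted, favors precision. Which of these (if either) matches the techniques compared in the thesis is not stated in this record.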

Bibliographic Details
Other Authors: Patil, Revanth (Author)
Format: Dissertation
Language: English
Published: 2018
Subjects: Computer science
Online Access: http://hdl.handle.net/2286/R.I.48469
id ndltd-asu.edu-item-48469
record_format oai_dc
spelling ndltd-asu.edu-item-48469 2018-06-22T03:09:11Z Multi-class and Multi-label Classification of Darkweb Data abstract: In this research, I address a multi-class, multi-label classification problem, where the goal is to automatically assign one or more labels (tags) to discussion topics seen on the deepweb. I observed a natural hierarchy in our dataset, and I used different techniques to enforce a hierarchical integrity constraint on the predicted tag list. To address the "class imbalance" and "scarcity of labeled data" problems, I developed a semi-supervised model based on the Elasticsearch (ES) document relevance score. I evaluate our models using the standard K-fold cross-validation method. Enforcing hierarchical integrity constraints improved the F1 score by 11.9% over standard supervised learning, while our ES-based semi-supervised learning model outperformed the other models in precision (78.4%) while maintaining a comparable recall (21%). Dissertation/Thesis Patil, Revanth (Author) Shakarian, Paulo (Advisor) Doupe, Adam (Committee member) Davulcu, Hasan (Committee member) Arizona State University (Publisher) Computer science eng 40 pages Masters Thesis Computer Science 2018 Masters Thesis http://hdl.handle.net/2286/R.I.48469 http://rightsstatements.org/vocab/InC/1.0/ All Rights Reserved 2018
collection NDLTD
language English
format Dissertation
sources NDLTD
topic Computer science
spellingShingle Computer science
Multi-class and Multi-label Classification of Darkweb Data
description abstract: In this research, I address a multi-class, multi-label classification problem, where the goal is to automatically assign one or more labels (tags) to discussion topics seen on the deepweb. I observed a natural hierarchy in our dataset, and I used different techniques to enforce a hierarchical integrity constraint on the predicted tag list. To address the "class imbalance" and "scarcity of labeled data" problems, I developed a semi-supervised model based on the Elasticsearch (ES) document relevance score. I evaluate our models using the standard K-fold cross-validation method. Enforcing hierarchical integrity constraints improved the F1 score by 11.9% over standard supervised learning, while our ES-based semi-supervised learning model outperformed the other models in precision (78.4%) while maintaining a comparable recall (21%). === Dissertation/Thesis === Masters Thesis Computer Science 2018
author2 Patil, Revanth (Author)
author_facet Patil, Revanth (Author)
title Multi-class and Multi-label Classification of Darkweb Data
publishDate 2018
url http://hdl.handle.net/2286/R.I.48469
_version_ 1718701683549143040