Utvärdering av nyckelordsbaserad textkategoriseringsalgoritmer (Evaluation of Keyword-Based Text Categorization Algorithms)

Bibliographic Details
Main Author: Karlsson, Vide
Format: Others
Language: Swedish
Published: KTH, Programvaruteknik och datorsystem, SCS (Software and Computer Systems), 2016
Subjects: Computer Systems
Online Access: http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-222164
id ndltd-UPSALLA1-oai-DiVA.org-kth-222164
record_format oai_dc
collection NDLTD
language Swedish
format Others
sources NDLTD
topic Computer Systems (Swedish: Datorsystem)
description Supervised learning algorithms have been used for automatic text categorization with very good results. However, supervised learning requires a large amount of manually labeled training data, which is a serious limitation for many practical applications. Keyword-based text categorization does not require manually labeled training data and has therefore been presented as an attractive alternative to supervised learning. The aim of this study is to explore whether there are other limitations to using keyword-based text categorization in industrial applications. The study also tests whether a new lexical resource, based on the paradigmatic relations between words, could be used to improve existing keyword-based text categorization algorithms. An industry-motivated use case was created to measure practical applicability. The results showed that none of the five examined algorithms was able to meet the requirements of the industry-motivated use case. However, it was possible to modify one algorithm, proposed by Liebeskind et al. (2015), to meet the requirements. The new lexical resource produced relevant keywords for text categorization, but there was still a large variance in the algorithm's capacity to correctly categorize different text categories. The categorization capacity was also generally too low to meet the requirements of many practical applications. Further studies are needed to explore how the algorithm's categorization capacity could be improved.
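To make concrete why keyword-based categorization needs no manually labeled training data, here is a minimal Python sketch of the general idea: each category is described only by a list of keywords, and a text is assigned to the category whose keywords it mentions most often. This is a generic illustration, not the algorithm by Liebeskind et al. (2015) or any of the algorithms evaluated in the thesis; the category names, keyword lists, and function names are hypothetical.

```python
import re
from collections import Counter
from typing import Dict, Optional, Set

# Hypothetical seed keywords per category. In a keyword-based approach these
# lists replace labeled training data; a lexical resource could supply them.
CATEGORY_KEYWORDS: Dict[str, Set[str]] = {
    "sports": {"match", "goal", "team", "league"},
    "finance": {"stock", "market", "bank", "interest"},
}


def tokenize(text: str) -> list:
    """Lowercase the text and split it into letter-only tokens (a-z plus å, ä, ö)."""
    return re.findall(r"[a-zåäö]+", text.lower())


def categorize(text: str,
               keywords: Dict[str, Set[str]] = CATEGORY_KEYWORDS) -> Optional[str]:
    """Return the category whose keywords occur most often in text.

    Returns None when no keyword from any category occurs in the text.
    Ties are broken by the order of categories in the keywords dict.
    """
    counts = Counter(tokenize(text))
    # Score each category by the total number of keyword occurrences.
    scores = {cat: sum(counts[w] for w in words) for cat, words in keywords.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None


if __name__ == "__main__":
    print(categorize("The team scored a late goal in the league match"))
    print(categorize("The bank raised its interest rate as the stock market fell"))
    print(categorize("A quiet afternoon"))
```

The scoring here is plain keyword counting; the quality of such a categorizer depends heavily on how well the keyword lists cover each category, which is the role the abstract assigns to the new lexical resource.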
author Karlsson, Vide
title Utvärdering av nyckelordsbaserad textkategoriseringsalgoritmer
publisher KTH, Programvaruteknik och datorsystem, SCS
publishDate 2016
url http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-222164