The Application of Closed Frequent Subtrees to Authorship Attribution

In this experimental study we compare the authorship attribution performance of two different types of distinguishing features: overlapping syntax subtrees of height one (or small trees) and closed frequent syntax subtrees. Authors and documents used in the experiments are randomly drawn from a larg...


Bibliographic Details
Main Author: Lindh Morén, Jonas
Format: Others
Language: English
Published: Umeå universitet, Institutionen för datavetenskap 2014
Online Access: http://urn.kb.se/resolve?urn=urn:nbn:se:umu:diva-86458
id ndltd-UPSALLA1-oai-DiVA.org-umu-86458
record_format oai_dc
spelling ndltd-UPSALLA1-oai-DiVA.org-umu-86458 2014-02-28T04:00:35Z The Application of Closed Frequent Subtrees to Authorship Attribution eng Lindh Morén, Jonas Umeå universitet, Institutionen för datavetenskap 2014 In this experimental study we compare the authorship attribution performance of two different types of distinguishing features: overlapping syntax subtrees of height one (or small trees) and closed frequent syntax subtrees. Authors and documents used in the experiments are randomly drawn from a large corpus of blog posts and news articles. Results show that small trees outperform closed frequent trees on this data set, both in terms of classifier performance and computational efficiency. Student thesis info:eu-repo/semantics/bachelorThesis text http://urn.kb.se/resolve?urn=urn:nbn:se:umu:diva-86458 UMNAD ; 981 application/pdf info:eu-repo/semantics/openAccess
collection NDLTD
language English
format Others
sources NDLTD
description In this experimental study we compare the authorship attribution performance of two different types of distinguishing features: overlapping syntax subtrees of height one (or small trees) and closed frequent syntax subtrees. Authors and documents used in the experiments are randomly drawn from a large corpus of blog posts and news articles. Results show that small trees outperform closed frequent trees on this data set, both in terms of classifier performance and computational efficiency.
author Lindh Morén, Jonas
spellingShingle Lindh Morén, Jonas
The Application of Closed Frequent Subtrees to Authorship Attribution
author_facet Lindh Morén, Jonas
author_sort Lindh Morén, Jonas
title The Application of Closed Frequent Subtrees to Authorship Attribution
title_sort application of closed frequent subtrees to authorship attribution
publisher Umeå universitet, Institutionen för datavetenskap
publishDate 2014
url http://urn.kb.se/resolve?urn=urn:nbn:se:umu:diva-86458
work_keys_str_mv AT lindhmorenjonas theapplicationofclosedfrequentsubtreestoauthorshipattribution
AT lindhmorenjonas applicationofclosedfrequentsubtreestoauthorshipattribution
_version_ 1716649184395788288