The Application of Closed Frequent Subtrees to Authorship Attribution

Bibliographic Details
Main Author: Lindh Morén, Jonas
Format: Others
Language: English
Published: Umeå universitet, Institutionen för datavetenskap 2014
Online Access: http://urn.kb.se/resolve?urn=urn:nbn:se:umu:diva-86458
Description
Summary: In this experimental study we compare the authorship attribution performance of two types of distinguishing features: overlapping syntax subtrees of height one (small trees) and closed frequent syntax subtrees. The authors and documents used in the experiments are randomly drawn from a large corpus of blog posts and news articles. Results show that small trees outperform closed frequent trees on this data set, in terms of both classifier performance and computational efficiency.
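
To make the "small trees" feature type concrete, the following is a minimal sketch of how overlapping height-one subtrees might be counted from a single parse tree. It is not the thesis's implementation. The nested-tuple tree encoding, the helper names `label` and `small_trees`, the example sentence, and the choice to treat words as child labels are all assumptions made for illustration.

```python
from collections import Counter

# Assumed encoding: an internal node is (label, child, child, ...);
# a leaf (a word) is a plain string.

def label(node):
    """Return the label of a node, or the word itself for a leaf."""
    return node if isinstance(node, str) else node[0]

def small_trees(tree):
    """Yield one (parent_label, child_labels) pair per internal node.

    Each pair is a subtree of height one. The pairs overlap because a
    node appears as a child in its parent's pair and as the root of
    its own pair.
    """
    if isinstance(tree, str):
        return
    yield (tree[0], tuple(label(child) for child in tree[1:]))
    for child in tree[1:]:
        yield from small_trees(child)

# Hypothetical parse of "the cat sat".
parse = ("S",
         ("NP", ("DT", "the"), ("NN", "cat")),
         ("VP", ("VBD", "sat")))

# Feature counts for one sentence; a document's feature vector would
# sum these counts over all of its sentences.
features = Counter(small_trees(parse))
```

Under these assumptions, `features` holds one count per distinct height-one subtree, for example `("S", ("NP", "VP"))`. Closed frequent subtrees, by contrast, require a mining step over the whole corpus, which is consistent with the computational cost difference reported in the summary.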