The Application of Closed Frequent Subtrees to Authorship Attribution


Bibliographic Details
Main Author:	Lindh Morén, Jonas
Format:	Others
Language:	English
Published:	Umeå universitet, Institutionen för datavetenskap 2014
Online Access:	http://urn.kb.se/resolve?urn=urn:nbn:se:umu:diva-86458

Description
Summary:	In this experimental study we compare the authorship attribution performance of two types of distinguishing features: overlapping syntax subtrees of height one (also called small trees) and closed frequent syntax subtrees. The authors and documents used in the experiments are drawn at random from a large corpus of blog posts and news articles. The results show that small trees outperform closed frequent trees on this data set, both in classifier performance and in computational efficiency.
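To make the "small trees" feature type concrete, here is a minimal sketch of extracting overlapping height-one syntax subtrees from a parse tree and turning them into a relative-frequency profile for a document. It is illustrative only and is not the thesis's implementation. Assumptions (not stated in the record): a parse tree is a nested tuple `(label, child, ...)` with word tokens as plain strings, word tokens are ignored so that part-of-speech tags act as leaves, and features are normalized by their total count. The function names `small_trees` and `feature_profile` are hypothetical.

```python
from collections import Counter

# Assumed tree encoding: (label, child1, child2, ...); a word token is a plain str.


def is_leaf(node):
    """A word token (plain string) is a leaf of the parse tree."""
    return isinstance(node, str)


def small_trees(tree):
    """Yield every height-one subtree as (parent_label, (child_labels, ...)).

    Each internal node with syntactic children yields one small tree, so the
    subtrees overlap: a node's label appears both as a parent and as a child.
    Word tokens are dropped (assumption), so preterminals yield nothing.
    """
    if is_leaf(tree):
        return
    label, *children = tree
    syntactic = [c for c in children if not is_leaf(c)]
    if syntactic:
        yield (label, tuple(c[0] for c in syntactic))
        for child in syntactic:
            yield from small_trees(child)


def feature_profile(trees):
    """Relative frequency of each small tree over all sentences of a document."""
    counts = Counter()
    for t in trees:
        counts.update(small_trees(t))
    total = sum(counts.values())
    return {k: v / total for k, v in counts.items()} if total else {}


sentence = ("S", ("NP", ("DT", "the"), ("NN", "dog")), ("VP", ("VBZ", "barks")))
print(list(small_trees(sentence)))
# [('S', ('NP', 'VP')), ('NP', ('DT', 'NN')), ('VP', ('VBZ',))]
```

Profiles like these could then be fed to any standard classifier. Closed frequent subtrees, by contrast, require a dedicated tree-mining step, which is one plausible reason for the efficiency gap the summary reports.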

Similar Items