The Application of Closed Frequent Subtrees to Authorship Attribution
Main Author: | |
---|---|
Format: | Others |
Language: | English |
Published: | Umeå universitet, Institutionen för datavetenskap, 2014 |
Online Access: | http://urn.kb.se/resolve?urn=urn:nbn:se:umu:diva-86458 |
Summary: | In this experimental study we compare the authorship attribution performance of two types of distinguishing features: overlapping syntax subtrees of height one (or small trees) and closed frequent syntax subtrees. Authors and documents used in the experiments are randomly drawn from a large corpus of blog posts and news articles. Results show that small trees outperform closed frequent trees on this data set, both in terms of classifier performance and computational efficiency. |