Global models of document structure using latent permutations

We present a novel Bayesian topic model for learning discourse-level document structure. Our model leverages insights from discourse theory to constrain latent topic assignments in a way that reflects the underlying organization of document topics. We propose a global model in which both topic selec...


Bibliographic Details
Main Authors: Chen, Harr (Contributor), Branavan, Satchuthanan R. (Contributor), Barzilay, Regina (Contributor), Karger, David R. (Contributor)
Other Authors: Massachusetts Institute of Technology. Computer Science and Artificial Intelligence Laboratory (Contributor), Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science (Contributor)
Format: Article
Language: English
Published: Association for Computational Linguistics, 2010-10-14T12:43:57Z.
Online Access: http://hdl.handle.net/1721.1/59312
LEADER 02001 am a22003373u 4500
001 59312
042 |a dc 
100 1 0 |a Chen, Harr  |e author 
100 1 0 |a Massachusetts Institute of Technology. Computer Science and Artificial Intelligence Laboratory  |e contributor 
100 1 0 |a Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science  |e contributor 
100 1 0 |a Barzilay, Regina  |e contributor 
100 1 0 |a Chen, Harr  |e contributor 
100 1 0 |a Branavan, Satchuthanan R.  |e contributor 
100 1 0 |a Karger, David R.  |e contributor 
700 1 0 |a Branavan, Satchuthanan R.  |e author 
700 1 0 |a Barzilay, Regina  |e author 
700 1 0 |a Karger, David R.  |e author 
245 0 0 |a Global models of document structure using latent permutations 
260 |b Association for Computational Linguistics,   |c 2010-10-14T12:43:57Z. 
856 |z Get fulltext  |u http://hdl.handle.net/1721.1/59312 
520 |a We present a novel Bayesian topic model for learning discourse-level document structure. Our model leverages insights from discourse theory to constrain latent topic assignments in a way that reflects the underlying organization of document topics. We propose a global model in which both topic selection and ordering are biased to be similar across a collection of related documents. We show that this space of orderings can be elegantly represented using a distribution over permutations called the generalized Mallows model. Our structure-aware approach substantially outperforms alternative approaches for cross-document comparison and single-document segmentation. 
546 |a en_US 
690 |a algorithms 
690 |a design 
690 |a experimentation 
690 |a languages 
690 |a measurement 
690 |a performance 
655 7 |a Article 
773 |t Proceedings of Human Language Technologies: the 2009 Annual Conference of the North American Chapter of the Association for Computational Linguistics
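
Illustrative note on the abstract: the generalized Mallows model mentioned there is a distribution over permutations that concentrates probability near a canonical ordering. The Python sketch below is not taken from the article. It assumes the common inversion-count parameterization, in which item j receives a count v_j of larger items that precede it, drawn with probability proportional to exp(-rho_j * v_j). The function names, the 1-based item labels, and the dispersion values are hypothetical choices made for illustration only.

```python
import math
import random


def sample_inversion_counts(rho, rng):
    """Draw one inversion-count vector v for n = len(rho) + 1 items.

    Assumed parameterization: v_j ranges over 0..n-j and is drawn with
    probability proportional to exp(-rho_j * v_j), for j = 1..n-1.
    """
    n = len(rho) + 1
    counts = []
    for j, r in enumerate(rho, start=1):
        max_v = n - j
        weights = [math.exp(-r * k) for k in range(max_v + 1)]
        counts.append(rng.choices(range(max_v + 1), weights=weights)[0])
    return counts


def inversions_to_permutation(counts):
    """Convert inversion counts into an ordering of the items 1..n.

    Items are placed from largest to smallest; item j is inserted at index
    v_j, so exactly v_j larger items end up before it.
    """
    n = len(counts) + 1
    order = [n]
    for j in range(n - 1, 0, -1):
        order.insert(counts[j - 1], j)
    return order


if __name__ == "__main__":
    rng = random.Random(0)
    rho = [1.0, 1.0, 1.0, 1.0]  # hypothetical dispersion parameters
    v = sample_inversion_counts(rho, rng)
    print(v, inversions_to_permutation(v))
```

Under this parameterization, larger dispersion values make orderings close to the canonical one (1, 2, ..., n) more likely, which is one way to read the abstract's statement that topic orderings are biased to be similar across related documents.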