High compression rate text summarization


Bibliographic Details
Main Author:	Branavan, Satchuthananthavale Rasiah Kuhan
Other Authors:	Regina Barzilay.
Format:	Others
Language:	English
Published:	Massachusetts Institute of Technology 2009
Subjects:	Electrical Engineering and Computer Science.
Online Access:	http://hdl.handle.net/1721.1/44368

Thesis Note:	Thesis (S.M.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2008.
Bibliography:	Includes bibliographical references (p. 95-97).
Physical Description:	97 p.
File Format:	application/pdf
Date Available:	2009-01-30
Source:	NDLTD
Rights:	M.I.T. theses are protected by copyright. They may be viewed from this source for any purpose, but reproduction or distribution in any format is prohibited without written permission. See provided URL for inquiries about permission. http://dspace.mit.edu/handle/1721.1/7582

Abstract:	This thesis focuses on methods for condensing large documents into highly concise summaries, achieving compression rates on par with human writers. While the need for such summaries in the current age of information overload is increasing, the desired compression rate has thus far been beyond the reach of automatic summarization systems. The potency of our summarization methods is due to their in-depth modelling of document content in a probabilistic framework. We explore two types of document representation that capture orthogonal aspects of text content. The first represents the semantic properties mentioned in a document in a hierarchical Bayesian model. This method is used to summarize thousands of consumer reviews by identifying the product properties mentioned by multiple reviewers. The second representation captures discourse properties, modelling the connections between different segments of a document. This discriminatively trained model is employed to generate tables of contents for books and lecture transcripts. The summarization methods presented here have been incorporated into large-scale practical systems that help users effectively access information online.

Statement of Responsibility:	by Satchuthananthavale Rasiah Kuhan Branavan.
Degree:	S.M.

