SUMMARITIVE DIGEST FOR LARGE DOCUMENT REPOSITORIES WITH APPLICATION TO E-RULEMAKING

Bibliographic Details
Main Author:	Chen, Lijun
Language:	English
Published:	Wright State University / OhioLINK 2007
Subjects:	Computer Science clustering cluster different SCDs CDs ERFRs PagodaCD
Online Access:	http://rave.ohiolink.edu/etdc/view?acc_num=wright1189038245

id	ndltd-OhioLink-oai-etd.ohiolink.edu-wright1189038245
record_format	oai_dc
spelling	ndltd-OhioLink-oai-etd.ohiolink.edu-wright11890382452021-08-03T06:16:44Z SUMMARITIVE DIGEST FOR LARGE DOCUMENT REPOSITORIES WITH APPLICATION TO E-RULEMAKING Chen, Lijun Computer Science clustering cluster different SCDs CDs ERFRs PagodaCD Large document repositories need to be organized and summarized to make them more accessible and understandable. Such needs exist in many applications, including web search, e-rulemaking (electronic rulemaking) and document archiving. Even though much has been done in the areas of document clustering and summarization, there are still many new challenges and issues that need to be addressed as the repositories become larger, more prevalent and more dynamic. In this dissertation, we investigate more informative ways to organize and summarize large document repositories, especially e-rulemaking feedback repositories (ERFRs), so that large repositories can be managed and digested more efficiently and effectively. Specifically, we mainly consider the following four tasks: 1) identifying important aspects of ERFRs, 2) constructing cluster descriptions for document clustering, 3) clustering ERFRs with simultaneous construction of succinct cluster descriptions, and 4) selecting representative arguments for ERFR clustering.
We propose to organize and summarize e-rulemaking feedback based on three major aspects of the rulemaking process, in order to meet the different needs of rule-writers and analysts; the three aspects are opinions (O), issues (I) and stakeholders (S). We introduce an OIS-based approach to producing an informative summaritive digest (SD) for a given ERFR. In addition, several novel concepts, approaches and algorithms are introduced, including the CDD measure, active feature selection (AFS) and the Pagoda search algorithms.
An SD, simply put, consists of a document clustering, along with succinct cluster descriptions (SCDs) and representative arguments (RAs) for each cluster in the clustering. The clustering of an SD can be constructed in either a flat or a hierarchical manner. For hierarchical clustering, each level of the hierarchy can be constructed by emphasizing one of the O, I and S aspects, and different orders of O, I and S can be used for the levels of the hierarchy. Different clusterings can be used to meet the needs of different users. Given a goodness measure, a "best" clustering can be recommended to the user. An SCD consists of a set of carefully selected terms along with some statistics, and the RAs are typical arguments selected from each cluster. An RA should be a statement in which certain major stakeholders have expressed opinions on some of the important issues. Collectively, an SD provides an informative navigation aid for rule-writers and analysts to manage and digest large ERFRs.
We conduct an experimental evaluation of our approaches using publicly available ERFRs. The results suggest that the SD not only helps users browse the feedback, but also gives them a high-level sense of the feedback before they dig into individual comments. The results also show that our approaches are efficient and scalable for managing large document repositories.
Even though we devoted special attention to the application of e-rulemaking, we believe that most of the ideas are generic and can be easily applied to other types of repositories, including digital archives.
2007-09-27 English text Wright State University / OhioLINK http://rave.ohiolink.edu/etdc/view?acc_num=wright1189038245 http://rave.ohiolink.edu/etdc/view?acc_num=wright1189038245 unrestricted This thesis or dissertation is protected by copyright: all rights reserved. It may not be copied or redistributed beyond the terms of applicable copyright laws.
collection	NDLTD
language	English
sources	NDLTD
topic	Computer Science clustering cluster different SCDs CDs ERFRs PagodaCD
spellingShingle	Computer Science clustering cluster different SCDs CDs ERFRs PagodaCD Chen, Lijun SUMMARITIVE DIGEST FOR LARGE DOCUMENT REPOSITORIES WITH APPLICATION TO E-RULEMAKING
author	Chen, Lijun
author_facet	Chen, Lijun
author_sort	Chen, Lijun
title	SUMMARITIVE DIGEST FOR LARGE DOCUMENT REPOSITORIES WITH APPLICATION TO E-RULEMAKING
title_short	SUMMARITIVE DIGEST FOR LARGE DOCUMENT REPOSITORIES WITH APPLICATION TO E-RULEMAKING
title_full	SUMMARITIVE DIGEST FOR LARGE DOCUMENT REPOSITORIES WITH APPLICATION TO E-RULEMAKING
title_fullStr	SUMMARITIVE DIGEST FOR LARGE DOCUMENT REPOSITORIES WITH APPLICATION TO E-RULEMAKING
title_full_unstemmed	SUMMARITIVE DIGEST FOR LARGE DOCUMENT REPOSITORIES WITH APPLICATION TO E-RULEMAKING
title_sort	summaritive digest for large document repositories with application to e-rulemaking
publisher	Wright State University / OhioLINK
publishDate	2007
url	http://rave.ohiolink.edu/etdc/view?acc_num=wright1189038245
work_keys_str_mv	AT chenlijun summaritivedigestforlargedocumentrepositorieswithapplicationtoerulemaking
_version_	1719433926885244928
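
The abstract in this record describes an SD as a flat or hierarchical clustering in which each cluster carries a succinct cluster description (SCD) and representative arguments (RAs), with hierarchy levels emphasizing the O, I and S aspects in some order. A minimal sketch of how such a structure might be modeled is given below. All class, field and function names are hypothetical, and the per-term statistic is assumed to be a simple count; none of this is taken from the dissertation itself.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from itertools import permutations


@dataclass
class Cluster:
    """One cluster of an SD (hypothetical model, not the author's code)."""
    aspect: str                      # aspect emphasized at this level: "O", "I" or "S"
    scd_terms: dict[str, int]        # SCD: selected terms with a statistic (assumed: a count)
    representative_arguments: list[str] = field(default_factory=list)  # RAs
    doc_ids: list[int] = field(default_factory=list)     # documents assigned directly here
    children: list[Cluster] = field(default_factory=list)  # empty for a flat clustering


@dataclass
class SummaritiveDigest:
    """A clustering plus per-cluster SCDs and RAs."""
    clusters: list[Cluster]

    def depth(self) -> int:
        """Number of hierarchy levels (1 for a flat clustering, 0 if empty)."""
        return max((_depth(c) for c in self.clusters), default=0)

    def all_doc_ids(self) -> list[int]:
        """Sorted ids of every document reachable from any cluster."""
        ids: set[int] = set()
        stack = list(self.clusters)
        while stack:
            cluster = stack.pop()
            ids.update(cluster.doc_ids)
            stack.extend(cluster.children)
        return sorted(ids)


def _depth(cluster: Cluster) -> int:
    # A leaf counts as one level; each layer of children adds one more.
    return 1 + max((_depth(c) for c in cluster.children), default=0)


def level_orders() -> list[str]:
    """All orders in which the O, I and S aspects could be assigned to levels."""
    return ["".join(p) for p in permutations("OIS")]
```

Nesting clusters through `children` lets the same type cover both the flat and the hierarchical case mentioned in the abstract; enumerating the level orders is only meant to show that a three-level hierarchy admits six aspect orderings from which a goodness measure could pick one.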

Similar Items