Building CTRnet Digital Library Services using Archive-It and LucidWorks Big Data Software

When a crisis occurs, information flows rapidly in the Web through social media, blogs, and news articles. The shared information captures the reactions, impacts, and responses from the government as well as the public. Later, researchers, scholars, students, and others seek information about earlie...

Bibliographic Details
Main Author:	Chitturi, Kiran
Other Authors:	Computer Science
Format:	Others
Published:	Virginia Tech 2014
Subjects:	Digital Library Services CTRnet Internet Archive LucidWorks Big Data Crises Archive-It
Online Access:	http://hdl.handle.net/10919/46865

id	ndltd-VTETD-oai-vtechworks.lib.vt.edu-10919-46865
record_format	oai_dc
spelling	ndltd-VTETD-oai-vtechworks.lib.vt.edu-10919-46865 2021-02-25T05:39:58Z Building CTRnet Digital Library Services using Archive-It and LucidWorks Big Data Software Chitturi, Kiran Computer Science Fox, Edward A. Sheetz, Steven D. Yao, Danfeng (Daphne) Digital Library Services CTRnet Internet Archive LucidWorks Big Data Crises Archive-It When a crisis occurs, information flows rapidly on the Web through social media, blogs, and news articles. The shared information captures the reactions, impacts, and responses from the government as well as the public. Later, researchers, scholars, students, and others seek information about earlier events, sometimes for cross-event analysis or comparison. Very few integrated systems try to collect and permanently archive the information about an event and provide access to the crisis information at the same time. In this thesis, we describe the CTRnet Digital Library and Archive, which aims to permanently archive crisis event information by using Archive-It services and then provide access to the archived information by using LucidWorks Big Data software. Through the LucidWorks Big Data (LWBD) software, we take advantage of text extraction, clustering, similarity, annotation, and indexing services and build digital libraries with the generated metadata, which helps the system stakeholders locate information about an event. Through this study, we collected data for 46 crisis events using Archive-It. We built a CTRnet DL prototype and its services for the "Boston Marathon Bombing" collection by using the components of LucidWorks Big Data. Running LucidWorks Big Data on a 30-node Hadoop cluster accelerates sub-workflow processing and also provides fault-tolerant execution. The LWBD sub-workflows "ingest" and "extract" processed the textual data present in the WARC files. The other sub-workflows, "kmeans", "simdoc", and "annotate", helped in grouping the search results, deleting the duplicates, and providing metadata for additional facets in the CTRnet DL prototype, respectively. Master of Science 2014-03-28T08:00:17Z 2014-03-28T08:00:17Z 2014-03-27 Thesis vt_gsexam:1340 http://hdl.handle.net/10919/46865 In Copyright http://rightsstatements.org/vocab/InC/1.0/ ETD application/pdf application/pdf Virginia Tech
collection	NDLTD
format	Others
sources	NDLTD
topic	Digital Library Services CTRnet Internet Archive LucidWorks Big Data Crises Archive-It
spellingShingle	Digital Library Services CTRnet Internet Archive LucidWorks Big Data Crises Archive-It Chitturi, Kiran Building CTRnet Digital Library Services using Archive-It and LucidWorks Big Data Software
description	When a crisis occurs, information flows rapidly on the Web through social media, blogs, and news articles. The shared information captures the reactions, impacts, and responses from the government as well as the public. Later, researchers, scholars, students, and others seek information about earlier events, sometimes for cross-event analysis or comparison. Very few integrated systems try to collect and permanently archive the information about an event and provide access to the crisis information at the same time. In this thesis, we describe the CTRnet Digital Library and Archive, which aims to permanently archive crisis event information by using Archive-It services and then provide access to the archived information by using LucidWorks Big Data software. Through the LucidWorks Big Data (LWBD) software, we take advantage of text extraction, clustering, similarity, annotation, and indexing services and build digital libraries with the generated metadata, which helps the system stakeholders locate information about an event. Through this study, we collected data for 46 crisis events using Archive-It. We built a CTRnet DL prototype and its services for the "Boston Marathon Bombing" collection by using the components of LucidWorks Big Data. Running LucidWorks Big Data on a 30-node Hadoop cluster accelerates sub-workflow processing and also provides fault-tolerant execution. The LWBD sub-workflows "ingest" and "extract" processed the textual data present in the WARC files. The other sub-workflows, "kmeans", "simdoc", and "annotate", helped in grouping the search results, deleting the duplicates, and providing metadata for additional facets in the CTRnet DL prototype, respectively. (Master of Science)
author2	Computer Science
author_facet	Computer Science Chitturi, Kiran
author	Chitturi, Kiran
author_sort	Chitturi, Kiran
title	Building CTRnet Digital Library Services using Archive-It and LucidWorks Big Data Software
title_short	Building CTRnet Digital Library Services using Archive-It and LucidWorks Big Data Software
title_full	Building CTRnet Digital Library Services using Archive-It and LucidWorks Big Data Software
title_fullStr	Building CTRnet Digital Library Services using Archive-It and LucidWorks Big Data Software
title_full_unstemmed	Building CTRnet Digital Library Services using Archive-It and LucidWorks Big Data Software
title_sort	building ctrnet digital library services using archive-it and lucidworks big data software
publisher	Virginia Tech
publishDate	2014
url	http://hdl.handle.net/10919/46865
work_keys_str_mv	AT chitturikiran buildingctrnetdigitallibraryservicesusingarchiveitandlucidworksbigdatasoftware
_version_	1719378761561931776

Similar Items