Computational Analyses of Scientific Publications Using Raw and Manually Curated Data with Applications to Text Visualization

Text visualization is a field dedicated to the visual representation of textual data by using computer technology. A large number of visualization techniques are available, and now it is becoming harder for researchers and practitioners to choose an optimal technique for a particular task among the...

Bibliographic Details
Main Author: Shokat, Imran
Format: Others
Language: English
Published: Linnéuniversitetet, Institutionen för datavetenskap och medieteknik (DM) 2018
Subjects:
NLP
LDA
HDP
Online Access: http://urn.kb.se/resolve?urn=urn:nbn:se:lnu:diva-78995
id ndltd-UPSALLA1-oai-DiVA.org-lnu-78995
record_format oai_dc
spelling ndltd-UPSALLA1-oai-DiVA.org-lnu-78995
2018-12-04T05:58:36Z
Computational Analyses of Scientific Publications Using Raw and Manually Curated Data with Applications to Text Visualization
eng
Shokat, Imran
Linnéuniversitetet, Institutionen för datavetenskap och medieteknik (DM)
2018
Student thesis
info:eu-repo/semantics/bachelorThesis
text
http://urn.kb.se/resolve?urn=urn:nbn:se:lnu:diva-78995
application/pdf
info:eu-repo/semantics/openAccess
collection NDLTD
language English
format Others
sources NDLTD
topic Scientific literature analysis
meta-analysis
trends
correlation
NLP
text mining
topic modeling
LDA
HDP
text visualization
Software Engineering
Programvaruteknik
description Text visualization is a field dedicated to the visual representation of textual data by using computer technology. A large number of visualization techniques are available, and now it is becoming harder for researchers and practitioners to choose an optimal technique for a particular task among the existing techniques. To overcome this problem, the ISOVIS Group developed an interactive survey browser for text visualization techniques. ISOVIS researchers gathered papers that describe text visualization techniques or tools and categorized them according to a taxonomy. Several categories were manually assigned to each visualization technique. In this thesis, we aim to analyze the dataset of this browser. We carried out several analyses to find temporal trends and correlations among the categories present in the browser dataset. In addition, we compared these categories with the results of a computational approach. Our results show that some categories have become more popular over time, whereas others have declined in popularity. Cases of positive and negative correlation between various categories were found and analyzed. Comparisons between the manually labeled dataset and the results of the computational text analyses were presented to the experts, giving them an opportunity to refine the dataset. The data analyzed in this thesis project is specific to the text visualization field; however, the methods used in the analyses can be generalized to other datasets of scientific literature surveys or, more generally, to other manually curated collections of textual documents.
author Shokat, Imran
title Computational Analyses of Scientific Publications Using Raw and Manually Curated Data with Applications to Text Visualization
publisher Linnéuniversitetet, Institutionen för datavetenskap och medieteknik (DM)
publishDate 2018
url http://urn.kb.se/resolve?urn=urn:nbn:se:lnu:diva-78995