Clustering User Behavior in Scientific Collections

This master thesis looks at how clustering techniques can be applied to a collection of scientific documents. Approximately one year of server logs from the CERN Document Server (CDS) are analyzed and preprocessed. Based on the findings of this analysis, and a review of the current state of the art, thr...


Bibliographic Details
Main Author: Blixhavn, Øystein Hoel
Format: Others
Language: English
Published: Norges teknisk-naturvitenskapelige universitet, Institutt for datateknikk og informasjonsvitenskap 2014
Subjects:
Online Access: http://urn.kb.se/resolve?urn=urn:nbn:no:ntnu:diva-27340
id ndltd-UPSALLA1-oai-DiVA.org-ntnu-27340
record_format oai_dc
spelling ndltd-UPSALLA1-oai-DiVA.org-ntnu-27340 2014-12-08T04:53:46Z Clustering User Behavior in Scientific Collections eng Blixhavn, Øystein Hoel Norges teknisk-naturvitenskapelige universitet, Institutt for datateknikk og informasjonsvitenskap Institutt for datateknikk og informasjonsvitenskap 2014 ntnudaim:12121 MTDT Datateknologi Data- og informasjonsforvaltning This master thesis looks at how clustering techniques can be applied to a collection of scientific documents. Approximately one year of server logs from the CERN Document Server (CDS) are analyzed and preprocessed. Based on the findings of this analysis, and a review of the current state of the art, three different clustering methods are selected for further work: Simple k-Means, Hierarchical Agglomerative Clustering (HAC) and Graph Partitioning. In addition, a custom, agglomerative clustering algorithm is made in an attempt to tackle some of the problems encountered during the experiments with k-Means and HAC. The results from k-Means and HAC are poor, but the graph partitioning method yields some promising results. The main conclusion of this thesis is that the inherent clusters within the user-record relationship of a scientific collection are nebulous, but existing. Furthermore, the most common clustering algorithms are not suitable for this type of clustering. Student thesis info:eu-repo/semantics/bachelorThesis text http://urn.kb.se/resolve?urn=urn:nbn:no:ntnu:diva-27340 Local ntnudaim:12121 application/pdf info:eu-repo/semantics/openAccess
collection NDLTD
language English
format Others
sources NDLTD
topic ntnudaim:12121
MTDT Datateknologi
Data- og informasjonsforvaltning
spellingShingle ntnudaim:12121
MTDT Datateknologi
Data- og informasjonsforvaltning
Blixhavn, Øystein Hoel
Clustering User Behavior in Scientific Collections
description This master thesis looks at how clustering techniques can be applied to a collection of scientific documents. Approximately one year of server logs from the CERN Document Server (CDS) are analyzed and preprocessed. Based on the findings of this analysis, and a review of the current state of the art, three different clustering methods are selected for further work: Simple k-Means, Hierarchical Agglomerative Clustering (HAC) and Graph Partitioning. In addition, a custom, agglomerative clustering algorithm is made in an attempt to tackle some of the problems encountered during the experiments with k-Means and HAC. The results from k-Means and HAC are poor, but the graph partitioning method yields some promising results. The main conclusion of this thesis is that the inherent clusters within the user-record relationship of a scientific collection are nebulous, but existing. Furthermore, the most common clustering algorithms are not suitable for this type of clustering.
author Blixhavn, Øystein Hoel
author_facet Blixhavn, Øystein Hoel
author_sort Blixhavn, Øystein Hoel
title Clustering User Behavior in Scientific Collections
title_short Clustering User Behavior in Scientific Collections
title_full Clustering User Behavior in Scientific Collections
title_fullStr Clustering User Behavior in Scientific Collections
title_full_unstemmed Clustering User Behavior in Scientific Collections
title_sort clustering user behavior in scientific collections
publisher Norges teknisk-naturvitenskapelige universitet, Institutt for datateknikk og informasjonsvitenskap
publishDate 2014
url http://urn.kb.se/resolve?urn=urn:nbn:no:ntnu:diva-27340
work_keys_str_mv AT blixhavnøysteinhoel clusteringuserbehaviorinscientificcollections
_version_ 1716726489129418752