Clustering User Behavior in Scientific Collections
This master's thesis looks at how clustering techniques can be applied to a collection of scientific documents. Approximately one year of server logs from the CERN Document Server (CDS) are analyzed and preprocessed. Based on the findings of this analysis, and a review of the current state of the art, thr...
Main Author: | Blixhavn, Øystein Hoel |
---|---|
Format: | Others |
Language: | English |
Published: | Norges teknisk-naturvitenskapelige universitet, Institutt for datateknikk og informasjonsvitenskap, 2014 |
Subjects: | ntnudaim:12121; MTDT Datateknologi; Data- og informasjonsforvaltning |
Online Access: | http://urn.kb.se/resolve?urn=urn:nbn:no:ntnu:diva-27340 |
id |
ndltd-UPSALLA1-oai-DiVA.org-ntnu-27340 |
---|---|
record_format |
oai_dc |
spelling |
ndltd-UPSALLA1-oai-DiVA.org-ntnu-27340; 2014-12-08T04:53:46Z; Clustering User Behavior in Scientific Collections; eng; Blixhavn, Øystein Hoel; Norges teknisk-naturvitenskapelige universitet, Institutt for datateknikk og informasjonsvitenskap; Institutt for datateknikk og informasjonsvitenskap; 2014; ntnudaim:12121; MTDT Datateknologi; Data- og informasjonsforvaltning; This master's thesis looks at how clustering techniques can be applied to a collection of scientific documents. Approximately one year of server logs from the CERN Document Server (CDS) are analyzed and preprocessed. Based on the findings of this analysis, and a review of the current state of the art, three different clustering methods are selected for further work: Simple k-Means, Hierarchical Agglomerative Clustering (HAC) and Graph Partitioning. In addition, a custom agglomerative clustering algorithm is developed in an attempt to tackle some of the problems encountered during the experiments with k-Means and HAC. The results from k-Means and HAC are poor, but the graph partitioning method yields some promising results. The main conclusion of this thesis is that the inherent clusters within the user-record relationship of a scientific collection are nebulous but present. Furthermore, the most common clustering algorithms are not suitable for this type of clustering.; Student thesis; info:eu-repo/semantics/bachelorThesis; text; http://urn.kb.se/resolve?urn=urn:nbn:no:ntnu:diva-27340; Local ntnudaim:12121; application/pdf; info:eu-repo/semantics/openAccess |
collection |
NDLTD |
language |
English |
format |
Others |
sources |
NDLTD |
topic |
ntnudaim:12121 MTDT Datateknologi Data- og informasjonsforvaltning |
description |
This master's thesis looks at how clustering techniques can be applied to a collection of scientific documents. Approximately one year of server logs from the CERN Document Server (CDS) are analyzed and preprocessed. Based on the findings of this analysis, and a review of the current state of the art, three different clustering methods are selected for further work: Simple k-Means, Hierarchical Agglomerative Clustering (HAC) and Graph Partitioning. In addition, a custom agglomerative clustering algorithm is developed in an attempt to tackle some of the problems encountered during the experiments with k-Means and HAC. The results from k-Means and HAC are poor, but the graph partitioning method yields some promising results. The main conclusion of this thesis is that the inherent clusters within the user-record relationship of a scientific collection are nebulous but present. Furthermore, the most common clustering algorithms are not suitable for this type of clustering. |
author |
Blixhavn, Øystein Hoel |
title |
Clustering User Behavior in Scientific Collections |
title_sort |
clustering user behavior in scientific collections |
publisher |
Norges teknisk-naturvitenskapelige universitet, Institutt for datateknikk og informasjonsvitenskap |
publishDate |
2014 |
url |
http://urn.kb.se/resolve?urn=urn:nbn:no:ntnu:diva-27340 |