Tensor-Based Semantically-Aware Topic Clustering of Biomedical Documents

Biomedicine is a pillar of the collective, scientific effort of human self-discovery, as well as a major source of humanistic data codified primarily in biomedical documents. Despite their rigid structure, maintaining and updating a considerably-sized collection of such documents is a task of overwh...

Bibliographic Details
Main Authors: Georgios Drakopoulos, Andreas Kanavos, Ioannis Karydis, Spyros Sioutas, Aristidis G. Vrahatis
Format: Article
Language: English
Published: MDPI AG 2017-07-01
Series: Computation
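Subjects: humanistic data; higher order data; medical information retrieval; topic clustering; PubMed; MeSH Ontology; tensor algebra; Tucker factorization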
Online Access: https://www.mdpi.com/2079-3197/5/3/34
id doaj-2d1caeaada5c4b8e8ec495f1c9f03b4f
record_format Article
spelling doaj-2d1caeaada5c4b8e8ec495f1c9f03b4f
2020-11-24T23:40:14Z
eng
MDPI AG
Computation
2079-3197
2017-07-01
volume 5, issue 3, article 34
10.3390/computation5030034
computation5030034
Tensor-Based Semantically-Aware Topic Clustering of Biomedical Documents
Georgios Drakopoulos
Andreas Kanavos
Ioannis Karydis
Spyros Sioutas
Aristidis G. Vrahatis
Department of Informatics, Ionian University, Tsirigoti Square 7, Kerkyra 49100, Greece
Computer Engineering and Informatics Department, University of Patras, Patras 26504, Greece
Department of Informatics, Ionian University, Tsirigoti Square 7, Kerkyra 49100, Greece
Department of Informatics, Ionian University, Tsirigoti Square 7, Kerkyra 49100, Greece
Computer Engineering and Informatics Department, University of Patras, Patras 26504, Greece
collection DOAJ
language English
format Article
sources DOAJ
author Georgios Drakopoulos
Andreas Kanavos
Ioannis Karydis
Spyros Sioutas
Aristidis G. Vrahatis
title Tensor-Based Semantically-Aware Topic Clustering of Biomedical Documents
publisher MDPI AG
series Computation
issn 2079-3197
publishDate 2017-07-01
description Biomedicine is a pillar of the collective, scientific effort of human self-discovery, as well as a major source of humanistic data codified primarily in biomedical documents. Despite their rigid structure, maintaining and updating a considerably-sized collection of such documents is a task of overwhelming complexity mandating efficient information retrieval for the purpose of the integration of clustering schemes. The latter should work natively with inherently multidimensional data and higher order interdependencies. Additionally, past experience indicates that clustering should be semantically enhanced. Tensor algebra is the key to extending the current term-document model to more dimensions. In this article, an alternative keyword-term-document strategy, based on scientometric observations that keywords typically possess more expressive power than ordinary text terms, whose algorithmic cornerstones are third order tensors and MeSH ontological functions, is proposed. This strategy has been compared against a baseline using two different biomedical datasets, the TREC (Text REtrieval Conference) genomics benchmark and a large custom set of cognitive science articles from PubMed.
topic humanistic data
higher order data
medical information retrieval
topic clustering
PubMed
MeSH Ontology
tensor algebra
Tucker factorization
url https://www.mdpi.com/2079-3197/5/3/34
_version_ 1725510503333101568