ClusterEnG: an interactive educational web resource for clustering and visualizing high-dimensional data

Bibliographic Details
Main Authors: Mohith Manjunath, Yi Zhang, Yeonsung Kim, Steve H. Yeo, Omar Sobh, Nathan Russell, Christian Followell, Colleen Bushell, Umberto Ravaioli, Jun S. Song
Format: Article
Language: English
Published: PeerJ Inc., 2018-05-01
Series: PeerJ Computer Science
Online Access: https://peerj.com/articles/cs-155.pdf
DOI: 10.7717/peerj-cs.155
Volume: 4, article e155
ISSN: 2376-5992
Affiliations:
Carl R. Woese Institute for Genomic Biology, University of Illinois at Urbana-Champaign, Champaign, IL, United States of America (Mohith Manjunath, Yi Zhang, Yeonsung Kim, Steve H. Yeo, Omar Sobh, Jun S. Song)
Illinois Applied Research Institute, University of Illinois at Urbana-Champaign, Champaign, IL, United States of America (Nathan Russell, Christian Followell, Colleen Bushell)
Department of Electrical and Computer Engineering, University of Illinois at Urbana-Champaign, Champaign, IL, United States of America (Umberto Ravaioli)

Abstract
Background: Clustering is one of the most common techniques in data analysis and seeks to group together data points that are similar in some measure. Although many computer programs are available for performing clustering, a single web resource that provides several state-of-the-art clustering methods, interactive visualizations and evaluation of clustering results is lacking.

Methods: ClusterEnG (acronym for Clustering Engine for Genomics) provides a web interface for clustering data and interactive visualizations, including 3D views, data selection and zoom features. Eighteen clustering validation measures are also presented to help users select a suitable algorithm for their dataset. ClusterEnG also aims to educate users about the similarities and differences between various clustering algorithms and provides tutorials that demonstrate potential pitfalls of each algorithm.

Conclusions: The web resource will be particularly useful to scientists who are not conversant with computing but want to understand the structure of their data in an intuitive manner. The validation measures make it easier to choose a suitable clustering algorithm among the available options. ClusterEnG is part of a larger project called KnowEnG (Knowledge Engine for Genomics) and is available at http://education.knoweng.org/clustereng.

Subjects: Validation measures; Genomics; Web interface; Education; Clustering