Uncovering High-dimensional Structures of Projections from Dimensionality Reduction Methods

Projections are conventional methods of dimensionality reduction for information visualization, used to transform high-dimensional data into a low-dimensional space. If the projection method restricts the output space to two dimensions, the result is a scatter plot. The goal of this scatter plot is to...

Bibliographic Details
Main Authors: Michael C. Thrun, PhD; Alfred Ultsch, Prof. Dr. habil.
Format: Article
Language: English
Published: Elsevier 2020-01-01
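Series: MethodsX
Subjects: Dimensionality reduction; Projection methods; Data visualization; Unsupervised neural networks; Self-organizing maps
Online Access: http://www.sciencedirect.com/science/article/pii/S2215016120303137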
id doaj-03c37bea42264da78d71c92facd28767
record_format Article
spelling doaj-03c37bea42264da78d71c92facd28767
2021-01-02T05:11:03Z
eng
Elsevier
MethodsX
2215-0161
2020-01-01
7
101093
Uncovering High-dimensional Structures of Projections from Dimensionality Reduction Methods
Michael C. Thrun, PhD [0]
Alfred Ultsch, Prof. Dr. habil. [1]
Dept. of Hematology, Oncology and Immunology, Philipps-University of Marburg, Baldingerstraße, D-35043 Marburg; Corresponding author.
Databionics Research Group, Philipps-University of Marburg, Hans-Meerwein-Straße 6, Marburg D-35032, Germany
collection DOAJ
language English
format Article
sources DOAJ
author Michael C. Thrun, PhD
Alfred Ultsch, Prof. Dr. habil.
title Uncovering High-dimensional Structures of Projections from Dimensionality Reduction Methods
publisher Elsevier
series MethodsX
issn 2215-0161
publishDate 2020-01-01
description Projections are conventional methods of dimensionality reduction for information visualization, used to transform high-dimensional data into a low-dimensional space. If the projection method restricts the output space to two dimensions, the result is a scatter plot. The goal of this scatter plot is to visualize the relative relationships between high-dimensional data points that build up distance- and density-based structures. However, the Johnson–Lindenstrauss lemma states that the two-dimensional similarities in the scatter plot cannot necessarily represent high-dimensional structures. Here, a simplified emergent self-organizing map uses the projected points of such a scatter plot, in combination with the dataset, to compute the generalized U-matrix. The generalized U-matrix defines the visualization of a topographic map depicting the misrepresentations of projected points with regard to a given dimensionality reduction method and the dataset. • The topographic map provides accurate information about the high-dimensional distance- and density-based structures of the data if an appropriate dimensionality reduction method is selected. • The topographic map can uncover the absence of distance-based structures. • The topographic map reveals the number of clusters in a dataset as the number of valleys.
topic Dimensionality reduction
Projection methods
Data visualization
Unsupervised neural networks
Self-organizing maps
url http://www.sciencedirect.com/science/article/pii/S2215016120303137