Uncovering High-dimensional Structures of Projections from Dimensionality Reduction Methods

Projections are conventional methods of dimensionality reduction for information visualization used to transform high-dimensional data into low dimensional space. If the projection method restricts the output space to two dimensions, the result is a scatter plot. The goal of this scatter plot is to...

Full description

Bibliographic Details
Main Authors: Michael C. Thrun, PhD, Alfred Ultsch, Prof. Dr. habil.
Format: Article
Language:English
Published: Elsevier 2020-01-01
Series:MethodsX
Subjects:
Online Access:http://www.sciencedirect.com/science/article/pii/S2215016120303137
Description
Summary:Projections are conventional methods of dimensionality reduction for information visualization used to transform high-dimensional data into low dimensional space. If the projection method restricts the output space to two dimensions, the result is a scatter plot. The goal of this scatter plot is to visualize the relative relationships between high-dimensional data points that build up distance and density-based structures. However, the Johnson–Lindenstrauss lemma states that the two-dimensional similarities in the scatter plot cannot coercively represent high-dimensional structures. Here, a simplified emergent self-organizing map uses the projected points of such a scatter plot in combination with the dataset in order to compute the generalized U-matrix. The generalized U-matrix defines the visualization of a topographic map depicting the misrepresentations of projected points with regards to a given dimensionality reduction method and the dataset. • The topographic map provides accurate information about the high-dimensional distance and density based structures of high-dimensional data if an appropriate dimensionality reduction method is selected. • The topographic map can uncover the absence of distance-based structures. • The topographic map reveals the number of clusters in a dataset as the number of valleys.
ISSN:2215-0161