On Wasserstein Two-Sample Testing and Related Families of Nonparametric Tests

Nonparametric two-sample or homogeneity testing is a decision theoretic problem that involves identifying differences between two random variables without making parametric assumptions about their underlying distributions. The literature is old and rich, with a wide variety of statistics having bein...

Bibliographic Details
Main Authors: Aaditya Ramdas, Nicolás García Trillos, Marco Cuturi
Format: Article
Language: English
Published: MDPI AG 2017-01-01
Series: Entropy
Subjects:
Online Access: http://www.mdpi.com/1099-4300/19/2/47
id doaj-089cd3d1923c432d83dc31b67447fe11
record_format Article
spelling doaj-089cd3d1923c432d83dc31b67447fe11; 2020-11-24T23:49:53Z; eng; MDPI AG; Entropy; 1099-4300; 2017-01-01; 19; 2; 47; 10.3390/e19020047; e19020047; On Wasserstein Two-Sample Testing and Related Families of Nonparametric Tests; Aaditya Ramdas 0; Nicolás García Trillos 1; Marco Cuturi 2; Departments of Statistics and Computer Science, University of California, Berkeley, CA 94703, USA; Department of Mathematics, Brown University, Providence, RI 02912, USA; Laboratory of Statistics, CREST, ENSAE, Université Paris-Saclay, Malakoff 92240, France; Nonparametric two-sample or homogeneity testing is a decision theoretic problem that involves identifying differences between two random variables without making parametric assumptions about their underlying distributions. The literature is old and rich, with a wide variety of statistics having been designed and analyzed, both for the unidimensional and the multivariate setting. In this short survey, we focus on test statistics that involve the Wasserstein distance. Using an entropic smoothing of the Wasserstein distance, we connect these to very different tests including multivariate methods involving energy statistics and kernel-based maximum mean discrepancy and univariate methods like the Kolmogorov–Smirnov test, probability or quantile (PP/QQ) plots and receiver operating characteristic or ordinal dominance (ROC/ODC) curves. Some observations are implicit in the literature, while others seem not to have been noticed thus far. Given nonparametric two-sample testing’s classical and continued importance, we aim to provide useful connections for theorists and practitioners familiar with one subset of methods but not others.; http://www.mdpi.com/1099-4300/19/2/47; two-sample testing; Wasserstein distance; entropic smoothing; energy distance; maximum mean discrepancy; QQ and PP plots; ROC and ODC curves
collection DOAJ
language English
format Article
sources DOAJ
author Aaditya Ramdas
Nicolás García Trillos
Marco Cuturi
spellingShingle Aaditya Ramdas
Nicolás García Trillos
Marco Cuturi
On Wasserstein Two-Sample Testing and Related Families of Nonparametric Tests
Entropy
two-sample testing
Wasserstein distance
entropic smoothing
energy distance
maximum mean discrepancy
QQ and PP plots
ROC and ODC curves
author_facet Aaditya Ramdas
Nicolás García Trillos
Marco Cuturi
author_sort Aaditya Ramdas
title On Wasserstein Two-Sample Testing and Related Families of Nonparametric Tests
title_short On Wasserstein Two-Sample Testing and Related Families of Nonparametric Tests
title_full On Wasserstein Two-Sample Testing and Related Families of Nonparametric Tests
title_fullStr On Wasserstein Two-Sample Testing and Related Families of Nonparametric Tests
title_full_unstemmed On Wasserstein Two-Sample Testing and Related Families of Nonparametric Tests
title_sort on wasserstein two-sample testing and related families of nonparametric tests
publisher MDPI AG
series Entropy
issn 1099-4300
publishDate 2017-01-01
description Nonparametric two-sample or homogeneity testing is a decision theoretic problem that involves identifying differences between two random variables without making parametric assumptions about their underlying distributions. The literature is old and rich, with a wide variety of statistics having been designed and analyzed, both for the unidimensional and the multivariate setting. In this short survey, we focus on test statistics that involve the Wasserstein distance. Using an entropic smoothing of the Wasserstein distance, we connect these to very different tests including multivariate methods involving energy statistics and kernel-based maximum mean discrepancy and univariate methods like the Kolmogorov–Smirnov test, probability or quantile (PP/QQ) plots and receiver operating characteristic or ordinal dominance (ROC/ODC) curves. Some observations are implicit in the literature, while others seem not to have been noticed thus far. Given nonparametric two-sample testing’s classical and continued importance, we aim to provide useful connections for theorists and practitioners familiar with one subset of methods but not others.
topic two-sample testing
Wasserstein distance
entropic smoothing
energy distance
maximum mean discrepancy
QQ and PP plots
ROC and ODC curves
url http://www.mdpi.com/1099-4300/19/2/47
work_keys_str_mv AT aadityaramdas onwassersteintwosampletestingandrelatedfamiliesofnonparametrictests
AT nicolasgarciatrillos onwassersteintwosampletestingandrelatedfamiliesofnonparametrictests
AT marcocuturi onwassersteintwosampletestingandrelatedfamiliesofnonparametrictests
_version_ 1725481050087358464