On Wasserstein Two-Sample Testing and Related Families of Nonparametric Tests
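Nonparametric two-sample or homogeneity testing is a decision-theoretic problem that involves identifying differences between two random variables without making parametric assumptions about their underlying distributions. The literature is old and rich, with a wide variety of statistics having been designed and analyzed, both for the unidimensional and the multivariate setting. In this short survey, we focus on test statistics that involve the Wasserstein distance. Using an entropic smoothing of the Wasserstein distance, we connect these to very different tests, including multivariate methods involving energy statistics and kernel-based maximum mean discrepancy, and univariate methods like the Kolmogorov–Smirnov test, probability or quantile (PP/QQ) plots, and receiver operating characteristic or ordinal dominance (ROC/ODC) curves. Some observations are implicit in the literature, while others seem not to have been noticed thus far. Given nonparametric two-sample testing's classical and continued importance, we aim to provide useful connections for theorists and practitioners familiar with one subset of methods but not others.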
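The abstract relates Wasserstein statistics to univariate tools such as the Kolmogorov–Smirnov test and QQ plots. The following is a minimal, illustrative sketch of the one-dimensional case only, not code from the article. In one dimension the 1-Wasserstein distance equals the area between the two empirical CDFs, and for equal sample sizes it reduces to the mean absolute gap between sorted samples, which is the average horizontal distance of a QQ plot from its diagonal. All function names are hypothetical, and calibrating the test by random permutations is an assumption made here; the record does not state how the article calibrates its tests.

```python
# Illustrative sketch: function names are hypothetical, not from the article.
import random
from bisect import bisect_right


def ecdf(sorted_sample, t):
    """Right-continuous empirical CDF: fraction of points <= t."""
    return bisect_right(sorted_sample, t) / len(sorted_sample)


def wasserstein1_1d(x, y):
    """1-Wasserstein distance between the empirical laws of x and y.

    In one dimension W1 = integral of |F_x(t) - F_y(t)| dt. Both ECDFs are
    step functions, so the integral is a finite sum over the merged sample.
    """
    xs, ys = sorted(x), sorted(y)
    grid = sorted(xs + ys)
    total = 0.0
    for left, right in zip(grid, grid[1:]):
        # Both ECDFs are constant on [left, right).
        total += abs(ecdf(xs, left) - ecdf(ys, left)) * (right - left)
    return total


def wasserstein1_quantile(x, y):
    """Equal-size case: mean gap between sorted samples (QQ-plot form)."""
    if len(x) != len(y):
        raise ValueError("this quantile form assumes equal sample sizes")
    return sum(abs(a - b) for a, b in zip(sorted(x), sorted(y))) / len(x)


def ks_statistic(x, y):
    """Kolmogorov-Smirnov statistic: largest vertical gap between the ECDFs."""
    xs, ys = sorted(x), sorted(y)
    # The supremum is attained at a sample point for right-continuous steps.
    return max(abs(ecdf(xs, t) - ecdf(ys, t)) for t in xs + ys)


def permutation_pvalue(x, y, stat, n_perm=999, seed=0):
    """Permutation p-value for H0: x and y share one distribution.

    Assumption: permutation calibration is chosen here for illustration.
    """
    rng = random.Random(seed)
    observed = stat(x, y)
    pooled = list(x) + list(y)
    n = len(x)
    exceed = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        if stat(pooled[:n], pooled[n:]) >= observed:
            exceed += 1
    # The +1 terms count the observed split itself.
    return (exceed + 1) / (n_perm + 1)


if __name__ == "__main__":
    x = [0.0, 1.0, 2.0]
    y = [0.5, 1.5, 2.5]
    print(wasserstein1_1d(x, y), wasserstein1_quantile(x, y), ks_statistic(x, y))
    print(permutation_pvalue(x, y, wasserstein1_1d, n_perm=199))
```

Passing `ks_statistic` instead of `wasserstein1_1d` to the same permutation routine gives a KS test under identical calibration, which makes the two statistics easy to compare. The multivariate, entropically smoothed statistics that the abstract links to energy distance and MMD are not covered by this sketch.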
Main Authors: | Aaditya Ramdas, Nicolás García Trillos, Marco Cuturi |
---|---|
Format: | Article |
Language: | English |
Published: | MDPI AG, 2017-01-01 |
Series: | Entropy |
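Subjects: | two-sample testing; Wasserstein distance; entropic smoothing; energy distance; maximum mean discrepancy; QQ and PP plots; ROC and ODC curves |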
Online Access: | http://www.mdpi.com/1099-4300/19/2/47 |
id |
doaj-089cd3d1923c432d83dc31b67447fe11 |
---|---|
record_format |
Article |
doi |
10.3390/e19020047 |
citation |
Entropy, vol. 19, no. 2, article 47 |
affiliations |
Aaditya Ramdas: Departments of Statistics and Computer Science, University of California, Berkeley, CA 94703, USA; Nicolás García Trillos: Department of Mathematics, Brown University, Providence, RI 02912, USA; Marco Cuturi: Laboratory of Statistics, CREST, ENSAE, Université Paris-Saclay, Malakoff 92240, France |
collection |
DOAJ |
language |
English |
format |
Article |
sources |
DOAJ |
author |
Aaditya Ramdas Nicolás García Trillos Marco Cuturi |
publisher |
MDPI AG |
series |
Entropy |
issn |
1099-4300 |
publishDate |
2017-01-01 |
topic |
two-sample testing wasserstein distance entropic smoothing energy distance maximum mean discrepancy QQ and PP plots ROC and ODC curves |
url |
http://www.mdpi.com/1099-4300/19/2/47 |