Detecting Novel Associations in Large Data Sets
Format: Article
Language: English
Published: American Association for the Advancement of Science (AAAS), 2014-02-03T13:18:52Z
Summary: Identifying interesting relationships between pairs of variables in large data sets is increasingly important. Here, we present a measure of dependence for two-variable relationships: the maximal information coefficient (MIC). MIC captures a wide range of associations, both functional and non-functional, and for functional relationships provides a score that roughly equals the coefficient of determination (R²) of the data relative to the regression function. MIC belongs to a larger class of maximal information-based nonparametric exploration (MINE) statistics for identifying and classifying relationships. We apply MIC and MINE to data sets in global health, gene expression, major-league baseball, and the human gut microbiota and identify known and novel relationships.

Sponsorship: National Institute of General Medical Sciences (U.S.) (Medical Scientist Training Program)
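To make the idea behind MIC more concrete, here is a minimal sketch in Python under simplifying assumptions. It follows the usual definition of MIC: for each grid of kx by ky cells, take the mutual information of the binned data, divide it by log(min(kx, ky)), and report the maximum over all grids whose cell count stays below a bound. The summary does not describe how the published method chooses grid partitions. Here each axis is simply cut into equal-frequency bins, which searches fewer partitions and so generally gives a lower estimate than true MIC. The names `approx_mic` and `equal_freq_bins` are hypothetical, and the bound n**0.6 (the `alpha` parameter) is an assumed default. Neither should be read as the authors' implementation.

```python
import math
import random
from collections import Counter


def equal_freq_bins(values, k):
    """Map each value to one of k bins of (nearly) equal size by rank.

    Ties are broken by original position, so equal values may straddle a
    bin edge; this is a simplification of this sketch.
    """
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    bins = [0] * n
    for rank, i in enumerate(order):
        bins[i] = rank * k // n
    return bins


def mutual_information(xb, yb):
    """Plug-in mutual information (in nats) of two discrete label sequences."""
    n = len(xb)
    joint = Counter(zip(xb, yb))
    px, py = Counter(xb), Counter(yb)
    mi = 0.0
    for (a, b), count in joint.items():
        p_ab = count / n
        mi += p_ab * math.log(p_ab / ((px[a] / n) * (py[b] / n)))
    return mi


def approx_mic(xs, ys, alpha=0.6):
    """Hypothetical MIC approximation over equal-frequency grids only.

    Tries every kx-by-ky grid with kx, ky >= 2 and kx * ky < n**alpha
    (alpha=0.6 is an assumed default). Each mutual information is divided
    by log(min(kx, ky)), so every score lies in [0, 1].
    """
    n = len(xs)
    if n != len(ys):
        raise ValueError("xs and ys must have the same length")
    max_cells = n ** alpha
    if max_cells <= 4:
        raise ValueError("too few points for even a 2x2 grid")
    y_bins = {}  # cache of y bin labels, keyed by bin count ky
    best = 0.0
    kx = 2
    while kx * 2 < max_cells:
        xb = equal_freq_bins(xs, kx)
        ky = 2
        while kx * ky < max_cells:
            if ky not in y_bins:
                y_bins[ky] = equal_freq_bins(ys, ky)
            mi = mutual_information(xb, y_bins[ky])
            best = max(best, mi / math.log(min(kx, ky)))
            ky += 1
        kx += 1
    return best


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so the noise case is reproducible
    xs = [i / 199 for i in range(200)]
    cases = {
        "linear": [2 * x + 1 for x in xs],
        "sine": [math.sin(4 * math.pi * x) for x in xs],
        "noise": [rng.random() for _ in xs],
    }
    for name, ys in cases.items():
        print(f"{name:>6}: {approx_mic(xs, ys):.3f}")
```

In this sketch, the noiseless linear and sine cases should score near 1, and the independent noise case should score well below that. This matches the summary's point that MIC tracks R² for functional relationships, which is 1 when there is no noise. A faithful implementation would optimize the partition on at least one axis instead of fixing equal-frequency cuts on both.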