Partition-based Model Representation Learning

Bibliographic Details
Main Author: Hsu, Yayun
Language: English
Published: 2020
Subjects:
Online Access: https://doi.org/10.7916/d8-2baa-6k42
collection NDLTD
language English
sources NDLTD
topic Statistics
Inference--Data processing
Statistics--Methodology
Statistics--Models
Machine learning--Statistical methods
description Modern machine learning draws on two task forces: classical statistics and modern computation. On the one hand, the field has become rich and fast-growing; on the other hand, the differing conventions of different schools have become harder and harder to communicate across over time. Often, the question is not who is absolutely right or wrong, but from which angle one should approach the problem. This is why we feel there should be a unifying machine learning framework that can hold different schools under the same umbrella. We propose one such framework and call it ``representation learning''. A representation describes the data, and in that sense it is almost identical to a statistical model. Philosophically, however, we distinguish it from classical statistical modeling in that (1) representations are interpretable to the scientist, (2) representations convey the pre-existing subjective view that the scientist holds about the data before seeing it (in other words, representations may not align with the true data-generating process), and (3) representations are task-oriented. To build such a representation, we propose to use partition-based models. Partition-based models are easy to interpret and useful for uncovering interactions between variables. The major challenge, however, lies in computation, since the number of partitions can grow exponentially with the number of variables. To solve this problem, we need a model/representation selection method over different partition models. We propose to use the I-Score with the Backward Dropping Algorithm to achieve this goal. In this work, we explore the connections between the I-Score variable selection methodology and other existing methods, and extend the idea to develop other objective functions that can be used in other applications.
We apply our ideas to three datasets: a genome-wide association study (GWAS), the New York City Vision Zero data, and the MNIST handwritten digit database. In these applications, we show that the interpretability of the representations can be useful in practice and provides practitioners with more intuition when explaining their results. We also show a novel way to view causal inference problems through the lens of partition-based models. We hope this work serves as an invitation to approach problems from different angles and to take interpretability into consideration when building models, so that they can more easily be used to communicate with people from other fields.
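The description above mentions selecting partition-based representations via the I-Score combined with the Backward Dropping Algorithm (BDA). The following is a minimal sketch of how such a procedure might look, assuming the common partition-based form of the I-Score (cell-size-weighted squared deviations of cell means from the global mean) and an illustrative normalization; the function names and exact normalization are assumptions for illustration, not the thesis's actual implementation:

```python
# Hedged sketch: I-Score over a variable-subset partition, plus a greedy
# Backward Dropping Algorithm (BDA) that removes one variable at a time.
from collections import defaultdict

def i_score(X, y, subset):
    """I-Score of a variable subset: partition rows by the joint levels of
    the subset's columns, then sum n_j^2 * (ybar_j - ybar)^2 over cells.
    The 1/n^2 normalization here is an illustrative choice."""
    n = len(y)
    ybar = sum(y) / n
    cells = defaultdict(list)
    for row, yi in zip(X, y):
        cells[tuple(row[j] for j in subset)].append(yi)
    score = 0.0
    for ys in cells.values():
        nj = len(ys)
        score += nj * nj * (sum(ys) / nj - ybar) ** 2
    return score / (n * n)

def backward_dropping(X, y, start_subset):
    """Greedily drop the variable whose removal yields the highest I-Score;
    return the best-scoring subset seen along the dropping path."""
    subset = list(start_subset)
    best_subset, best_score = list(subset), i_score(X, y, subset)
    while len(subset) > 1:
        # score every one-variable removal, keep the best
        score, drop = max(
            (i_score(X, y, [v for v in subset if v != d]), d)
            for d in subset
        )
        subset.remove(drop)
        if score > best_score:
            best_subset, best_score = list(subset), score
    return best_subset, best_score
```

On a toy example where y = x0 XOR x1 with an irrelevant x2, neither x0 nor x1 has any marginal effect, yet the partition by the pair {x0, x1} separates the cell means cleanly, so a BDA run started from all three variables would be expected to retain that interacting pair — the kind of variable interaction the abstract says partition-based models are suited to uncover.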
author Hsu, Yayun
author_facet Hsu, Yayun
author_sort Hsu, Yayun
title Partition-based Model Representation Learning
title_short Partition-based Model Representation Learning
title_full Partition-based Model Representation Learning
title_fullStr Partition-based Model Representation Learning
title_full_unstemmed Partition-based Model Representation Learning
title_sort partition-based model representation learning
publishDate 2020
url https://doi.org/10.7916/d8-2baa-6k42