Summary: | Modern machine learning draws on both classical statistics and modern computation. On the one hand, this has made the field rich and fast-growing; on the other hand, the differing conventions of different schools have become increasingly hard to communicate across. Often the problem is not who is absolutely right or wrong, but from which angle one should approach the problem. This suggests the need for a unifying machine learning framework that can hold the different schools under the same umbrella. We propose one such framework and call it ``representation learning''.
A representation of the data is almost identical to a statistical model. Philosophically, however, we distinguish representations from classical statistical models in three ways: (1) representations are interpretable to the scientist; (2) representations convey the subjective view that the scientist holds about his/her data before seeing it (in other words, representations need not align with the true data-generating process); and (3) representations are task-oriented.
To build such representations, we propose using partition-based models. Partition-based models are easy to interpret and useful for uncovering interactions between variables. The major challenge, however, is computational, since the number of possible partitions can grow exponentially with the number of variables. Addressing this requires a method for selecting among different partition models (representations). We propose using the I-Score together with the backward dropping algorithm to achieve this goal.
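To make the selection step concrete, here is a minimal Python sketch, not the authors' implementation. It assumes discrete explanatory variables, treats each distinct combination of their values as one cell of the partition, and uses the commonly cited form of the I-Score, $I = \sum_j n_j^2(\bar{Y}_j - \bar{Y})^2 / \sum_i (Y_i - \bar{Y})^2$, where $n_j$ and $\bar{Y}_j$ are the size and response mean of cell $j$. The backward dropping loop starts from a candidate subset, repeatedly removes the variable whose removal leaves the highest I-Score, and returns the subset with the highest score seen along the way. The function names `i_score` and `backward_dropping` and the toy data are hypothetical.

```python
import numpy as np


def i_score(X, y):
    """I-Score of the partition induced by the discrete columns of X.

    Assumed form (hypothetical normalization):
    I = sum_j n_j^2 (ybar_j - ybar)^2 / sum_i (y_i - ybar)^2
    """
    X = np.asarray(X)
    y = np.asarray(y, dtype=float)
    ybar = y.mean()
    denom = np.sum((y - ybar) ** 2)
    if denom == 0:
        return 0.0  # constant response: no partition carries information
    # Each distinct row of X defines one cell of the partition
    _, cell = np.unique(X, axis=0, return_inverse=True)
    cell = cell.ravel()  # guard against version-dependent inverse shapes
    counts = np.bincount(cell)
    means = np.bincount(cell, weights=y) / counts
    return float(np.sum(counts ** 2 * (means - ybar) ** 2) / denom)


def backward_dropping(X, y, start):
    """Drop variables one at a time; return the subset with the highest I-Score."""
    current = list(start)
    best_subset = list(current)
    best_score = i_score(X[:, current], y)
    while len(current) > 1:
        # Score every one-variable removal and keep the best remaining subset
        candidates = [
            (i_score(X[:, [v for v in current if v != drop]], y), drop)
            for drop in current
        ]
        score, drop = max(candidates)
        current.remove(drop)
        if score > best_score:
            best_score, best_subset = score, list(current)
    return best_subset, best_score


if __name__ == "__main__":
    # Toy data: y is the XOR of columns 0 and 1; columns 2..5 are pure noise
    rng = np.random.default_rng(0)
    X = rng.integers(0, 2, size=(200, 6))
    y = (X[:, 0] ^ X[:, 1]).astype(float)
    print(backward_dropping(X, y, range(6)))
```

On this toy data, dropping a noise column merges cells that share the same mean and therefore raises the score, while dropping column 0 or 1 destroys the signal, so the procedure should keep the interacting pair [0, 1] even though neither column is informative on its own.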
In this work, we explore the connection between the I-Score variable selection methodology and other existing methods, and we extend the idea to develop other objective functions for use in other applications. We apply our ideas to three datasets: a genome-wide association study (GWAS), the New York City Vision Zero data, and the MNIST handwritten digit database.
In these applications, we show that the interpretability of the representations can be useful in practice and can give practitioners much more intuition for explaining their results. We also present a novel way of viewing causal inference problems through partition-based models.
We hope this work encourages people to approach problems from a different angle and to take interpretability into account when building a model, so that the model is easier to communicate to people from other fields.
|