Data science and advanced analytics : an integrated framework for creating value from data

Bibliographic Details
Main Author: Paredes, Miguel (Miguel Andres)
Other Authors: Una-May O'Reilly and Roy Welsch.
Format: Others
Language: English
Published: Massachusetts Institute of Technology 2019
Subjects:
Online Access: http://hdl.handle.net/1721.1/120232
Description
Summary: Thesis: Ph. D. in Data Science, Massachusetts Institute of Technology, Department of Urban Studies and Planning, 2018.
Cataloged from PDF version of thesis.
Includes bibliographical references (pages 78-83).
Fundamental problems in society, such as medical decision support, urban planning, and customer management, can be addressed by data-driven modeling. Frequently, the only data available are observational rather than experimental. This precludes causal inference, though it supports quasi-causal inference (or causal approximation) and prediction. With three different studies that are driven by observational data, this thesis compares machine learning and econometric modeling in terms of their purposes, insights, and uses. It proposes a data science methodology that combines both types of modeling to enable experimental designs that would otherwise be impossible to carry out. In the first two studies, we address problems through both a prediction approach and a quasi-causation approach (i.e., machine learning and econometrics, respectively), exploring their similarities, differences, benefits, and limitations. These two initial studies demonstrate the need for an end-to-end methodology that combines prediction and causation. Our proposed data science methodology is presented in the third study, in which an enterprise seeks to address its customer churn. First, it uses observational data and econometrics to approximate the causal determinants of churn (quasi-causal insights). Second, it uses machine learning to predict the churn likelihoods of clients, and selects a study group with likelihoods above a threshold of interest. Third, the quasi-causal insights are used to design a stratified randomized controlled trial (i.e., an A/B test) in which study subjects are randomly assigned to one of three experimental groups. Finally, thanks to the rigorously designed experiment, the causal effects of the interventions are determined, and the cost-effectiveness of the treatments relative to the control group is established.
by Miguel Paredes.
Ph. D. in Data Science
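The second and third steps described in the summary (selecting clients whose predicted churn likelihood exceeds a threshold, then randomly assigning them to one of three experimental groups within strata) can be illustrated with a minimal Python sketch. This is not the thesis's implementation. The function name `assign_stratified`, the record keys `id`, `churn_prob`, and `stratum`, the arm labels, and the round-robin balancing are all assumptions made for illustration; the record does not say how strata were defined or how assignment was balanced.

```python
import random
from collections import defaultdict


def assign_stratified(subjects, threshold,
                      arms=("control", "treatment_a", "treatment_b"),
                      seed=0):
    """Assign high-risk subjects to experimental arms within each stratum.

    `subjects` is a list of dicts with hypothetical keys:
      'id'         - unique subject identifier
      'churn_prob' - predicted churn likelihood from some model
      'stratum'    - stratum label (how strata are chosen is an assumption)
    Returns a dict mapping subject id -> arm name.
    """
    rng = random.Random(seed)  # seeded so the assignment is reproducible

    # Keep only subjects whose predicted likelihood exceeds the threshold.
    eligible = [s for s in subjects if s["churn_prob"] > threshold]

    # Group eligible subject ids by stratum.
    by_stratum = defaultdict(list)
    for s in eligible:
        by_stratum[s["stratum"]].append(s["id"])

    assignment = {}
    for stratum in sorted(by_stratum):
        ids = by_stratum[stratum]
        rng.shuffle(ids)  # random order within the stratum
        # Deal shuffled ids round-robin so arm sizes differ by at most one.
        for i, subject_id in enumerate(ids):
            assignment[subject_id] = arms[i % len(arms)]
    return assignment
```

Shuffling within each stratum and then dealing subjects out in turn keeps the three groups nearly equal in size inside every stratum, which is one common way to run a stratified randomization; other balancing schemes would fit the description equally well.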