Finding patterns in features and observations : new machine learning models with applications in computational criminology, marketing, and medicine

Thesis: Ph. D., Massachusetts Institute of Technology, Department of Electrical Engineering and Computer Science, 2016. === Cataloged from PDF version of thesis. === Includes bibliographical references (pages 173-180). === The revolution of "Big Data" has reached various fields like market...

Bibliographic Details
Main Author: Wang, Tong, Ph. D. Massachusetts Institute of Technology
Other Authors: Cynthia Rudin.
Format: Others
Language: English
Published: Massachusetts Institute of Technology 2017
Subjects: Electrical Engineering and Computer Science.
Online Access: http://hdl.handle.net/1721.1/107357
id ndltd-MIT-oai-dspace.mit.edu-1721.1-107357
record_format oai_dc
spelling ndltd-MIT-oai-dspace.mit.edu-1721.1-1073572019-05-02T16:26:22Z Finding patterns in features and observations : new machine learning models with applications in computational criminology, marketing, and medicine New machine learning models with applications in computational criminology, marketing, and medicine Wang, Tong, Ph. D. Massachusetts Institute of Technology Cynthia Rudin. Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science. Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science. Electrical Engineering and Computer Science. Thesis: Ph. D., Massachusetts Institute of Technology, Department of Electrical Engineering and Computer Science, 2016. Cataloged from PDF version of thesis. Includes bibliographical references (pages 173-180). The revolution of "Big Data" has reached various fields like marketing, healthcare, and criminology, where domain experts wish to find and understand interesting patterns from data. This thesis studies patterns that are defined by subsets of observations or subsets of features. The first part of the thesis studies patterns defined by subsets of observations. We look at a specific type of pattern, crime series (a set of crimes committed by the same individual or group) and develop two pattern detection algorithms. The first method is a sequential pattern building algorithm called Series Finder, which resembles how crime analysts process information instinctively and grows a crime series starting from a couple of seed crimes. The second method is a subspace clustering with cluster-specific feature selection, which is supervised when learning similarity graphs in order to reduce computation. Both methods we propose achieved promising results on a decade's worth of crime pattern data collected by the Crime Analysis Unit of the Cambridge Police Department. The second part of the thesis studies patterns defined by subsets of features. 
We develop methods and theory for building Rule Set models with the hallmark of interpretability. Interpretability is inherent in using association rules to explain predicted results. We first design two methods for building rule sets for binary classification. The first method Bayesian Rule Set (BRS) uses a Bayesian framework with priors that favor small models. The Bayesian priors also bring significant computational benefits to MAP inferences by reducing the search space and restraining the sampling chain within appropriate regions. We apply BRS models to an in-vehicle recommender system data set we collected via Amazon Mechanical Turk to study the customers and contexts that would encourage acceptance of coupons. We develop another model Optimized Rule Set (ORS) using optimization methods to directly construct rule sets from data, without pre-mining rules or discretizing continuous attributes. As a main application of ORS, we build a diagnostic screening tool for obstructive sleep apnea trained on data provided by the Sleep Lab at Mass General Hospital. Our models achieve high accuracy with a substantial gain in interpretability over other methods. Lastly, we build a Causal Rule Set (CRS) model for causal analysis, to identify subgroups that can benefit from a treatment. CRS combines BRS and Bayesian Logistic Regression. We take advantage of different strategies in inference algorithm to speed up computation. Simulations and experiments show that distributing treatment according to CRS models enhances average treatment effect. by Tong Wang. Ph. D. 2017-03-10T15:06:54Z 2017-03-10T15:06:54Z 2016 2016 Thesis http://hdl.handle.net/1721.1/107357 973332694 eng MIT theses are protected by copyright. They may be viewed, downloaded, or printed from this source but further reproduction or distribution in any format is prohibited without written permission. http://dspace.mit.edu/handle/1721.1/7582 180 pages application/pdf Massachusetts Institute of Technology
collection NDLTD
language English
format Others
sources NDLTD
topic Electrical Engineering and Computer Science.
spellingShingle Electrical Engineering and Computer Science.
Wang, Tong, Ph. D. Massachusetts Institute of Technology
Finding patterns in features and observations : new machine learning models with applications in computational criminology, marketing, and medicine
description Thesis: Ph. D., Massachusetts Institute of Technology, Department of Electrical Engineering and Computer Science, 2016. === Cataloged from PDF version of thesis. === Includes bibliographical references (pages 173-180). === The revolution of "Big Data" has reached various fields such as marketing, healthcare, and criminology, where domain experts wish to find and understand interesting patterns in data. This thesis studies patterns that are defined by subsets of observations or subsets of features. The first part of the thesis studies patterns defined by subsets of observations. We look at a specific type of pattern, crime series (a set of crimes committed by the same individual or group), and develop two pattern detection algorithms. The first method is a sequential pattern-building algorithm called Series Finder, which resembles how crime analysts process information instinctively and grows a crime series starting from a couple of seed crimes. The second method is a subspace clustering method with cluster-specific feature selection, which is supervised when learning similarity graphs in order to reduce computation. Both proposed methods achieved promising results on a decade's worth of crime pattern data collected by the Crime Analysis Unit of the Cambridge Police Department. The second part of the thesis studies patterns defined by subsets of features. We develop methods and theory for building Rule Set models with the hallmark of interpretability. Interpretability is inherent in using association rules to explain predicted results. We first design two methods for building rule sets for binary classification. The first method, Bayesian Rule Set (BRS), uses a Bayesian framework with priors that favor small models. The Bayesian priors also bring significant computational benefits to MAP inference by reducing the search space and restraining the sampling chain within appropriate regions. We apply BRS models to an in-vehicle recommender system data set we collected via Amazon Mechanical Turk to study the customers and contexts that would encourage acceptance of coupons. We develop another model, Optimized Rule Set (ORS), which uses optimization methods to construct rule sets directly from data, without pre-mining rules or discretizing continuous attributes. As a main application of ORS, we build a diagnostic screening tool for obstructive sleep apnea trained on data provided by the Sleep Lab at Mass General Hospital. Our models achieve high accuracy with a substantial gain in interpretability over other methods. Lastly, we build a Causal Rule Set (CRS) model for causal analysis, to identify subgroups that can benefit from a treatment. CRS combines BRS and Bayesian Logistic Regression. We take advantage of different strategies in the inference algorithm to speed up computation. Simulations and experiments show that distributing treatment according to CRS models enhances the average treatment effect. === by Tong Wang. === Ph. D.
author2 Cynthia Rudin.
author_facet Cynthia Rudin.
Wang, Tong, Ph. D. Massachusetts Institute of Technology
author Wang, Tong, Ph. D. Massachusetts Institute of Technology
author_sort Wang, Tong, Ph. D. Massachusetts Institute of Technology
title Finding patterns in features and observations : new machine learning models with applications in computational criminology, marketing, and medicine
title_short Finding patterns in features and observations : new machine learning models with applications in computational criminology, marketing, and medicine
title_full Finding patterns in features and observations : new machine learning models with applications in computational criminology, marketing, and medicine
title_fullStr Finding patterns in features and observations : new machine learning models with applications in computational criminology, marketing, and medicine
title_full_unstemmed Finding patterns in features and observations : new machine learning models with applications in computational criminology, marketing, and medicine
title_sort finding patterns in features and observations : new machine learning models with applications in computational criminology, marketing, and medicine
publisher Massachusetts Institute of Technology
publishDate 2017
url http://hdl.handle.net/1721.1/107357
work_keys_str_mv AT wangtongphdmassachusettsinstituteoftechnology findingpatternsinfeaturesandobservationsnewmachinelearningmodelswithapplicationsincomputationalcriminologymarketingandmedicine
AT wangtongphdmassachusettsinstituteoftechnology newmachinelearningmodelswithapplicationsincomputationalcriminologymarketingandmedicine
_version_ 1719040181833564160