Finding patterns in features and observations : new machine learning models with applications in computational criminology, marketing, and medicine
Thesis: Ph. D., Massachusetts Institute of Technology, Department of Electrical Engineering and Computer Science, 2016. Cataloged from PDF version of thesis. Includes bibliographical references (pages 173-180). The revolution of "Big Data" has reached various fields like market...
Main Author: | Wang, Tong |
---|---|
Other Authors: | Cynthia Rudin |
Format: | Others |
Language: | English |
Published: | Massachusetts Institute of Technology, 2017 |
Subjects: | Electrical Engineering and Computer Science |
Online Access: | http://hdl.handle.net/1721.1/107357 |
id |
ndltd-MIT-oai-dspace.mit.edu-1721.1-107357 |
---|---|
record_format |
oai_dc |
spelling |
ndltd-MIT-oai-dspace.mit.edu-1721.1-107357 2019-05-02T16:26:22Z Finding patterns in features and observations : new machine learning models with applications in computational criminology, marketing, and medicine New machine learning models with applications in computational criminology, marketing, and medicine Wang, Tong, Ph. D. Massachusetts Institute of Technology Cynthia Rudin. Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science. Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science. Electrical Engineering and Computer Science. Thesis: Ph. D., Massachusetts Institute of Technology, Department of Electrical Engineering and Computer Science, 2016. Cataloged from PDF version of thesis. Includes bibliographical references (pages 173-180). The revolution of "Big Data" has reached various fields like marketing, healthcare, and criminology, where domain experts wish to find and understand interesting patterns from data. This thesis studies patterns that are defined by subsets of observations or subsets of features. The first part of the thesis studies patterns defined by subsets of observations. We look at a specific type of pattern, crime series (a set of crimes committed by the same individual or group), and develop two pattern detection algorithms. The first method is a sequential pattern-building algorithm called Series Finder, which resembles how crime analysts process information instinctively and grows a crime series starting from a couple of seed crimes. The second method is a subspace clustering method with cluster-specific feature selection, which is supervised when learning similarity graphs in order to reduce computation. Both methods we propose achieved promising results on a decade's worth of crime pattern data collected by the Crime Analysis Unit of the Cambridge Police Department. The second part of the thesis studies patterns defined by subsets of features. We develop methods and theory for building Rule Set models with the hallmark of interpretability. Interpretability is inherent in using association rules to explain predicted results. We first design two methods for building rule sets for binary classification. The first method, Bayesian Rule Set (BRS), uses a Bayesian framework with priors that favor small models. The Bayesian priors also bring significant computational benefits to MAP inference by reducing the search space and restraining the sampling chain within appropriate regions. We apply BRS models to an in-vehicle recommender system data set we collected via Amazon Mechanical Turk to study the customers and contexts that would encourage acceptance of coupons. We develop another model, Optimized Rule Set (ORS), using optimization methods to directly construct rule sets from data, without pre-mining rules or discretizing continuous attributes. As a main application of ORS, we build a diagnostic screening tool for obstructive sleep apnea trained on data provided by the Sleep Lab at Mass General Hospital. Our models achieve high accuracy with a substantial gain in interpretability over other methods. Lastly, we build a Causal Rule Set (CRS) model for causal analysis, to identify subgroups that can benefit from a treatment. CRS combines BRS and Bayesian Logistic Regression. We take advantage of different strategies in the inference algorithm to speed up computation. Simulations and experiments show that distributing treatment according to CRS models enhances the average treatment effect. by Tong Wang. Ph. D. 2017-03-10T15:06:54Z 2017-03-10T15:06:54Z 2016 2016 Thesis http://hdl.handle.net/1721.1/107357 973332694 eng MIT theses are protected by copyright. They may be viewed, downloaded, or printed from this source but further reproduction or distribution in any format is prohibited without written permission. http://dspace.mit.edu/handle/1721.1/7582 180 pages application/pdf Massachusetts Institute of Technology |
collection |
NDLTD |
language |
English |
format |
Others |
sources |
NDLTD |
topic |
Electrical Engineering and Computer Science. |
spellingShingle |
Electrical Engineering and Computer Science. Wang, Tong, Ph. D. Massachusetts Institute of Technology Finding patterns in features and observations : new machine learning models with applications in computational criminology, marketing, and medicine |
description |
Thesis: Ph. D., Massachusetts Institute of Technology, Department of Electrical Engineering and Computer Science, 2016. Cataloged from PDF version of thesis. Includes bibliographical references (pages 173-180). The revolution of "Big Data" has reached various fields like marketing, healthcare, and criminology, where domain experts wish to find and understand interesting patterns from data. This thesis studies patterns that are defined by subsets of observations or subsets of features. The first part of the thesis studies patterns defined by subsets of observations. We look at a specific type of pattern, crime series (a set of crimes committed by the same individual or group), and develop two pattern detection algorithms. The first method is a sequential pattern-building algorithm called Series Finder, which resembles how crime analysts process information instinctively and grows a crime series starting from a couple of seed crimes. The second method is a subspace clustering method with cluster-specific feature selection, which is supervised when learning similarity graphs in order to reduce computation. Both methods we propose achieved promising results on a decade's worth of crime pattern data collected by the Crime Analysis Unit of the Cambridge Police Department. The second part of the thesis studies patterns defined by subsets of features. We develop methods and theory for building Rule Set models with the hallmark of interpretability. Interpretability is inherent in using association rules to explain predicted results. We first design two methods for building rule sets for binary classification. The first method, Bayesian Rule Set (BRS), uses a Bayesian framework with priors that favor small models. The Bayesian priors also bring significant computational benefits to MAP inference by reducing the search space and restraining the sampling chain within appropriate regions. We apply BRS models to an in-vehicle recommender system data set we collected via Amazon Mechanical Turk to study the customers and contexts that would encourage acceptance of coupons. We develop another model, Optimized Rule Set (ORS), using optimization methods to directly construct rule sets from data, without pre-mining rules or discretizing continuous attributes. As a main application of ORS, we build a diagnostic screening tool for obstructive sleep apnea trained on data provided by the Sleep Lab at Mass General Hospital. Our models achieve high accuracy with a substantial gain in interpretability over other methods. Lastly, we build a Causal Rule Set (CRS) model for causal analysis, to identify subgroups that can benefit from a treatment. CRS combines BRS and Bayesian Logistic Regression. We take advantage of different strategies in the inference algorithm to speed up computation. Simulations and experiments show that distributing treatment according to CRS models enhances the average treatment effect. by Tong Wang. Ph. D. |
author2 |
Cynthia Rudin. |
author_facet |
Cynthia Rudin. Wang, Tong, Ph. D. Massachusetts Institute of Technology |
author |
Wang, Tong, Ph. D. Massachusetts Institute of Technology |
author_sort |
Wang, Tong, Ph. D. Massachusetts Institute of Technology |
title |
Finding patterns in features and observations : new machine learning models with applications in computational criminology, marketing, and medicine |
title_sort |
finding patterns in features and observations : new machine learning models with applications in computational criminology, marketing, and medicine |
publisher |
Massachusetts Institute of Technology |
publishDate |
2017 |
url |
http://hdl.handle.net/1721.1/107357 |
work_keys_str_mv |
AT wangtongphdmassachusettsinstituteoftechnology findingpatternsinfeaturesandobservationsnewmachinelearningmodelswithapplicationsincomputationalcriminologymarketingandmedicine AT wangtongphdmassachusettsinstituteoftechnology newmachinelearningmodelswithapplicationsincomputationalcriminologymarketingandmedicine |
_version_ |
1719040181833564160 |