Finding patterns in features and observations : new machine learning models with applications in computational criminology, marketing, and medicine

Thesis: Ph. D., Massachusetts Institute of Technology, Department of Electrical Engineering and Computer Science, 2016. === Cataloged from PDF version of thesis. === Includes bibliographical references (pages 173-180). === The revolution of "Big Data" has reached various fields like market...

Bibliographic Details
Main Author:	Wang, Tong, Ph. D. Massachusetts Institute of Technology
Other Authors:	Cynthia Rudin.
Format:	Others
Language:	English
Published:	Massachusetts Institute of Technology 2017
Subjects:	Electrical Engineering and Computer Science.
Online Access:	http://hdl.handle.net/1721.1/107357

id	ndltd-MIT-oai-dspace.mit.edu-1721.1-107357
record_format	oai_dc
spelling	ndltd-MIT-oai-dspace.mit.edu-1721.1-107357 2019-05-02T16:26:22Z Finding patterns in features and observations : new machine learning models with applications in computational criminology, marketing, and medicine New machine learning models with applications in computational criminology, marketing, and medicine Wang, Tong, Ph. D. Massachusetts Institute of Technology Cynthia Rudin. Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science. Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science. Electrical Engineering and Computer Science. Thesis: Ph. D., Massachusetts Institute of Technology, Department of Electrical Engineering and Computer Science, 2016. Cataloged from PDF version of thesis. Includes bibliographical references (pages 173-180). The revolution of "Big Data" has reached fields such as marketing, healthcare, and criminology, where domain experts wish to find and understand interesting patterns in data. This thesis studies patterns that are defined by subsets of observations or subsets of features. The first part of the thesis studies patterns defined by subsets of observations. We look at a specific type of pattern, crime series (a set of crimes committed by the same individual or group), and develop two pattern detection algorithms. The first method is a sequential pattern building algorithm called Series Finder, which resembles how crime analysts process information instinctively and grows a crime series starting from a couple of seed crimes. The second method is a subspace clustering method with cluster-specific feature selection, which is supervised when learning similarity graphs in order to reduce computation. Both proposed methods achieved promising results on a decade's worth of crime pattern data collected by the Crime Analysis Unit of the Cambridge Police Department. The second part of the thesis studies patterns defined by subsets of features. We develop methods and theory for building Rule Set models with the hallmark of interpretability. Interpretability is inherent in using association rules to explain predicted results. We first design two methods for building rule sets for binary classification. The first method, Bayesian Rule Set (BRS), uses a Bayesian framework with priors that favor small models. The Bayesian priors also bring significant computational benefits to MAP inference by reducing the search space and restraining the sampling chain within appropriate regions. We apply BRS models to an in-vehicle recommender system data set we collected via Amazon Mechanical Turk to study the customers and contexts that would encourage acceptance of coupons. We develop another model, Optimized Rule Set (ORS), using optimization methods to directly construct rule sets from data, without pre-mining rules or discretizing continuous attributes. As a main application of ORS, we build a diagnostic screening tool for obstructive sleep apnea trained on data provided by the Sleep Lab at Mass General Hospital. Our models achieve high accuracy with a substantial gain in interpretability over other methods. Lastly, we build a Causal Rule Set (CRS) model for causal analysis, to identify subgroups that can benefit from a treatment. CRS combines BRS and Bayesian Logistic Regression. We take advantage of different strategies in the inference algorithm to speed up computation. Simulations and experiments show that distributing treatment according to CRS models enhances the average treatment effect. by Tong Wang. Ph. D. 2017-03-10T15:06:54Z 2017-03-10T15:06:54Z 2016 2016 Thesis http://hdl.handle.net/1721.1/107357 973332694 eng MIT theses are protected by copyright. They may be viewed, downloaded, or printed from this source but further reproduction or distribution in any format is prohibited without written permission. http://dspace.mit.edu/handle/1721.1/7582 180 pages application/pdf Massachusetts Institute of Technology
collection	NDLTD
language	English
format	Others
sources	NDLTD
topic	Electrical Engineering and Computer Science.
spellingShingle	Electrical Engineering and Computer Science. Wang, Tong, Ph. D. Massachusetts Institute of Technology Finding patterns in features and observations : new machine learning models with applications in computational criminology, marketing, and medicine
description	Thesis: Ph. D., Massachusetts Institute of Technology, Department of Electrical Engineering and Computer Science, 2016. === Cataloged from PDF version of thesis. === Includes bibliographical references (pages 173-180). === The revolution of "Big Data" has reached fields such as marketing, healthcare, and criminology, where domain experts wish to find and understand interesting patterns in data. This thesis studies patterns that are defined by subsets of observations or subsets of features. The first part of the thesis studies patterns defined by subsets of observations. We look at a specific type of pattern, crime series (a set of crimes committed by the same individual or group), and develop two pattern detection algorithms. The first method is a sequential pattern building algorithm called Series Finder, which resembles how crime analysts process information instinctively and grows a crime series starting from a couple of seed crimes. The second method is a subspace clustering method with cluster-specific feature selection, which is supervised when learning similarity graphs in order to reduce computation. Both proposed methods achieved promising results on a decade's worth of crime pattern data collected by the Crime Analysis Unit of the Cambridge Police Department. The second part of the thesis studies patterns defined by subsets of features. We develop methods and theory for building Rule Set models with the hallmark of interpretability. Interpretability is inherent in using association rules to explain predicted results. We first design two methods for building rule sets for binary classification. The first method, Bayesian Rule Set (BRS), uses a Bayesian framework with priors that favor small models. The Bayesian priors also bring significant computational benefits to MAP inference by reducing the search space and restraining the sampling chain within appropriate regions. We apply BRS models to an in-vehicle recommender system data set we collected via Amazon Mechanical Turk to study the customers and contexts that would encourage acceptance of coupons. We develop another model, Optimized Rule Set (ORS), using optimization methods to directly construct rule sets from data, without pre-mining rules or discretizing continuous attributes. As a main application of ORS, we build a diagnostic screening tool for obstructive sleep apnea trained on data provided by the Sleep Lab at Mass General Hospital. Our models achieve high accuracy with a substantial gain in interpretability over other methods. Lastly, we build a Causal Rule Set (CRS) model for causal analysis, to identify subgroups that can benefit from a treatment. CRS combines BRS and Bayesian Logistic Regression. We take advantage of different strategies in the inference algorithm to speed up computation. Simulations and experiments show that distributing treatment according to CRS models enhances the average treatment effect. === by Tong Wang. === Ph. D.
author2	Cynthia Rudin.
author_facet	Cynthia Rudin. Wang, Tong, Ph. D. Massachusetts Institute of Technology
author	Wang, Tong, Ph. D. Massachusetts Institute of Technology
author_sort	Wang, Tong, Ph. D. Massachusetts Institute of Technology
title	Finding patterns in features and observations : new machine learning models with applications in computational criminology, marketing, and medicine
title_short	Finding patterns in features and observations : new machine learning models with applications in computational criminology, marketing, and medicine
title_full	Finding patterns in features and observations : new machine learning models with applications in computational criminology, marketing, and medicine
title_fullStr	Finding patterns in features and observations : new machine learning models with applications in computational criminology, marketing, and medicine
title_full_unstemmed	Finding patterns in features and observations : new machine learning models with applications in computational criminology, marketing, and medicine
title_sort	finding patterns in features and observations : new machine learning models with applications in computational criminology, marketing, and medicine
publisher	Massachusetts Institute of Technology
publishDate	2017
url	http://hdl.handle.net/1721.1/107357
work_keys_str_mv	AT wangtongphdmassachusettsinstituteoftechnology findingpatternsinfeaturesandobservationsnewmachinelearningmodelswithapplicationsincomputationalcriminologymarketingandmedicine AT wangtongphdmassachusettsinstituteoftechnology newmachinelearningmodelswithapplicationsincomputationalcriminologymarketingandmedicine
_version_	1719040181833564160
