Truncated Bayesian nonparametrics

Thesis: Ph.D., Massachusetts Institute of Technology, Department of Aeronautics and Astronautics, 2016. Cataloged from the PDF version of the thesis. Includes bibliographical references (pages 167-175).

Many datasets can be thought of as expressing a collection of underlying traits with unknown cardinality. Moreover, these datasets are often persistently growing, and we expect the number of expressed traits to likewise increase over time. Priors from Bayesian nonparametrics are well suited to this modeling challenge: they generate a countably infinite number of underlying traits, which allows the number of expressed traits both to be random and to grow with the dataset size. We also require corresponding streaming, distributed inference algorithms that handle persistently growing datasets without slowing down over time. However, a key ingredient in streaming, distributed inference, namely an explicit representation of the latent variables used to statistically decouple the data, is not available for nonparametric priors, since we cannot simulate or store infinitely many random variables in practice. One approach is to approximate the nonparametric prior with a sequential representation, in which the traits are generated by a sequence of finite-dimensional distributions, and then to truncate that representation at some finite level so it can be represented explicitly. However, truncated sequential representations have been developed for only a small number of priors in Bayesian nonparametrics, and the order they impose on the traits creates identifiability issues in the streaming, distributed setting.
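
A minimal sketch of what truncating a sequential representation looks like in practice, using only the familiar stick-breaking construction of the Dirichlet process, one special case of the large class of priors the thesis treats; the concentration parameter, truncation level, and Gaussian base measure below are illustrative choices, not values from the thesis.

```python
import numpy as np

def truncated_stick_breaking(alpha, K, seed=None):
    """Weights of a Dirichlet process under Sethuraman's stick-breaking
    construction, truncated at level K.

    beta_k ~ Beta(1, alpha) for k = 1..K-1, and beta_K is set to 1 so the
    K retained weights sum exactly to one (the truncation step).
    """
    rng = np.random.default_rng(seed)
    betas = rng.beta(1.0, alpha, size=K)
    betas[-1] = 1.0  # truncation: the last atom takes all remaining stick mass
    # pi_k = beta_k * prod_{j<k} (1 - beta_j)
    remaining = np.concatenate(([1.0], np.cumprod(1.0 - betas[:-1])))
    return betas * remaining

if __name__ == "__main__":
    K = 50
    weights = truncated_stick_breaking(alpha=2.0, K=K, seed=0)
    atoms = np.random.default_rng(0).standard_normal(K)  # i.i.d. draws from a base measure
    print(weights.sum())   # 1.0 (up to floating point), by construction
    print(weights[:5])
```

Setting the final stick fraction to one is what makes the truncated measure finite and explicitly representable: exactly K weights and K atoms need to be stored or simulated.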

This thesis provides a comprehensive theoretical treatment of sequential representations and truncation in Bayesian nonparametrics. It details three sequential representations of a large class of nonparametric priors and analyzes their truncation error and computational complexity; the results generalize and improve upon those existing in the literature. Next, the truncated explicit representations are used to develop the first streaming, distributed, asynchronous inference procedures for models from Bayesian nonparametrics. The combinatorial issues associated with trait identifiability in such models are resolved via a novel matching optimization. The resulting algorithms are fast, learning-rate-free, and truncation-free. Taken together, these contributions provide the practitioner with the means to (1) develop multiple finite approximations for a given nonparametric prior; (2) determine which is best for their application; and (3) use that approximation to develop efficient streaming, distributed, asynchronous inference algorithms.
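
The matching step below is only a simplified stand-in, not the thesis's optimization: it illustrates the identifiability problem by aligning arbitrarily labeled components from one worker with a global truncated approximation, using squared distances between hypothetical component means and SciPy's Hungarian-algorithm solver.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def match_components(global_means, local_means):
    """Align a worker's component labels with the global model's labels.

    Because trait/component labels are arbitrary on each worker, a streaming,
    distributed update must first decide which local component corresponds to
    which global one. Here the cost of pairing two components is the squared
    Euclidean distance between their means, and the minimum-cost one-to-one
    assignment is found with the Hungarian algorithm.
    """
    cost = np.linalg.norm(global_means[:, None, :] - local_means[None, :, :], axis=-1) ** 2
    global_idx, local_idx = linear_sum_assignment(cost)
    return {int(l): int(g) for l, g in zip(local_idx, global_idx)}  # local label -> global label

# Example: the worker's three components are a permutation of the global ones.
global_means = np.array([[0.0, 0.0], [5.0, 5.0], [-5.0, 5.0]])
local_means = np.array([[5.1, 4.9], [-4.8, 5.2], [0.1, -0.1]])
print(match_components(global_means, local_means))
# local 0 -> global 1, local 1 -> global 2, local 2 -> global 0
```

In an actual streaming, distributed run there would be many shards arriving asynchronously, and the matching cost would come from the posterior approximation rather than raw means; the point here is only that some matching optimization is needed before local and global traits can be combined.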

Bibliographic Details
Main Author: Campbell, Trevor D. J. (Trevor David Jan)
Other Authors: Jonathan P. How.
Format: Others
Language: English
Published: Massachusetts Institute of Technology, 2017
Subjects: Aeronautics and Astronautics
Physical Description: 175 pages, application/pdf
Rights: MIT theses are protected by copyright. They may be viewed, downloaded, or printed from this source, but further reproduction or distribution in any format is prohibited without written permission.
Online Access: http://hdl.handle.net/1721.1/107047