Dynamic Documents for Data Analytic Science

The need for reproducibility in computational research has been highlighted by a number of recent failures to replicate published data analytic findings. Most efforts to ensure reproducibility involve providing guarantees that reported results can be generated from the data via the reporte...


Bibliographic Details
Main Author: Becker, Gabriel
Language: EN
Published: University of California, Davis 2015
Subjects: Statistics
Online Access: http://pqdtopen.proquest.com/#viewpdf?dispub=3685178
id ndltd-PROQUEST-oai-pqdtoai.proquest.com-3685178
record_format oai_dc
spelling ndltd-PROQUEST-oai-pqdtoai.proquest.com-3685178 2015-03-27T04:17:03Z Dynamic Documents for Data Analytic Science Becker, Gabriel Statistics The need for reproducibility in computational research has been highlighted by a number of recent failures to replicate published data analytic findings. Most efforts to ensure reproducibility involve providing guarantees that reported results can be generated from the data via the reported methods, with a popular avenue being dynamic documents. This insurance is necessary but not sufficient for full validation, as inappropriately chosen methods will simply reproduce questionable results. To fully verify computational research we must replicate analysts' research processes, including: choice of and response to exploratory or intermediate results, identification of potential analysis strategies and statistical methods, selection of a single strategy from among those considered, and finally, the generation of reported results using the chosen method. We present the concept of comprehensive dynamic documents. These documents represent the full breadth of an analyst's work during computational research, including code and text describing: intermediate and exploratory computations, alternate methods, and even ideas the analyst had which were not fully pursued. Furthermore, additional information can be embedded in the documents such as data provenance, experimental design, or details of the computing system on which the work was originally performed. We also propose computational models for representing, processing, and programmatically operating on such documents within R. These comprehensive documents act as databases, encompassing both the work that the analyst has performed and the relationships among specific pieces of that work. This allows us to investigate research in a number of ways difficult or impossible to achieve given only a description of the final strategy. First, we can explore the choice of methods and whether due diligence was performed during an analysis. Secondly, we can compare alternative strategies either side-by-side or interactively. Finally, we can treat these complex documents as data about the research process and analyze them programmatically. We also present a proof-of-concept set of software tools for working with comprehensive dynamic documents. This includes an R package which implements a framework for comprehensive documents in R, an extension of the IPython Notebook platform which allows users to author and interactively view them, and a caching mechanism which provides the efficiency necessary for interactive, self-updating views of such documents. University of California, Davis 2015-03-26 00:00:00.0 thesis http://pqdtopen.proquest.com/#viewpdf?dispub=3685178 EN
collection NDLTD
language EN
sources NDLTD
topic Statistics
spellingShingle Statistics
Becker, Gabriel
Dynamic Documents for Data Analytic Science
description The need for reproducibility in computational research has been highlighted by a number of recent failures to replicate published data analytic findings. Most efforts to ensure reproducibility involve providing guarantees that reported results can be generated from the data via the reported methods, with a popular avenue being dynamic documents. This insurance is necessary but not sufficient for full validation, as inappropriately chosen methods will simply reproduce questionable results. To fully verify computational research we must replicate analysts' research processes, including: choice of and response to exploratory or intermediate results, identification of potential analysis strategies and statistical methods, selection of a single strategy from among those considered, and finally, the generation of reported results using the chosen method.
We present the concept of comprehensive dynamic documents. These documents represent the full breadth of an analyst's work during computational research, including code and text describing: intermediate and exploratory computations, alternate methods, and even ideas the analyst had which were not fully pursued. Furthermore, additional information can be embedded in the documents such as data provenance, experimental design, or details of the computing system on which the work was originally performed. We also propose computational models for representing, processing, and programmatically operating on such documents within R.
These comprehensive documents act as databases, encompassing both the work that the analyst has performed and the relationships among specific pieces of that work. This allows us to investigate research in a number of ways difficult or impossible to achieve given only a description of the final strategy. First, we can explore the choice of methods and whether due diligence was performed during an analysis. Secondly, we can compare alternative strategies either side-by-side or interactively. Finally, we can treat these complex documents as data about the research process and analyze them programmatically.
We also present a proof-of-concept set of software tools for working with comprehensive dynamic documents. This includes an R package which implements a framework for comprehensive documents in R, an extension of the IPython Notebook platform which allows users to author and interactively view them, and a caching mechanism which provides the efficiency necessary for interactive, self-updating views of such documents.
author Becker, Gabriel
author_facet Becker, Gabriel
author_sort Becker, Gabriel
title Dynamic Documents for Data Analytic Science
title_short Dynamic Documents for Data Analytic Science
title_full Dynamic Documents for Data Analytic Science
title_fullStr Dynamic Documents for Data Analytic Science
title_full_unstemmed Dynamic Documents for Data Analytic Science
title_sort dynamic documents for data analytic science
publisher University of California, Davis
publishDate 2015
url http://pqdtopen.proquest.com/#viewpdf?dispub=3685178
work_keys_str_mv AT beckergabriel dynamicdocumentsfordataanalyticscience
_version_ 1716797521996546048