Using a Data Warehouse as Part of a General Business Process Data Analysis System

Bibliographic Details
Main Author: Maor, Amit
Format: Others
Published: Scholarship @ Claremont 2016
Online Access: http://scholarship.claremont.edu/cmc_theses/1383
http://scholarship.claremont.edu/cgi/viewcontent.cgi?article=2418&context=cmc_theses
Description
Summary: Data analytics queries often involve aggregating over massive amounts of data in order to detect trends, make predictions about future data, and make business decisions as a result. As such, it is important that a database management system (DBMS) handling data analytics queries perform well when those queries involve massive amounts of data. A data warehouse is a DBMS designed specifically to handle data analytics queries. This thesis describes the data warehouse Amazon Redshift and how it was used to design a data analysis system for Laserfiche. Laserfiche is a software company that provides each of its clients a system to store and process business process data. Through the 2015-16 Harvey Mudd College Clinic project, the Clinic team built a data analysis system that provides Laserfiche clients with near real-time reports containing analyses of their business process data. This thesis discusses the advantages of Redshift’s data model and physical storage layout, as well as how Redshift’s features directly benefit the data analysis system.