Spark for Social Science

Urban has developed an elastic and powerful approach to the analysis of massive datasets using Amazon Web Services’ Elastic MapReduce (EMR) and the Spark framework for distributed memory and processing. The goal of the project is to deliver powerful and elastic Spark clusters to researchers and dat...

Bibliographic Details
Main Authors: Graham MacDonald, Alex Engler, Jeffrey Levy, Sarah Armstrong
Format: Article
Language: English
Published: Swansea University 2018-10-01
Series: International Journal of Population Data Science
Online Access: https://ijpds.org/article/view/1044
id doaj-2e9c3339ddc0494ea156c5422951e14b
record_format Article
spelling doaj-2e9c3339ddc0494ea156c5422951e14b; 2020-11-25T02:08:50Z; eng; Swansea University; International Journal of Population Data Science; 2399-4908; 2018-10-01; 3; 5; 10.23889/ijpds.v3i5.1044; Spark for Social Science; Graham MacDonald (0); Alex Engler (1); Jeffrey Levy (2); Sarah Armstrong (3); Urban Institute; University of Chicago; Urban Institute; University of Chicago; https://ijpds.org/article/view/1044
collection DOAJ
language English
format Article
sources DOAJ
author Graham MacDonald
Alex Engler
Jeffrey Levy
Sarah Armstrong
title Spark for Social Science
publisher Swansea University
series International Journal of Population Data Science
issn 2399-4908
publishDate 2018-10-01
description Urban has developed an elastic and powerful approach to the analysis of massive datasets using Amazon Web Services’ Elastic MapReduce (EMR) and the Spark framework for distributed memory and processing. The goal of the project is to deliver powerful and elastic Spark clusters to researchers and data analysts with as little setup time and effort as possible, and at low cost. To do that, at the Urban Institute, we use two critical components: (1) an Amazon Web Services (AWS) CloudFormation script to launch AWS Elastic MapReduce (EMR) clusters, and (2) a bootstrap script that runs on the Master node of the new cluster to install statistical programs and development environments (RStudio and Jupyter Notebooks). The Urban Institute’s Spark for Social Science GitHub page holds the code used to set up the cluster and tutorials for learning how to program in R and Python.
url https://ijpds.org/article/view/1044