Large Science Databases – Are Cloud Services Ready for Them?

We report on attempts to put an astronomical database – the Sloan Digital Sky Survey science archive – in the cloud. We find that it is very frustrating to impossible at this time to migrate a complex SQL Server database into current cloud service offerings such as Amazon (EC2) and Microsoft (SQL Az...

Bibliographic Details
Main Authors: Ani Thakar, Alex Szalay, Ken Church, Andreas Terzis
Format: Article
Language: English
Published: Hindawi Limited 2011-01-01
Series: Scientific Programming
Online Access: http://dx.doi.org/10.3233/SPR-2011-0325
id doaj-4cd7466534bb4fa4ae7c5dbacd35c939
record_format Article
spelling doaj-4cd7466534bb4fa4ae7c5dbacd35c939
2021-07-02T09:16:28Z
eng
Hindawi Limited
Scientific Programming
1058-9244
1875-919X
2011-01-01
volume 19, issue 2-3, pages 147-159
10.3233/SPR-2011-0325
Large Science Databases – Are Cloud Services Ready for Them?
Ani Thakar (0)
Alex Szalay (1)
Ken Church (2)
Andreas Terzis (3)
(0) Department of Physics and Astronomy and the Institute for Data Intensive Engineering and Science, The Johns Hopkins University, Baltimore, MD, USA
(1) Department of Physics and Astronomy and the Institute for Data Intensive Engineering and Science, The Johns Hopkins University, Baltimore, MD, USA
(2) Human Language Technology Center of Excellence and IDIES, The Johns Hopkins University, Baltimore, MD, USA
(3) Department of Computer Science and IDIES, The Johns Hopkins University, Baltimore, MD, USA
We report on attempts to put an astronomical database – the Sloan Digital Sky Survey science archive – in the cloud. We find that it is very frustrating to impossible at this time to migrate a complex SQL Server database into current cloud service offerings such as Amazon (EC2) and Microsoft (SQL Azure). Certainly it is impossible to migrate a large database in excess of a TB, but even with (much) smaller databases, the limitations of cloud services make it very difficult to migrate the data to the cloud without making changes to the schema and settings that would degrade performance and/or make the data unusable. Preliminary performance comparisons show a large performance discrepancy with the Amazon cloud version of the SDSS database. These difficulties suggest that much work and coordination needs to occur between cloud service providers and their potential clients before science databases – not just large ones but even smaller databases that make extensive use of advanced database features for performance and usability – can successfully and effectively be deployed in the cloud. We describe a powerful new computational instrument that we are developing in the interim – the Data-Scope – that will enable fast and efficient analysis of the largest (petabyte scale) scientific datasets.
http://dx.doi.org/10.3233/SPR-2011-0325
collection DOAJ
language English
format Article
sources DOAJ
author Ani Thakar
Alex Szalay
Ken Church
Andreas Terzis
author_sort Ani Thakar
title Large Science Databases – Are Cloud Services Ready for Them?
publisher Hindawi Limited
series Scientific Programming
issn 1058-9244
1875-919X
publishDate 2011-01-01
description We report on attempts to put an astronomical database – the Sloan Digital Sky Survey science archive – in the cloud. We find that it is very frustrating to impossible at this time to migrate a complex SQL Server database into current cloud service offerings such as Amazon (EC2) and Microsoft (SQL Azure). Certainly it is impossible to migrate a large database in excess of a TB, but even with (much) smaller databases, the limitations of cloud services make it very difficult to migrate the data to the cloud without making changes to the schema and settings that would degrade performance and/or make the data unusable. Preliminary performance comparisons show a large performance discrepancy with the Amazon cloud version of the SDSS database. These difficulties suggest that much work and coordination needs to occur between cloud service providers and their potential clients before science databases – not just large ones but even smaller databases that make extensive use of advanced database features for performance and usability – can successfully and effectively be deployed in the cloud. We describe a powerful new computational instrument that we are developing in the interim – the Data-Scope – that will enable fast and efficient analysis of the largest (petabyte scale) scientific datasets.
url http://dx.doi.org/10.3233/SPR-2011-0325