Large Science Databases – Are Cloud Services Ready for Them?

We report on attempts to put an astronomical database – the Sloan Digital Sky Survey science archive – in the cloud. We find that it is very frustrating to impossible at this time to migrate a complex SQL Server database into current cloud service offerings such as Amazon (EC2) and Microsoft (SQL Az...

Full description

Bibliographic Details
Main Authors: Ani Thakar, Alex Szalay, Ken Church, Andreas Terzis
Format: Article
Language:English
Published: Hindawi Limited 2011-01-01
Series:Scientific Programming
Online Access:http://dx.doi.org/10.3233/SPR-2011-0325
Description
Summary:We report on attempts to put an astronomical database – the Sloan Digital Sky Survey science archive – in the cloud. We find that it is very frustrating to impossible at this time to migrate a complex SQL Server database into current cloud service offerings such as Amazon (EC2) and Microsoft (SQL Azure). Certainly it is impossible to migrate a large database in excess of a TB, but even with (much) smaller databases, the limitations of cloud services make it very difficult to migrate the data to the cloud without making changes to the schema and settings that would degrade performance and/or make the data unusable. Preliminary performance comparisons show a large performance discrepancy with the Amazon cloud version of the SDSS database. These difficulties suggest that much work and coordination needs to occur between cloud service providers and their potential clients before science databases – not just large ones but even smaller databases that make extensive use of advanced database features for performance and usability – can successfully and effectively be deployed in the cloud. We describe a powerful new computational instrument that we are developing in the interim – the Data-Scope – that will enable fast and efficient analysis of the largest (petabyte scale) scientific datasets.
ISSN:1058-9244
1875-919X