An Event-Driven Serverless ETL Pipeline on AWS
This work presents an event-driven Extract, Transform, and Load (ETL) pipeline serverless architecture and provides an evaluation of its performance over a range of dataflow tasks of varying frequency, velocity, and payload size. We design an experiment while using generated tabular data throughout...
Main Authors: | Antreas Pogiatzis, Georgios Samakovitis |
---|---|
Format: | Article |
Language: | English |
Published: | MDPI AG 2021-12-01 |
Series: | Applied Sciences |
Subjects: | serverless, FaaS, event-driven, distributed, AWS, ETL |
Online Access: | https://www.mdpi.com/2076-3417/11/1/191 |
id |
doaj-3691642f30494958b4274cb187230467 |
---|---|
record_format |
Article |
spelling |
doaj-3691642f30494958b4274cb187230467 2020-12-29T00:00:28Z eng MDPI AG Applied Sciences 2076-3417 2021-12-01 11 1 191 191 10.3390/app11010191 An Event-Driven Serverless ETL Pipeline on AWS Antreas Pogiatzis 0 Georgios Samakovitis 1 School of Computing and Mathematical Sciences, University of Greenwich, Old Royal Naval College, Park Row, Greenwich, London SE10 9LS, UK School of Computing and Mathematical Sciences, University of Greenwich, Old Royal Naval College, Park Row, Greenwich, London SE10 9LS, UK This work presents an event-driven Extract, Transform, and Load (ETL) pipeline serverless architecture and provides an evaluation of its performance over a range of dataflow tasks of varying frequency, velocity, and payload size. We design an experiment while using generated tabular data throughout varying data volumes, event frequencies, and processing power in order to measure: (i) the consistency of pipeline executions; (ii) reliability on data delivery; (iii) maximum payload size per pipeline; and, (iv) economic scalability (cost of chargeable tasks). We run 92 parameterised experiments on a simple AWS architecture, thus avoiding any AWS-enhanced platform features, in order to allow for unbiased assessment of our model’s performance. Our results indicate that our reference architecture can achieve time-consistent data processing of event payloads of more than 100 MB, with a throughput of 750 KB/s across four event frequencies. It is also observed that, although the utilisation of an SQS queue for data transfer enables easy concurrency control and data slicing, it becomes a bottleneck on large sized event payloads. Finally, we develop and discuss a candidate pricing model for our reference architecture usage. https://www.mdpi.com/2076-3417/11/1/191 serverless FaaS event-driven distributed AWS ETL |
collection |
DOAJ |
language |
English |
format |
Article |
sources |
DOAJ |
author |
Antreas Pogiatzis Georgios Samakovitis |
spellingShingle |
Antreas Pogiatzis Georgios Samakovitis An Event-Driven Serverless ETL Pipeline on AWS Applied Sciences serverless FaaS event-driven distributed AWS ETL |
author_facet |
Antreas Pogiatzis Georgios Samakovitis |
author_sort |
Antreas Pogiatzis |
title |
An Event-Driven Serverless ETL Pipeline on AWS |
title_short |
An Event-Driven Serverless ETL Pipeline on AWS |
title_full |
An Event-Driven Serverless ETL Pipeline on AWS |
title_fullStr |
An Event-Driven Serverless ETL Pipeline on AWS |
title_full_unstemmed |
An Event-Driven Serverless ETL Pipeline on AWS |
title_sort |
event-driven serverless etl pipeline on aws |
publisher |
MDPI AG |
series |
Applied Sciences |
issn |
2076-3417 |
publishDate |
2021-12-01 |
description |
This work presents an event-driven Extract, Transform, and Load (ETL) pipeline serverless architecture and provides an evaluation of its performance over a range of dataflow tasks of varying frequency, velocity, and payload size. We design an experiment while using generated tabular data throughout varying data volumes, event frequencies, and processing power in order to measure: (i) the consistency of pipeline executions; (ii) reliability on data delivery; (iii) maximum payload size per pipeline; and, (iv) economic scalability (cost of chargeable tasks). We run 92 parameterised experiments on a simple AWS architecture, thus avoiding any AWS-enhanced platform features, in order to allow for unbiased assessment of our model’s performance. Our results indicate that our reference architecture can achieve time-consistent data processing of event payloads of more than 100 MB, with a throughput of 750 KB/s across four event frequencies. It is also observed that, although the utilisation of an SQS queue for data transfer enables easy concurrency control and data slicing, it becomes a bottleneck on large sized event payloads. Finally, we develop and discuss a candidate pricing model for our reference architecture usage. |
topic |
serverless FaaS event-driven distributed AWS ETL |
url |
https://www.mdpi.com/2076-3417/11/1/191 |
work_keys_str_mv |
AT antreaspogiatzis aneventdrivenserverlessetlpipelineonaws AT georgiossamakovitis aneventdrivenserverlessetlpipelineonaws AT antreaspogiatzis eventdrivenserverlessetlpipelineonaws AT georgiossamakovitis eventdrivenserverlessetlpipelineonaws |
_version_ |
1724368249069502464 |