Efficient Data Stream Sampling on Apache Flink

Sampling is considered a core component of data analysis, making it possible to provide a synopsis of possibly large amounts of data by maintaining only subsets or multisubsets of it. In the context of data streaming, an emerging processing paradigm where data is assumed to be unbounded, sampling o...
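The abstract frames sampling as keeping either a subset (without replacement) or a multisubset (with replacement) of an unbounded stream. As a generic illustration only, and not the method developed in the thesis or any Apache Flink API, the classic reservoir-sampling technique (often called Algorithm R) and its with-replacement variant can be sketched in plain Python. The function names `reservoir_sample` and `reservoir_sample_with_replacement` are hypothetical.

```python
import random
from typing import Iterable, List, Optional, TypeVar

T = TypeVar("T")


def reservoir_sample(stream: Iterable[T], k: int,
                     rng: Optional[random.Random] = None) -> List[T]:
    """Uniform sample of k items WITHOUT replacement (a subset) from a stream
    of unknown length, in one pass and O(k) memory. Hypothetical helper."""
    if k < 0:
        raise ValueError("k must be non-negative")
    rng = rng or random.Random()
    reservoir: List[T] = []
    for i, item in enumerate(stream):
        if i < k:
            # Fill the reservoir with the first k items.
            reservoir.append(item)
        else:
            # Item i (0-based) enters with probability k/(i+1),
            # evicting a uniformly chosen slot.
            j = rng.randint(0, i)
            if j < k:
                reservoir[j] = item
    return reservoir


def reservoir_sample_with_replacement(stream: Iterable[T], k: int,
                                      rng: Optional[random.Random] = None) -> List[T]:
    """Uniform sample of k items WITH replacement (a multisubset): k independent
    single-item reservoirs. Hypothetical helper."""
    if k < 0:
        raise ValueError("k must be non-negative")
    rng = rng or random.Random()
    slots: List[T] = []
    for i, item in enumerate(stream):
        if i == 0:
            # The first item initially occupies every slot.
            slots = [item] * k
        else:
            for s in range(k):
                # Each slot independently switches to item i with probability 1/(i+1).
                if rng.randrange(i + 1) == 0:
                    slots[s] = item
    return slots
```

For example, `reservoir_sample(range(1_000_000), 5, random.Random(42))` returns five distinct values while holding only five items in memory at any time. The one-pass, bounded-memory design is what makes this family of techniques suited to unbounded data; how the thesis adapts sampling to Flink's distributed setting is not reflected in this sketch.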

Bibliographic Details
Main Author:	Vlachou-Konchylaki, Martha
Format:	Others
Language:	English
Published:	KTH, Skolan för datavetenskap och kommunikation (CSC) [School of Computer Science and Communication], 2016
Subjects:	Sampling, Streaming, Apache Flink, Distributed Systems, Computer Sciences, Datavetenskap (datalogi) [Computer Science]
Online Access:	http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-183397

Similar Items

FlinkCheck: Property-Based Testing for Apache Flink
by: Cristina Valentina Espinosa, et al.
Published: (2019-01-01)

Towards autoscaling of Apache Flink jobs
by: Varga Balázs, et al.
Published: (2021-06-01)

SPARQL2Flink: Evaluation of SPARQL Queries on Apache Flink
by: Oscar Ceballos, et al.
Published: (2021-07-01)

FlinkNDB : Guaranteed Data Streaming Using External State
by: Asif, Muhammad Haseeb
Published: (2021)

Influencing Factors in the Scalability of Distributed Stream Processing Jobs
by: Giselle Van Dongen, et al.
Published: (2021-01-01)

A Performance Analysis of Fault Recovery in Stream Processing Frameworks
by: Giselle van Dongen, et al.
Published: (2021-01-01)

External Streaming State Abstractions and Benchmarking
by: Sree Kumar, Sruthi
Published: (2021)

DPASF: a flink library for streaming data preprocessing
by: Alejandro Alcalde-Barros, et al.
Published: (2019-06-01)

Streaming Predictive Analytics on Apache Flink
by: Beligianni, Foteini
Published: (2015)

Improving the performance of stream processing pipeline for vehicle data
by: Gu, Wenyu
Published: (2020)

Comparison of Popular Data Processing Systems
by: Nasr, Kamil
Published: (2021)

Matrix Multiplications on Apache Spark through GPUs
by: Safari, Arash
Published: (2017)

Webbserverprogram: Öppen källkods-alternativ till Apache [Web server software: Open-source alternatives to Apache]
by: Svantesson, Carlhåkan
Published: (2012)

Spatiotemporal Aspects of Big Data
by: Karim Saadia, et al.
Published: (2018-12-01)

Implementation and Evaluation of a Data Pipeline for Industrial IoT Using Apache NiFi
by: Vilhelmsson, Lina, et al.
Published: (2020)

Performance assessment of Apache Spark applications
by: AL Jorani, Salam
Published: (2019)

StreamAligner: a streaming based sequence aligner on Apache Spark
by: Sanjay Rathee, et al.
Published: (2018-02-01)

Building a high throughput microscope simulator using the Apache Kafka streaming framework
by: Lugnegård, Lovisa
Published: (2018)

Evaluation of Video-on-Demand Streaming Servers
by: Westin, Georg
Published: (2003)

New authentication mechanism using certificates for big data analytic tools
by: Velthuis, Paul
Published: (2017)

Performance comparison between Apache and NGINX under slow rate DoS attacks
by: Al-Saydali, Josef, et al.
Published: (2021)

A Regularization-Based Big Data Framework for Winter Precipitation Forecasting on Streaming Data
by: Andreas Kanavos, et al.
Published: (2021-08-01)

Efficient Streaming Mass Spatio-Temporal Vehicle Data Access in Urban Sensor Networks Based on Apache Storm
by: Lianjie Zhou, et al.
Published: (2017-04-01)

Survey of streaming processing field
by: R. S. Samarev
Published: (2018-10-01)

Using clickthrough data to optimize search result ranking : An evaluation of clickthrough data in terms of relevancy and efficiency
by: Paulsson, Anton
Published: (2017)

JOB SCHEDULING FOR STREAMING APPLICATIONS IN HETEROGENEOUS DISTRIBUTED PROCESSING SYSTEMS
by: Al-Sinayyid, Ali
Published: (2020)

BigDataCube: Distributed Multidimensional Data Cube Over Apache Spark : An OLAP framework that brings Multidimensional Data Analysis to modern Distributed Storage Systems
by: Weherage, Pradeep Peiris
Published: (2017)

Anomaly Detection in Wait Reports and its Relation with Apache Cassandra Statistics
by: Madhu, Abheyraj Singh, et al.
Published: (2021)

Extraction and Energy Efficient Processing of Streaming Data
by: García-Martín, Eva
Published: (2017)

Consequences of converting a data warehouse based on a STAR-schema to a column-oriented-NoSQL-database
by: Bodegård Gustafsson, Rebecca
Published: (2018)

A COMPARISON OF DATA INGESTION PLATFORMS IN REAL-TIME STREAM PROCESSING PIPELINES
by: Tallberg, Sebastian
Published: (2020)

Random Stream Cipher
by: Aghaee, Saeed
Published: (2007)

Bland skotthål och minnesmonument [Among bullet holes and memorials]
by: Hjortman, Emma
Published: (2016)

Prestandaoptimering av hybrida mobilapplikationer : En kvalitativ studie [Performance optimization of hybrid mobile applications: A qualitative study]
by: Vrethem, Anders
Published: (2019)

Webbserveranalys : En jämförelse av webbservrars svarstider [Web server analysis: A comparison of web servers' response times]
by: Gustavsson, Marcus, et al.
Published: (2011)

Evaluating associative classification algorithms for Big Data
by: Francisco Padillo, et al.
Published: (2019-01-01)

BigData Analysis in Healthcare: Apache Hadoop, Apache Spark and Apache Flink
by: Elham Nazari, et al.
Published: (2019-07-01)

Implementering av testplattform för end-to-end streaming telemetry i nätverk [Implementation of a test platform for end-to-end streaming telemetry in networks]
by: Erlandsson, Niklas
Published: (2020)

Geo-distributed multi-layer stream aggregation
by: Cannalire, Pietro
Published: (2018)

Spatio-temporal outlier detection in streaming trajectory data
by: SZEKÉR, MÁTÉ
Published: (2014)