Handling Data Flows of Streaming Internet of Things Data


Bibliographic Details
Main Author: Serbessa, Yonatan Kebede
Format: Others
Language: English
Published: Uppsala universitet, Institutionen för informationsteknologi 2016
Online Access: http://urn.kb.se/resolve?urn=urn:nbn:se:uu:diva-302102
Description
Summary: Streaming data in various formats is generated at high speed, and it needs to be processed and analyzed before it becomes useless. Existing technology provides tools to process this data and extract more meaningful information from it. This thesis has two parts: theoretical and practical. The theoretical part investigates which tools are suitable for stream data flow processing and analysis. It starts by studying one of the main sources of streaming data that produce large volumes of data: the Internet of Things. The technologies behind it, common use cases, challenges, and solutions are studied. This is followed by an overview of selected tools, namely Apache NiFi, Apache Spark Streaming, and Apache Storm, covering their key features, main components, and architecture. After the tools are studied, five parameters are selected and each tool is reviewed with respect to them. This review can be useful when choosing a tool given the parameters and the use case at hand. The second part of the thesis involves Twitter data analysis, which is done using Apache NiFi, one of the tools studied. The purpose is to show how NiFi can be used to process data from ingestion through to delivery to storage systems, and how it communicates with external storage, search, and indexing systems.