Big Data analytics for the forest industry : A proof-of-concept built on cloud technologies

Large amounts of data in various forms are generated at a fast pace in today's society. This is commonly referred to as “Big Data”. Making use of Big Data has become increasingly important for both business and research. The forest industry generates large amounts of data during the different pro...

Bibliographic Details
Main Author:	Sellén, David
Format:	Others
Language:	English
Published:	Mittuniversitetet, Avdelningen för informations- och kommunikationssystem 2016
Subjects:	Big Data analytics; Apache Spark; StanForD 2010; forest industry; harvest production report; Computer Engineering; Datorteknik
Online Access:	http://urn.kb.se/resolve?urn=urn:nbn:se:miun:diva-28541

id	ndltd-UPSALLA1-oai-DiVA.org-miun-28541
record_format	oai_dc
spelling	ndltd-UPSALLA1-oai-DiVA.org-miun-28541 2018-01-11T05:11:20Z Big Data analytics for the forest industry : A proof-of-concept built on cloud technologies eng Sellén, David Mittuniversitetet, Avdelningen för informations- och kommunikationssystem 2016 Big Data analytics Apache Spark StanForD 2010 forest industry harvest production report Computer Engineering Datorteknik Large amounts of data in various forms are generated at a fast pace in today's society. This is commonly referred to as “Big Data”. Making use of Big Data has become increasingly important for both business and research. The forest industry generates large amounts of data during the different processes of forest harvesting. In Sweden, forest information is sent to SDC, the information hub for the Swedish forest industry. In 2014, SDC received reports on 75.5 million m³fub from harvester and forwarder machines. These machines use a global standard called StanForD 2010 for communication and to create reports about harvested stems. The arrival of scalable cloud technologies that combine Big Data with machine learning makes it interesting to develop an application to analyze the large amounts of data produced by the forest industry. In this study, a proof-of-concept has been implemented to analyze harvest production reports (HPR) from the StanForD 2010 standard. The system consists of a back-end and a front-end application and is built using cloud technologies such as Apache Spark and Hadoop. System tests have shown that the concept can successfully handle storage, processing and machine learning on gigabytes of HPR files. It is capable of extracting information from raw HPR data into datasets and supports a machine learning pipeline with pre-processing and K-Means clustering. The proof-of-concept has provided a code base for further development of a system that could be used to find valuable knowledge for the forest industry.
Student thesis info:eu-repo/semantics/bachelorThesis text http://urn.kb.se/resolve?urn=urn:nbn:se:miun:diva-28541 Local DT-V16-A2-005 application/pdf info:eu-repo/semantics/openAccess
collection	NDLTD
language	English
format	Others
sources	NDLTD
topic	Big Data analytics Apache Spark StanForD 2010 forest industry harvest production report Computer Engineering Datorteknik
spellingShingle	Big Data analytics Apache Spark StanForD 2010 forest industry harvest production report Computer Engineering Datorteknik Sellén, David Big Data analytics for the forest industry : A proof-of-concept built on cloud technologies
description	Large amounts of data in various forms are generated at a fast pace in today's society. This is commonly referred to as “Big Data”. Making use of Big Data has become increasingly important for both business and research. The forest industry generates large amounts of data during the different processes of forest harvesting. In Sweden, forest information is sent to SDC, the information hub for the Swedish forest industry. In 2014, SDC received reports on 75.5 million m³fub from harvester and forwarder machines. These machines use a global standard called StanForD 2010 for communication and to create reports about harvested stems. The arrival of scalable cloud technologies that combine Big Data with machine learning makes it interesting to develop an application to analyze the large amounts of data produced by the forest industry. In this study, a proof-of-concept has been implemented to analyze harvest production reports (HPR) from the StanForD 2010 standard. The system consists of a back-end and a front-end application and is built using cloud technologies such as Apache Spark and Hadoop. System tests have shown that the concept can successfully handle storage, processing and machine learning on gigabytes of HPR files. It is capable of extracting information from raw HPR data into datasets and supports a machine learning pipeline with pre-processing and K-Means clustering. The proof-of-concept has provided a code base for further development of a system that could be used to find valuable knowledge for the forest industry.
author	Sellén, David
author_facet	Sellén, David
author_sort	Sellén, David
title	Big Data analytics for the forest industry : A proof-of-concept built on cloud technologies
title_short	Big Data analytics for the forest industry : A proof-of-concept built on cloud technologies
title_full	Big Data analytics for the forest industry : A proof-of-concept built on cloud technologies
title_fullStr	Big Data analytics for the forest industry : A proof-of-concept built on cloud technologies
title_full_unstemmed	Big Data analytics for the forest industry : A proof-of-concept built on cloud technologies
title_sort	big data analytics for the forest industry : a proof-of-concept built on cloud technologies
publisher	Mittuniversitetet, Avdelningen för informations- och kommunikationssystem
publishDate	2016
url	http://urn.kb.se/resolve?urn=urn:nbn:se:miun:diva-28541
work_keys_str_mv	AT sellendavid bigdataanalyticsfortheforestindustryaproofofconceptbuiltoncloudtechnologies
_version_	1718604502928457728

Similar Items