Realizing Rough Set Theory with Spark for Large Scale Information Systems


Bibliographic Details
Main Authors: Kuo-Min Huang, 黃國閔
Other Authors: Kan-Lin Hsing
Format: Others
Language: en_US
Published: 2016
Online Access: http://ndltd.ncl.edu.tw/handle/14274642376799824073
Description
Summary: Master's thesis === Yuan Ze University === Department of Electrical Engineering === 104 === Apache Spark, an alternative to Hadoop MapReduce, is currently one of the most active open-source projects in the big data world. The major characteristic distinguishing Spark from Hadoop is that it is a cluster computing framework that lets users perform in-memory computations, caching data in memory across iterations, in a fault-tolerant manner. Because it supports iterative algorithms out of the box, Spark has been adopted by many organizations to replace MapReduce. One of the main advantages of rough set models is that they require no preliminary or additional information about the data, such as membership values in fuzzy set models or probability distributions in statistics. Owing to their versatility, rough set methods and algorithms have been widely used in various fields, including voice recognition, audio and image processing, finance, process control, pharmacology and medicine, text mining and exploration of the web, and power system security analysis. In this thesis, a parallel and distributed implementation over Apache Spark for computing rough approximations in huge information systems is reported.
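The rough approximations the summary refers to can be illustrated with a minimal single-machine sketch: objects of an information system are grouped into equivalence classes by their attribute values, and the lower and upper approximations of a target set are the unions of the classes fully contained in, and overlapping, that set. This is a hypothetical illustration of the concept only; the thesis itself computes these approximations in a distributed fashion over Spark, and the function and data names below are assumptions, not taken from the thesis.

```python
from collections import defaultdict

def approximations(objects, attrs, target):
    """Compute rough set lower/upper approximations of `target`.

    objects: dict mapping object id -> dict of attribute values
    attrs:   list of attribute names defining the indiscernibility relation
    target:  set of object ids (the concept to approximate)
    Returns (lower, upper) as sets of object ids.
    """
    # Partition objects into equivalence classes by attribute signature.
    classes = defaultdict(set)
    for oid, values in objects.items():
        signature = tuple(values[a] for a in attrs)
        classes[signature].add(oid)

    lower, upper = set(), set()
    for cls in classes.values():
        if cls <= target:       # class entirely inside the target set
            lower |= cls
        if cls & target:        # class overlaps the target set
            upper |= cls
    return lower, upper

# A tiny example decision table: objects 1 and 2 are indiscernible.
objs = {
    1: {"color": "red",  "size": "big"},
    2: {"color": "red",  "size": "big"},
    3: {"color": "blue", "size": "big"},
}
low, up = approximations(objs, ["color", "size"], {1, 3})
print(sorted(low), sorted(up))  # [3] [1, 2, 3]
```

In a Spark implementation, the grouping step would be expressed as a key-by-signature shuffle so each equivalence class can be tested against the target set in parallel, which is what makes the method scale to the large information systems the thesis targets.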