Summary: | Master's thesis === Yuan Ze University === Department of Electrical Engineering === 104 === Apache Spark, an alternative to Hadoop MapReduce,
is currently one of the most active open source projects in the big data world.
The major characteristic that distinguishes Spark from Hadoop is that it is a cluster computing framework
that lets users perform in-memory computations, caching data in memory across iterations, in a fault-tolerant manner.
Supporting iterative algorithms out of the box, Spark has been adopted by many organizations to replace MapReduce.
One of the main advantages of rough set models is that they require no preliminary or additional information about the data,
such as membership values in fuzzy set models or probability distributions in statistics.
Due to their versatility, rough set methods and algorithms have been widely used in various fields, including
voice recognition, audio and image processing, finance, process control, pharmacology and medicine,
text mining and exploration of the web, and power system security analysis.
In this thesis, a parallel and distributed implementation on Apache Spark for computing rough approximations in huge information systems is reported.
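The rough approximations mentioned above can be illustrated with a minimal serial sketch of the classical definitions; this is not the thesis's Spark implementation, and all names here are hypothetical. The lower approximation collects the equivalence classes fully contained in the target set, while the upper approximation collects those that merely overlap it:

```python
def rough_approximations(equivalence_classes, target):
    """Classical rough set approximations of `target`.

    equivalence_classes: the partition of the universe induced by the
    indiscernibility relation on the chosen attributes.
    Returns (lower, upper) approximations as sets.
    """
    target = set(target)
    lower, upper = set(), set()
    for eq in equivalence_classes:
        eq = set(eq)
        if eq <= target:   # class entirely inside the target: certain members
            lower |= eq
        if eq & target:    # class overlaps the target: possible members
            upper |= eq
    return lower, upper

# Toy information system: five objects partitioned by attribute values.
classes = [{1, 2}, {3, 4}, {5}]
X = {1, 2, 3}
low, up = rough_approximations(classes, X)
# low is {1, 2}: only the class {1, 2} lies wholly inside X.
# up is {1, 2, 3, 4}: {3, 4} overlaps X, while {5} does not.
```

In a distributed setting such as Spark, each equivalence class can be tested against the target set independently, which is what makes the per-class loop above a natural candidate for parallelization.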
|