A Parallel Elastic Net Clustering Algorithm for Nonlinearly Separable Clustering Problem on Spark

Bibliographic Details
Main Authors: Tzu-Yi Feng, 馮子易
Other Authors: Ming-Chao Chiang
Format: Others
Language: zh-TW
Published: 2019
Online Access: http://ndltd.ncl.edu.tw/handle/nq8mf9
Description
Summary: Master's thesis, National Sun Yat-sen University, Department of Computer Science and Engineering, academic year 107. The elastic net clustering algorithm can achieve a higher clustering accuracy than traditional clustering methods on non-linearly separable clustering problems. However, its computation time grows exponentially as the dataset gets larger, which makes it hard to apply to large datasets. For this reason, this thesis proposes a parallel elastic net clustering algorithm and implements it on the Apache Spark framework to reduce its response time. The results show that the parallel elastic net clustering algorithm not only retains the high accuracy of the elastic net clustering algorithm on non-linearly separable clustering problems, but also reduces the response time by 80% compared to the original elastic net clustering algorithm on a dataset with 19,020 points.