Norm-Based Binary Search Trees for Speeding Up KNN Big Data Classification
Due to their large sizes and/or dimensions, the classification of Big Data is a challenging task using traditional machine learning, particularly if it is carried out using the well-known K-nearest neighbors (KNN) classifier, which is a slow and lazy classifier by its nature. In this pape...
Main Author: | Ahmad B. A. Hassanat |
---|---|
Format: | Article |
Language: | English |
Published: | MDPI AG, 2018-10-01 |
Series: | Computers |
Subjects: | Big Data classification; machine learning datasets; binary search tree; norms |
Online Access: | http://www.mdpi.com/2073-431X/7/4/54 |
id |
doaj-537aee4b7fda464c89bcb55dc4132276 |
---|---|
record_format |
Article |
spelling |
doaj-537aee4b7fda464c89bcb55dc4132276; 2020-11-25T00:50:08Z; eng; MDPI AG; Computers; 2073-431X; 2018-10-01; 7; 4; 54; 10.3390/computers7040054; computers7040054; Norm-Based Binary Search Trees for Speeding Up KNN Big Data Classification; Ahmad B. A. Hassanat (Information Technology College, Mutah University; Karak 61710, Jordan); http://www.mdpi.com/2073-431X/7/4/54; Big Data classification; machine learning datasets; binary search tree; norms |
collection |
DOAJ |
language |
English |
format |
Article |
sources |
DOAJ |
author |
Ahmad B. A. Hassanat |
title |
Norm-Based Binary Search Trees for Speeding Up KNN Big Data Classification |
publisher |
MDPI AG |
series |
Computers |
issn |
2073-431X |
publishDate |
2018-10-01 |
description |
Due to their large sizes and/or dimensions, the classification of Big Data is a challenging task using traditional machine learning, particularly if it is carried out using the well-known K-nearest neighbors (KNN) classifier, which is a slow and lazy classifier by its nature. In this paper, we propose a new approach to Big Data classification using the KNN classifier, which is based on inserting the training examples into a binary search tree to be used later for speeding up the searching process for test examples. For this purpose, we used two methods to sort the training examples. The first calculates the minimum/maximum scaled norm and rounds it to 0 or 1 for each example. Examples with 0-norms are sorted in the left child of a node, and those with 1-norms are sorted in the right child of the same node; this process continues recursively until we obtain one example or a small number of examples with the same norm in a leaf node. The second proposed method inserts each example into the binary search tree based on its similarity to the examples of the minimum and maximum Euclidean norms. The experimental results of classifying several machine learning big datasets show that both methods are much faster than most of the state-of-the-art methods compared, with competitive accuracy rates obtained by the second method, which shows great potential for further enhancements of both methods to be used in practice. |
topic |
Big Data classification; machine learning datasets; binary search tree; norms |
url |
http://www.mdpi.com/2073-431X/7/4/54 |
_version_ |
1725249190423953408 |
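
The first method summarized in the description (scale each example's norm to [0, 1], round it to 0 or 1, send 0s left and 1s right, recurse) can be illustrated with a minimal Python sketch. This is one possible reading of the abstract, not the author's implementation. The names `build_tree`, `classify`, and `leaf_size`, the use of the Euclidean norm, rescaling inside every node, the 0.5 rounding threshold, and majority voting inside the reached leaf are all assumptions made for illustration.

```python
import math
from collections import Counter


class Node:
    """Tree node: internal nodes store the norm range, leaves store examples."""

    def __init__(self, examples=None, lo=None, hi=None, left=None, right=None):
        self.examples = examples  # list of (vector, label) for a leaf, else None
        self.lo, self.hi = lo, hi  # min / max norm seen at this node
        self.left, self.right = left, right


def norm(v):
    """Euclidean norm of a vector (assumed choice of norm)."""
    return math.sqrt(sum(x * x for x in v))


def build_tree(examples, leaf_size=1):
    """Recursively split examples by their min/max-scaled norm rounded to 0 or 1."""
    norms = [norm(v) for v, _ in examples]
    lo, hi = min(norms), max(norms)
    # Stop at a small leaf, or when all remaining examples share the same norm.
    if len(examples) <= leaf_size or hi == lo:
        return Node(examples=examples)
    left, right = [], []
    for example, n in zip(examples, norms):
        scaled = (n - lo) / (hi - lo)
        # Rounded norm 0 goes left, rounded norm 1 goes right.
        (right if scaled >= 0.5 else left).append(example)
    return Node(lo=lo, hi=hi,
                left=build_tree(left, leaf_size),
                right=build_tree(right, leaf_size))


def classify(tree, query, k=1):
    """Descend to a leaf using the query's scaled norm, then vote among the
    k nearest examples stored in that leaf only."""
    node = tree
    q_norm = norm(query)
    while node.examples is None:
        scaled = (q_norm - node.lo) / (node.hi - node.lo)
        node = node.right if scaled >= 0.5 else node.left
    nearest = sorted(node.examples, key=lambda e: math.dist(e[0], query))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


# Tiny made-up training set, for illustration only.
train = [([1.0, 0.0], "a"), ([0.0, 1.2], "a"),
         ([9.0, 9.0], "b"), ([10.0, 8.0], "b")]
tree = build_tree(train, leaf_size=2)
label_small = classify(tree, [0.5, 0.5])
label_large = classify(tree, [9.5, 9.0])
```

Because a query only descends one root-to-leaf path and compares against the few examples in that leaf, the search avoids the full scan of the training set that plain KNN performs. The price is that examples with similar norms are not necessarily close to each other, which is consistent with the abstract reporting competitive accuracy only for the second method.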