Norm-Based Binary Search Trees for Speeding Up KNN Big Data Classification

Because of the large size and/or dimensionality of Big Data, classifying it with traditional machine learning is a challenging task, particularly with the well-known K-nearest neighbors (KNN) classifier, which is slow and lazy by nature. In this pape...


Bibliographic Details
Main Author: Ahmad B. A. Hassanat
Format: Article
Language: English
Published: MDPI AG 2018-10-01
Series: Computers
Subjects: Big Data classification; machine learning datasets; binary search tree; norms
Online Access: http://www.mdpi.com/2073-431X/7/4/54
id doaj-537aee4b7fda464c89bcb55dc4132276
record_format Article
spelling doaj-537aee4b7fda464c89bcb55dc4132276 | 2020-11-25T00:50:08Z | eng | MDPI AG | Computers | 2073-431X | 2018-10-01 | 7 | 4 | 54 | 10.3390/computers7040054 | computers7040054 | Norm-Based Binary Search Trees for Speeding Up KNN Big Data Classification | Ahmad B. A. Hassanat | 0 | Information Technology College, Mutah University; Karak 61710, Jordan | Because of the large size and/or dimensionality of Big Data, classifying it with traditional machine learning is a challenging task, particularly with the well-known K-nearest neighbors (KNN) classifier, which is slow and lazy by nature. In this paper, we propose a new approach to Big Data classification using the KNN classifier, based on inserting the training examples into a binary search tree that is later used to speed up the search for test examples. For this purpose, we used two methods to sort the training examples. The first calculates the minimum/maximum scaled norm and rounds it to 0 or 1 for each example. Examples with 0-norms are placed in the left child of a node, and those with 1-norms are placed in the right child of the same node; this process continues recursively until a leaf node holds one example or a small number of examples with the same norm. The second proposed method inserts each example into the binary search tree based on its similarity to the examples with the minimum and maximum Euclidean norms. The experimental results of classifying several machine learning big datasets show that both methods are much faster than most of the state-of-the-art methods compared, with competitive accuracy rates obtained by the second method, which shows great potential for further enhancement of both methods for use in practice. | http://www.mdpi.com/2073-431X/7/4/54 | Big Data classification | machine learning datasets | binary search tree | norms
collection DOAJ
language English
format Article
sources DOAJ
author Ahmad B. A. Hassanat
spellingShingle Ahmad B. A. Hassanat
Norm-Based Binary Search Trees for Speeding Up KNN Big Data Classification
Computers
Big Data classification
machine learning datasets
binary search tree
norms
author_facet Ahmad B. A. Hassanat
author_sort Ahmad B. A. Hassanat
title Norm-Based Binary Search Trees for Speeding Up KNN Big Data Classification
title_short Norm-Based Binary Search Trees for Speeding Up KNN Big Data Classification
title_full Norm-Based Binary Search Trees for Speeding Up KNN Big Data Classification
title_fullStr Norm-Based Binary Search Trees for Speeding Up KNN Big Data Classification
title_full_unstemmed Norm-Based Binary Search Trees for Speeding Up KNN Big Data Classification
title_sort norm-based binary search trees for speeding up knn big data classification
publisher MDPI AG
series Computers
issn 2073-431X
publishDate 2018-10-01
description Because of the large size and/or dimensionality of Big Data, classifying it with traditional machine learning is a challenging task, particularly with the well-known K-nearest neighbors (KNN) classifier, which is slow and lazy by nature. In this paper, we propose a new approach to Big Data classification using the KNN classifier, based on inserting the training examples into a binary search tree that is later used to speed up the search for test examples. For this purpose, we used two methods to sort the training examples. The first calculates the minimum/maximum scaled norm and rounds it to 0 or 1 for each example. Examples with 0-norms are placed in the left child of a node, and those with 1-norms are placed in the right child of the same node; this process continues recursively until a leaf node holds one example or a small number of examples with the same norm. The second proposed method inserts each example into the binary search tree based on its similarity to the examples with the minimum and maximum Euclidean norms. The experimental results of classifying several machine learning big datasets show that both methods are much faster than most of the state-of-the-art methods compared, with competitive accuracy rates obtained by the second method, which shows great potential for further enhancement of both methods for use in practice.
topic Big Data classification
machine learning datasets
binary search tree
norms
url http://www.mdpi.com/2073-431X/7/4/54
work_keys_str_mv AT ahmadbahassanat normbasedbinarysearchtreesforspeedingupknnbigdataclassification
_version_ 1725249190423953408
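
The first method summarized in the description field (scale each example's norm to the minimum/maximum range, round it to 0 or 1, send 0s left and 1s right, and recurse) can be illustrated with a minimal Python sketch. This is not the author's implementation: the record gives only the abstract, so the use of the Euclidean norm, rescaling at every node, the 0.5 rounding threshold, the `leaf_size` parameter, the leaf-level majority vote, and all names (`Node`, `build`, `classify`) are assumptions made for illustration.

```python
import math
from collections import Counter


def norm(x):
    """Euclidean norm of a feature vector (assumed norm for this sketch)."""
    return math.sqrt(sum(v * v for v in x))


class Node:
    """Tree node; `examples` is set only on leaves (hypothetical layout)."""

    def __init__(self, lo, hi, left=None, right=None, examples=None):
        self.lo, self.hi = lo, hi          # min and max norm seen at this node
        self.left, self.right = left, right
        self.examples = examples           # list of (vector, label) pairs


def build(examples, leaf_size=1):
    """Recursively split examples by their min/max scaled norm rounded to 0 or 1."""
    norms = [norm(x) for x, _ in examples]
    lo, hi = min(norms), max(norms)
    # Stop when the node is small enough or every example has the same norm.
    if len(examples) <= leaf_size or lo == hi:
        return Node(lo, hi, examples=examples)
    left, right = [], []
    for example, n in zip(examples, norms):
        scaled = (n - lo) / (hi - lo)      # in [0, 1]
        (right if scaled >= 0.5 else left).append(example)
    return Node(lo, hi, build(left, leaf_size), build(right, leaf_size))


def classify(root, x, k=1):
    """Descend by the rounded scaled norm, then vote among the k nearest leaf examples."""
    node, n = root, norm(x)
    while node.examples is None:
        scaled = (n - node.lo) / (node.hi - node.lo)
        node = node.right if scaled >= 0.5 else node.left
    nearest = sorted(node.examples, key=lambda e: math.dist(e[0], x))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


# Toy data, invented for this sketch only.
train = [([0.1, 0.1], "a"), ([0.2, 0.1], "a"), ([5.0, 5.0], "b"), ([5.5, 4.5], "b")]
tree = build(train)
print(classify(tree, [0.15, 0.1]), classify(tree, [5.2, 5.0]))
```

In this sketch a query follows a single root-to-leaf path and computes distances only inside the leaf it reaches, which is the kind of saving the abstract attributes to the tree. The second method (insertion by similarity to the minimum-norm and maximum-norm examples) is not sketched here, because the abstract does not specify it in enough detail.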