Norm-Based Binary Search Trees for Speeding Up KNN Big Data Classification

Because of the large size and/or dimensionality of Big Data, classifying it with traditional machine learning is challenging, particularly with the well-known K-nearest neighbors (KNN) classifier, which is slow and lazy by nature. In this pape...

Bibliographic Details
Main Author:	Ahmad B. A. Hassanat
Format:	Article
Language:	English
Published:	MDPI AG 2018-10-01
Series:	Computers
Subjects:	Big Data classification; machine learning datasets; binary search tree; norms
Online Access:	http://www.mdpi.com/2073-431X/7/4/54

id	doaj-537aee4b7fda464c89bcb55dc4132276
record_format	Article
spelling	Ahmad B. A. Hassanat (Information Technology College, Mutah University; Karak 61710, Jordan). Norm-Based Binary Search Trees for Speeding Up KNN Big Data Classification. Computers, vol. 7, no. 4, art. 54. MDPI AG, 2018-10-01. ISSN 2073-431X. doi:10.3390/computers7040054. http://www.mdpi.com/2073-431X/7/4/54
collection	DOAJ
language	English
format	Article
sources	DOAJ
author	Ahmad B. A. Hassanat
title	Norm-Based Binary Search Trees for Speeding Up KNN Big Data Classification
publisher	MDPI AG
series	Computers
issn	2073-431X
publishDate	2018-10-01
description	Because of the large size and/or dimensionality of Big Data, classifying it with traditional machine learning is challenging, particularly with the well-known K-nearest neighbors (KNN) classifier, which is slow and lazy by nature. In this paper, we propose a new approach to Big Data classification using the KNN classifier: the training examples are inserted into a binary search tree, which is later used to speed up the search for test examples. We used two methods to sort the training examples. The first calculates the minimum/maximum scaled norm of each example and rounds it to 0 or 1. Examples with 0-norms are placed in the left child of a node, and those with 1-norms in the right child of the same node; this process continues recursively until a leaf node holds one example or a small number of examples with the same norm. The second method inserts each example into the binary search tree based on its similarity to the examples with the minimum and maximum Euclidean norms. Experimental results on several big machine learning datasets show that both methods are much faster than most of the state-of-the-art methods compared, with competitive accuracy rates obtained by the second method, suggesting that both methods have potential for further enhancement toward practical use.
topic	Big Data classification; machine learning datasets; binary search tree; norms
url	http://www.mdpi.com/2073-431X/7/4/54
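
The first method in the description can be illustrated with a minimal Python sketch. It is based only on the abstract, not on the paper's actual implementation, so several details are assumptions: the names `Node`, `build`, and `predict` are hypothetical, the min/max scaling is recomputed at every node from the examples that reach it, a scaled value of exactly 0.5 is rounded up, and a test example is routed using the min/max norms stored at each node before a plain KNN vote is taken inside the leaf it reaches.

```python
import math
from collections import Counter


def norm(x):
    """Euclidean norm of a feature vector."""
    return math.sqrt(sum(v * v for v in x))


class Node:
    """Tree node; leaves keep examples, internal nodes keep the norm range."""

    def __init__(self):
        self.examples = None   # list of (vector, label), set only on leaves
        self.lo = self.hi = None
        self.left = self.right = None


def build(examples):
    """Recursively split examples by their min/max scaled norm rounded to 0 or 1."""
    node = Node()
    norms = [norm(x) for x, _ in examples]
    lo, hi = min(norms), max(norms)
    # Leaf: a single example, or several examples sharing the same norm.
    if len(examples) == 1 or hi == lo:
        node.examples = examples
        return node
    node.lo, node.hi = lo, hi
    left, right = [], []
    for example, n in zip(examples, norms):
        scaled = (n - lo) / (hi - lo)
        # Assumption: 0.5 rounds up (Python's round() would round it to 0).
        (right if scaled >= 0.5 else left).append(example)
    node.left, node.right = build(left), build(right)
    return node


def predict(root, query, k=1):
    """Route the query to a leaf by its scaled norm, then vote among the leaf's examples."""
    n = norm(query)
    node = root
    while node.examples is None:
        scaled = (n - node.lo) / (node.hi - node.lo)
        scaled = min(1.0, max(0.0, scaled))  # clamp norms outside the training range
        node = node.right if scaled >= 0.5 else node.left
    nearest = sorted(node.examples, key=lambda e: math.dist(e[0], query))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


# Toy data, invented for illustration only.
train = [([0.0, 0.0], "a"), ([0.1, 0.0], "a"), ([5.0, 5.0], "b"), ([5.0, 5.2], "b")]
root = build(train)
print(predict(root, [5.1, 5.0]))
```

Because the example with the smallest norm always goes left and the one with the largest norm always goes right, every split is non-empty and the recursion terminates. Routing by norm alone can send a query to a leaf that does not contain its true nearest neighbors, so a search of this kind is approximate.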
