A Novel Four-Way Approach Designed With Ensemble Feature Selection for Code Smell Detection

Purpose: Code smells are residuals of technical debt induced by developers. They hinder the evolution, adaptability and maintenance of software. At the same time, they are useful indicators of problems and bugs in the software. Machine learning has been extensively used to pre...

Bibliographic Details
Main Authors: Inderpreet Kaur, Arvinder Kaur
Format: Article
Language: English
Published: IEEE 2021-01-01
Series: IEEE Access
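Subjects: Aggregator; code smell; ensemble; feature selection; machine learning; open-source projects and performance measures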
Online Access: https://ieeexplore.ieee.org/document/9316747/
id doaj-ccad79a69c1548158fb178e3f970583b
record_format Article
spelling doaj-ccad79a69c1548158fb178e3f970583b 2021-03-30T14:48:36Z eng
IEEE, IEEE Access, ISSN 2169-3536, 2021-01-01, vol. 9, pp. 8695-8707, DOI 10.1109/ACCESS.2021.3049823, document 9316747
Inderpreet Kaur (https://orcid.org/0000-0002-3646-8356), University School of Information & Communication Technology, Guru Gobind Singh Indraprastha University, New Delhi, India
Arvinder Kaur, University School of Information & Communication Technology, Guru Gobind Singh Indraprastha University, New Delhi, India
collection DOAJ
language English
format Article
sources DOAJ
author Inderpreet Kaur
Arvinder Kaur
title A Novel Four-Way Approach Designed With Ensemble Feature Selection for Code Smell Detection
publisher IEEE
series IEEE Access
issn 2169-3536
publishDate 2021-01-01
description Purpose: Code smells are residuals of technical debt induced by developers. They hinder the evolution, adaptability and maintenance of software. At the same time, they are useful indicators of problems and bugs in the software. Machine learning has been used extensively in research to predict code smells. The current study aims to optimise the prediction using Ensemble Learning and Feature Selection techniques on three open-source Java data sets. Design and Results: The work compares four approaches to detecting code smells using four performance measures: Accuracy (P1), G-mean1 (P2), G-mean2 (P3), and F-measure (P4). The study found that the values of the performance measures did not degrade; instead, they either remained the same or increased with Feature Selection and Ensemble Learning. Random Forest turns out to be the best classifier, while Correlation-based Feature Selection (BFS) is the best among the Feature Selection techniques. The Ensemble Learning aggregators ET5C2 (BFS intersection Relief with classifier Random Forest), ET6C2 (BFS union Relief with classifier Random Forest), and ET5C1 (BFS intersection Relief with Bagging), together with Majority Voting, give the best results of all the aggregation combinations studied. Conclusion: Although the results are good, Ensemble Learning techniques need extensive validation on a variety of data sets before they can be standardised. Ensemble Learning techniques also pose challenges concerning diversity and reliability and hence need exhaustive study.
topic Aggregator
code smell
ensemble
feature selection
machine learning
open-source projects and performance measures
url https://ieeexplore.ieee.org/document/9316747/
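The description field above mentions aggregating feature subsets by intersection or union, Majority Voting, and four performance measures. The following is only a minimal sketch of those ideas in plain Python, not the authors' implementation. The function names, the feature names (LOC, WMC, CBO, LCOM) and the sample labels are hypothetical. The record does not define G-mean1 and G-mean2; the sketch assumes the commonly used forms sqrt(precision * recall) and sqrt(recall * specificity).

```python
from collections import Counter
from math import sqrt


def aggregate_features(subset_a, subset_b, mode="intersection"):
    """Combine two selected-feature subsets by set intersection or union."""
    a, b = set(subset_a), set(subset_b)
    if mode == "intersection":
        return sorted(a & b)
    if mode == "union":
        return sorted(a | b)
    raise ValueError(f"unknown mode: {mode!r}")


def majority_vote(predictions):
    """Return the most frequent label per sample across classifiers.

    `predictions` holds one equal-length label list per classifier.
    Use an odd number of classifiers to avoid ties on binary labels.
    """
    return [Counter(votes).most_common(1)[0][0] for votes in zip(*predictions)]


def performance_measures(y_true, y_pred, positive=1):
    """Compute accuracy, G-mean1, G-mean2 and F-measure for binary labels.

    Assumed definitions (not taken from the record):
    G-mean1 = sqrt(precision * recall), G-mean2 = sqrt(recall * specificity).
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)

    # Guard every ratio against an empty denominator.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    total = precision + recall

    return {
        "accuracy": (tp + tn) / len(pairs) if pairs else 0.0,
        "g_mean1": sqrt(precision * recall),
        "g_mean2": sqrt(recall * specificity),
        "f_measure": 2 * precision * recall / total if total else 0.0,
    }


# Hypothetical feature subsets chosen by two feature selection techniques.
bfs_selected = ["LOC", "WMC", "CBO"]
relief_selected = ["WMC", "CBO", "LCOM"]
common = aggregate_features(bfs_selected, relief_selected, "intersection")
combined = aggregate_features(bfs_selected, relief_selected, "union")

# Hypothetical predictions from three classifiers on six samples.
voted = majority_vote([
    [1, 0, 0, 0, 1, 1],
    [1, 1, 0, 1, 0, 0],
    [0, 1, 0, 0, 1, 1],
])
scores = performance_measures([1, 1, 0, 0, 1, 0], voted)
```

Intersection keeps only the features both techniques agree on, which yields a smaller subset, while union keeps every feature either technique selected. Which of the two suits a given data set is an empirical question.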