Software Requirements Classification Using Machine Learning Algorithms


Bibliographic Details
Main Authors:	Edna Dias Canedo, Bruno Cordeiro Mendes
Format:	Article
Language:	English
Published:	MDPI AG 2020-09-01
Series:	Entropy
Subjects:	functional requirements; non-functional requirements; text normalization; feature extraction; machine learning; support vector machines
Online Access:	https://www.mdpi.com/1099-4300/22/9/1057

id	doaj-0b531a9f158a45c7bb73c5fabd3b8897
record_format	Article
doi	10.3390/e22091057
volume	22 (issue 9), article 1057
affiliation	Department of Computer Science, University of Brasília (UnB), P.O. Box 4466, Brasília 70910-900, Brazil (both authors)
collection	DOAJ
publisher	MDPI AG
series	Entropy
issn	1099-4300
publishDate	2020-09-01
description	The correct classification of requirements has become an essential task within software engineering. This study compares text feature extraction techniques and machine learning algorithms for requirements classification in order to answer two major questions: “Which works best (Bag of Words (BoW) vs. Term Frequency–Inverse Document Frequency (TF-IDF) vs. Chi Squared (CHI²)) for classifying Software Requirements into Functional Requirements (FR) and Non-Functional Requirements (NF), and the sub-classes of Non-Functional Requirements?” and “Which Machine Learning Algorithm provides the best performance for the requirements classification task?” The data used in this research was PROMISE_exp, a recently created dataset that expands the well-known PROMISE repository of labeled software requirements. All documents in the dataset were cleaned with a set of normalization steps; the two feature extraction techniques used were BoW and TF-IDF, and the feature selection technique was CHI². The algorithms used for classification were Logistic Regression (LR), Support Vector Machine (SVM), Multinomial Naive Bayes (MNB) and k-Nearest Neighbors (kNN). The novelty of our work lies in the data used for the experiment, the detailed steps for reproducing the classification, and the comparison of BoW, TF-IDF and CHI² on this repository, which has not been covered by other studies. This work will serve as a reference for the software engineering community and will help other researchers understand the requirements classification process.
We found that TF-IDF followed by LR gave the best results for differentiating requirements, with an F-measure of 0.91 in binary classification (tied with SVM in that case), 0.74 in NF classification and 0.78 in general classification. As future work, we intend to compare more algorithms and explore new ways to improve the precision of our models.
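The TF-IDF + LR combination reported in the description can be illustrated with a minimal, dependency-free Python sketch. It is not the authors' implementation. The stopword list, the smoothed IDF formula (the default variant in scikit-learn), the gradient-descent settings (`epochs`, `lr`) and the six toy requirements are hypothetical assumptions, and the toy data is not taken from PROMISE_exp. The sketch covers normalization, TF-IDF feature extraction, a binary FR/NF logistic regression and the F-measure. Dropping the IDF factor would give plain BoW features, and CHI² feature selection is not shown.

```python
import math
import re
from collections import Counter, defaultdict

# Hypothetical stopword list; the study's actual normalization steps may differ.
STOPWORDS = {"the", "a", "an", "all", "and", "be", "for", "must", "of", "shall", "to", "within"}


def normalize(text):
    """Lowercase, keep alphabetic tokens only, and drop stopwords."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [t for t in tokens if t not in STOPWORDS]


def fit_idf(docs):
    """Smoothed IDF, ln((1 + n) / (1 + df)) + 1, for every training term."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    return {term: math.log((1 + n) / (1 + d)) + 1 for term, d in df.items()}


def tfidf(doc, idf):
    """Term counts weighted by IDF, then L2-normalized; unseen terms are dropped."""
    counts = Counter(t for t in doc if t in idf)
    vec = {t: c * idf[t] for t, c in counts.items()}
    norm = math.sqrt(sum(v * v for v in vec.values())) or 1.0
    return {t: v / norm for t, v in vec.items()}


def score(vec, weights, bias):
    """Linear score (log-odds of the FR class) of a sparse vector."""
    return bias + sum(weights.get(t, 0.0) * v for t, v in vec.items())


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic_regression(vectors, labels, epochs=200, lr=0.5):
    """Batch gradient descent on the mean log-loss; labels are 1 (FR) or 0 (NF)."""
    weights, bias = defaultdict(float), 0.0
    for _ in range(epochs):
        grad_w, grad_b = defaultdict(float), 0.0
        for vec, y in zip(vectors, labels):
            error = y - sigmoid(score(vec, weights, bias))
            for t, v in vec.items():
                grad_w[t] += error * v
            grad_b += error
        # Step along the averaged gradient of the log-likelihood.
        for t, g in grad_w.items():
            weights[t] += lr * g / len(vectors)
        bias += lr * grad_b / len(vectors)
    return dict(weights), bias


def predict(text, idf, weights, bias):
    """Return 1 (FR) if P(FR | text) >= 0.5, else 0 (NF)."""
    return 1 if sigmoid(score(tfidf(normalize(text), idf), weights, bias)) >= 0.5 else 0


def f_measure(y_true, y_pred, positive=1):
    """F1 = 2PR / (P + R) for the given positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return 2 * precision * recall / (precision + recall) if precision + recall else 0.0


# Hypothetical toy requirements (1 = FR, 0 = NF); NOT taken from PROMISE_exp.
TRAIN = [
    ("The system shall allow users to register an account.", 1),
    ("The system shall display the order history of a customer.", 1),
    ("The administrator shall be able to delete user accounts.", 1),
    ("The system shall respond to queries within 2 seconds.", 0),
    ("The application must be available 99% of the time.", 0),
    ("All passwords must be encrypted for security.", 0),
]

docs = [normalize(text) for text, _ in TRAIN]
labels = [label for _, label in TRAIN]
idf = fit_idf(docs)
weights, bias = train_logistic_regression([tfidf(d, idf) for d in docs], labels)

for requirement in ("Users shall be able to register and delete their account.",
                    "Passwords must be encrypted and queries must respond within seconds."):
    print(predict(requirement, idf, weights, bias), requirement)
```

Hand-written functions keep the sketch self-contained. In practice, library implementations such as scikit-learn's `TfidfVectorizer`, `LogisticRegression` and `f1_score` would normally be used instead.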


Similar Items