Hepatitis C Virus (HCV) Prediction by Machine Learning Techniques

Hepatitis C is a prevalent disease worldwide, especially in countries such as Egypt. An estimated 3-4 million new cases occur every year, making it a public health problem that should be addressed with identification and treatment policies. In the initial stage it is asymptomatic; however...

Bibliographic Details
Main Authors: Satish CR Nandipati, Chew XinYing, Khaw Khai Wah
Format: Article
Language: English
Published: ARQII PUBLICATION 2020-03-01
Series: Applications of Modelling and Simulation
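Subjects: classification; feature selection; hepatitis c virus; machine learning; prediction multi and binary class labels; python and r tools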
Online Access: http://arqiipubl.com/ojs/index.php/AMS_Journal/article/view/122/82
id doaj-351cecc5941146bf8007af6a4de403c2
record_format Article
spelling doaj-351cecc5941146bf8007af6a4de403c2 2020-11-25T02:31:43Z eng
ARQII PUBLICATION
Applications of Modelling and Simulation, ISSN 2600-8084, 2020-03-01, volume 4, pages 89-100
Hepatitis C Virus (HCV) Prediction by Machine Learning Techniques
Satish CR Nandipati (0): School of Computer Sciences, Universiti Sains Malaysia, Pulau Pinang, Malaysia
Chew XinYing (1): School of Computer Sciences, Universiti Sains Malaysia, Pulau Pinang, Malaysia
Khaw Khai Wah (2): School of Management, Universiti Sains Malaysia, Pulau Pinang, Malaysia
collection DOAJ
language English
format Article
sources DOAJ
author Satish CR Nandipati
Chew XinYing
Khaw Khai Wah
author_sort Satish CR Nandipati
title Hepatitis C Virus (HCV) Prediction by Machine Learning Techniques
title_sort hepatitis c virus (hcv) prediction by machine learning techniques
publisher ARQII PUBLICATION
series Applications of Modelling and Simulation
issn 2600-8084
publishDate 2020-03-01
description Hepatitis C is a prevalent disease worldwide, especially in countries such as Egypt. An estimated 3-4 million new cases occur every year, making it a public health problem that should be addressed with identification and treatment policies. In the initial stage it is asymptomatic; however, as the infection progresses it leads to chronic conditions such as liver cirrhosis and hepatocellular carcinoma. Various non-invasive serum biochemical markers are used to identify the disease. This study compares performance between multiclass and binary class labels of the same dataset, compares the tools used (R and Python), and examines which selected features play a key role in predicting Hepatitis C Virus (HCV), using a dataset of Egyptian patients. The highest accuracy is achieved by KNN (51.06%, R) for the multiclass label and by random forest (54.56%, Python) for the binary class label. The overall comparison of evaluation metrics shows R to be the better tool for this case. The binary class label yields better performance scores than the multiclass label. The multiple feature selection methods did not produce a similar ranking order of the selected features. Finally, the 12 features selected by principal component analysis perform similarly to the complete dataset and to the 21 selected features, suggesting that these features may play a role in HCV prediction on this dataset.
topic classification
feature selection
hepatitis c virus
machine learning
prediction multi and binary class labels
python and r tools
url http://arqiipubl.com/ojs/index.php/AMS_Journal/article/view/122/82
work_keys_str_mv AT satishcrnandipati hepatitiscvirushcvpredictionbymachinelearningtechniques
AT chewxinying hepatitiscvirushcvpredictionbymachinelearningtechniques
AT khawkhaiwah hepatitiscvirushcvpredictionbymachinelearningtechniques
_version_ 1724822480048095232