Hepatitis C Virus (HCV) Prediction by Machine Learning Techniques
Main Authors: | Satish CR Nandipati, Chew XinYing, Khaw Khai Wah |
---|---|
Format: | Article |
Language: | English |
Published: | ARQII PUBLICATION, 2020-03-01 |
Series: | Applications of Modelling and Simulation |
Subjects: | |
Online Access: | http://arqiipubl.com/ojs/index.php/AMS_Journal/article/view/122/82 |
id | doaj-351cecc5941146bf8007af6a4de403c2 |
---|---|
affiliations | Satish CR Nandipati and Chew XinYing: School of Computer Sciences, Universiti Sains Malaysia, Pulau Pinang, Malaysia; Khaw Khai Wah: School of Management, Universiti Sains Malaysia, Pulau Pinang, Malaysia |
collection | DOAJ |
author | Satish CR Nandipati, Chew XinYing, Khaw Khai Wah |
issn | 2600-8084 |
description | Hepatitis C is a prevalent disease worldwide, especially in countries such as Egypt. An estimated 3-4 million new cases occur every year, making it a public health problem that should be addressed with identification and treatment policies. In its initial stage the infection is asymptomatic, but as it progresses it leads to chronic conditions such as liver cirrhosis and hepatocellular carcinoma. Various non-invasive serum biochemical markers are used to identify the disease. Using a dataset of Egyptian patients, this study compares classification performance on the multiclass and binary class labels of the same dataset, compares the tools used (R and Python), and examines which selected features play a key role in predicting Hepatitis C Virus (HCV). The highest accuracy was achieved by KNN (51.06%, R) for the multiclass label and by random forest (54.56%, Python) for the binary class label. Across the overall evaluation metrics, R proved the better tool for this case. The binary class label also yielded better performance scores than the multiclass label. The multiple feature selection methods did not produce a consistent ranking order of the selected features. Finally, the 12 features selected by principal component analysis performed similarly to both the complete dataset and the 21 selected features, suggesting that these features may play a role in HCV prediction. |
topic | classification; feature selection; hepatitis C virus; machine learning; prediction multi and binary class labels; Python and R tools |
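The description compares classifiers on multiclass and binary versions of the same labels. A minimal, stdlib-only Python sketch of that kind of comparison follows, using KNN only. It rests on assumptions that are not from the article: the synthetic data, the four hypothetical ordinal classes, the `make_synthetic` generator, the `to_binary` threshold mapping, and the k=5 / 30% hold-out settings are all illustrative choices. This is not the authors' dataset, preprocessing, or code. The study itself worked in R and Python and also evaluated random forest.

```python
import math
import random
from collections import Counter


def make_synthetic(n_per_class=50, n_features=3, seed=42):
    """Hypothetical data: four ordinal classes (1-4) whose feature means
    shift with the class number, plus Gaussian noise. Not the HCV dataset."""
    rng = random.Random(seed)
    X, y = [], []
    for label in range(1, 5):
        for _ in range(n_per_class):
            X.append([rng.gauss(0.5 * label, 1.0) for _ in range(n_features)])
            y.append(label)
    return X, y


def to_binary(labels, threshold):
    """Hypothetical collapse of multiclass labels: >= threshold -> 1, else 0."""
    return [1 if label >= threshold else 0 for label in labels]


def knn_predict(train_X, train_y, x, k=5):
    """Return the majority label among the k nearest training rows (Euclidean).
    Neighbours are sorted nearest-first, so Counter.most_common breaks ties
    in favour of the label encountered first, i.e. the closer neighbour."""
    neighbours = sorted(
        (math.dist(row, x), label) for row, label in zip(train_X, train_y)
    )
    votes = Counter(label for _, label in neighbours[:k])
    return votes.most_common(1)[0][0]


def holdout_accuracy(X, y, k=5, test_fraction=0.3, seed=0):
    """Shuffle indices with a fixed seed, hold out a test split, and return
    the fraction of test rows whose KNN prediction matches the true label."""
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    n_test = int(len(idx) * test_fraction)
    test_idx, train_idx = idx[:n_test], idx[n_test:]
    train_X = [X[i] for i in train_idx]
    train_y = [y[i] for i in train_idx]
    correct = sum(
        knn_predict(train_X, train_y, X[i], k) == y[i] for i in test_idx
    )
    return correct / n_test


if __name__ == "__main__":
    X, y_multi = make_synthetic()
    y_binary = to_binary(y_multi, threshold=3)
    # Same features and same split; only the label grouping differs.
    print(f"multiclass KNN accuracy: {holdout_accuracy(X, y_multi):.3f}")
    print(f"binary KNN accuracy:     {holdout_accuracy(X, y_binary):.3f}")
```

Both runs share the same features and, because the seed and row count are identical, the same shuffled train/test split. Any accuracy difference therefore comes only from how the labels are grouped, which is the multiclass-versus-binary comparison the description refers to.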