Summary: | Hepatitis C being as a prevalent disease in the world especially in countries like Egypt. It is estimated that 3-4 million new cases every year, indicating as a public health problem and should be addressed with identification and treatment policies. In the initial stage, it is asymptomatic however when infection progress it leads to chronic conditions such as liver cirrhosis and hepatocellular carcinoma. Some of the various non-invasive serum biochemical markers are used to identify this disease. This study aims to know the performance comparisons between multi and binary class labels of the same dataset, not limited to tool comparison, and to know which selected features play a key role in the prediction of Hepatitis C Virus (HCV) by using Egyptian patient’s dataset. The highest accuracy is shown by KNN (51.06%, R) and random forest (54.56%, Python) in multi and binary class label respectively. The overall evaluation metrics comparison shows R as a better tool for this case. On the other hand, the performance score of the binary class shows better that the multiclass label. The multi-feature selection methods did not show any similar arrangement/topology in the ranking order of selected features. Finally, the 12 selected features by principal component analysis show similar performances to complete dataset and also the 21 selected features, thus showing these features may play a role in the prediction of the HCV dataset.
|