Breast cancer image classification using pattern-based Hyper Conceptual Sampling method

Bibliographic Details
Main Authors: Tooba Salahuddin, Fatima Haouari, Fahad Islam, Rahma Ali, Sara Al-Rasbi, Nada Aboueata, Eman Rezk, Ali Jaoua
Format: Article
Language: English
Published: Elsevier 2018-01-01
Series: Informatics in Medicine Unlocked
Online Access: http://www.sciencedirect.com/science/article/pii/S2352914818301084
Description
Summary: The increase in biomedical data has created a need for data sampling techniques. With the emergence of big data and the growing popularity of data science, sampling (or data reduction) techniques have helped to significantly speed up the data analytics process. Without sampling, it would be difficult to extract useful patterns efficiently from a large dataset. Sampling techniques instead produce a relatively small subset containing the most representative objects of the original dataset, so that analysis can be performed effectively even on huge datasets. To support sound conclusions and predictions, however, the sample must preserve the behavior of the original data. In this paper, we propose a novel data sampling technique that exploits formal concept analysis. Machine learning experiments are performed on the resulting sample to evaluate its quality, and the performance of our method is compared with that of another sampling technique proposed in the literature. The results demonstrate that the proposed approach is effective and competitive in terms of sample size and sample quality, where quality is measured by accuracy and the F1-measure.
Keywords: Data sampling, Formal concept analysis, Breast cancer, Machine learning
ISSN: 2352-9148
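The summary says only that the method exploits formal concept analysis; it does not describe the algorithm itself. As a rough illustration of that notion, the Python sketch below enumerates the formal concepts (maximal object-attribute rectangles) of a small binary context, then applies a hypothetical greedy rule that keeps one representative object per chosen concept until every object-attribute pair is covered. This is not the authors' Hyper Conceptual Sampling algorithm: the function names, the greedy coverage rule, and the toy image attributes are all assumptions made for illustration.

```python
from itertools import combinations


def common_attributes(ctx, objects):
    """Derivation A': attributes shared by every object in A (all attributes if A is empty)."""
    if not objects:
        return frozenset().union(*ctx.values())
    return frozenset.intersection(*(frozenset(ctx[o]) for o in objects))


def objects_having(ctx, attrs):
    """Derivation B': objects that possess every attribute in B."""
    return frozenset(o for o, a in ctx.items() if attrs <= a)


def formal_concepts(ctx):
    """Naively enumerate every (extent, intent) pair with A' = B and B' = A.

    Closing each attribute subset B via B -> B' -> B'' yields every concept;
    this is exponential in the number of attributes, so it only suits toy data.
    """
    attributes = sorted(frozenset().union(*ctx.values()))
    concepts = set()
    for r in range(len(attributes) + 1):
        for subset in combinations(attributes, r):
            extent = objects_having(ctx, frozenset(subset))
            concepts.add((extent, common_attributes(ctx, extent)))
    return concepts


def concept_sample(ctx):
    """Hypothetical heuristic (not the paper's method): greedily pick concepts
    until every (object, attribute) pair is covered, keeping one
    representative object per chosen concept."""
    uncovered = {(o, a) for o, attrs in ctx.items() for a in attrs}
    # Drop degenerate concepts (empty extent or intent) and sort them so that
    # ties in coverage are broken deterministically.
    candidates = sorted(
        (c for c in formal_concepts(ctx) if c[0] and c[1]),
        key=lambda c: (sorted(c[0]), sorted(c[1])),
    )
    sample = []
    while uncovered:
        # Choose the concept covering the most still-uncovered pairs.
        extent, intent = max(
            candidates,
            key=lambda c: len({(o, a) for o in c[0] for a in c[1]} & uncovered),
        )
        uncovered -= {(o, a) for o in extent for a in intent}
        fresh = sorted(set(extent) - set(sample))
        if fresh:
            sample.append(fresh[0])
    return sample


# Hypothetical toy context: image IDs and attribute names are invented.
toy = {
    "img1": {"large_nuclei", "irregular_shape"},
    "img2": {"large_nuclei", "irregular_shape"},
    "img3": {"small_nuclei", "regular_shape"},
    "img4": {"small_nuclei", "regular_shape"},
}

if __name__ == "__main__":
    print(concept_sample(toy))  # ['img1', 'img3']
```

Here the four images collapse into two concepts, so the sketch keeps one image from each. The brute-force enumeration is exponential in the number of attributes; a practical implementation would more likely use a dedicated concept-mining algorithm such as NextClosure.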