Multiclass classification for skin cancer profiling based on the integration of heterogeneous gene expression series.

Most of the research studies developed applying microarray technology to the characterization of different pathological states of any disease may fail in reaching statistically significant results. This is largely due to the small repertoire of analysed samples, and to the limitation in the number o...

Bibliographic Details
Main Authors: Juan Manuel Gálvez, Daniel Castillo, Luis Javier Herrera, Belén San Román, Olga Valenzuela, Francisco Manuel Ortuño, Ignacio Rojas
Format: Article
Language: English
Published: Public Library of Science (PLoS) 2018-01-01
Series: PLoS ONE
Online Access: http://europepmc.org/articles/PMC5947894?pdf=render
id doaj-eb144e0906f646b3920cccf66985786f
record_format Article
spelling doaj-eb144e0906f646b3920cccf66985786f 2020-11-25T02:08:05Z eng Public Library of Science (PLoS) PLoS ONE 1932-6203 2018-01-01 13 5 e0196836 10.1371/journal.pone.0196836
collection DOAJ
language English
format Article
sources DOAJ
author Juan Manuel Gálvez
Daniel Castillo
Luis Javier Herrera
Belén San Román
Olga Valenzuela
Francisco Manuel Ortuño
Ignacio Rojas
title Multiclass classification for skin cancer profiling based on the integration of heterogeneous gene expression series.
publisher Public Library of Science (PLoS)
series PLoS ONE
issn 1932-6203
publishDate 2018-01-01
description Most research studies that apply microarray technology to characterize the pathological states of a disease may fail to reach statistically significant results. This is largely due to the small number of samples analysed and to the limited number of states or pathologies usually addressed. Moreover, the influence of potential deviations on gene expression quantification is usually disregarded. Despite the continuous changes in omic sciences, reflected for instance in the emergence of new Next-Generation Sequencing-related technologies, the vast amount of gene expression microarray datasets already available should be properly exploited. Therefore, this work proposes a novel methodological approach involving the integration of several heterogeneous skin cancer series and the subsequent design of a multiclass classifier. The approach aims to provide clinicians with an intelligent diagnosis support tool based on a robust set of selected biomarkers that simultaneously distinguishes among different cancer-related skin states. To achieve this, microarray datasets from two manufacturers, Affymetrix and Illumina, were combined across platforms. This integration is expected to strengthen the statistical robustness of the study and to support the finding of highly reliable skin cancer biomarkers. Specifically, the designed pipeline allowed the identification of a small subset of 17 differentially expressed genes (DEGs) with which to distinguish among the 7 skin states involved. These genes were obtained after assessing a number of potential batch effects on the gene expression data. The biological interpretation of these genes was checked against the specific literature to understand their relation to skin cancer. Finally, to assess their possible effectiveness in cancer diagnosis, a cross-validated Support Vector Machine (SVM)-based classification including feature ranking was performed. The accuracy attained exceeded 92% in overall recognition of the 7 different cancer-related skin states. The proposed integration scheme is expected to allow co-integration with other state-of-the-art technologies such as RNA-seq.
url http://europepmc.org/articles/PMC5947894?pdf=render