An embedded method for gene identification problems involving unwanted data heterogeneity

Abstract Background: Modern applications such as bioinformatics collect data in various ways, which can easily result in heterogeneous data. Traditional variable selection methods assume that samples are independent and identically distributed, an assumption that does not suit these applications. Some existi...

Bibliographic Details
Main Author: Meng Lu
Format: Article
Language: English
Published: BMC 2019-10-01
Series: Human Genomics
Subjects: Unwanted heterogeneity; Gene identification; Embedded variable selection
Online Access: http://link.springer.com/article/10.1186/s40246-019-0228-0
id doaj-f604d2ae75d0465691911aba503ce3df
record_format Article
spelling doaj-f604d2ae75d0465691911aba503ce3df; 2020-11-25T03:36:37Z; eng; BMC; Human Genomics; 1479-7364; 2019-10-01; 13; S1; 1; 10; 10.1186/s40246-019-0228-0; An embedded method for gene identification problems involving unwanted data heterogeneity; Meng Lu (Department of Information Management, Tianjin University)
collection DOAJ
language English
format Article
sources DOAJ
author Meng Lu
title An embedded method for gene identification problems involving unwanted data heterogeneity
publisher BMC
series Human Genomics
issn 1479-7364
publishDate 2019-10-01
description Abstract. Background: Modern applications such as bioinformatics collect data in various ways, which can easily result in heterogeneous data. Traditional variable selection methods assume that samples are independent and identically distributed, an assumption that does not suit these applications. Some existing statistical models capable of handling unwanted variation were developed for gene identification involving heterogeneous data, but they lack model predictability and suffer from variable redundancy. Results: By accounting for the unwanted heterogeneity effectively, our method has shown its superiority over several state-of-the-art methods, as validated by experimental results in both unsupervised and supervised gene identification problems. We also applied our method to a pan-cancer study, where it can identify the most discriminative genes that best distinguish different cancer types. Conclusions: This article provides an alternative gene identification method that can account for unwanted data heterogeneity. It is a promising method for providing new insights into complex cancer biology and clues for understanding tumorigenesis and tumor progression.
topic Unwanted heterogeneity
Gene identification
Embedded variable selection
url http://link.springer.com/article/10.1186/s40246-019-0228-0