Identification of Cancer Related Genes Using a Comprehensive Map of Human Gene Expression.

Rapid accumulation and availability of gene expression datasets in public repositories have enabled large-scale meta-analyses of combined data. The richness of cross-experiment data has provided new biological insights, including identification of new cancer genes. In this study, we compiled a human...

Bibliographic Details
Main Authors: Aurora Torrente, Margus Lukk, Vincent Xue, Helen Parkinson, Johan Rung, Alvis Brazma
Format: Article
Language: English
Published: Public Library of Science (PLoS) 2016-01-01
Series: PLoS ONE
Online Access: http://europepmc.org/articles/PMC4913919?pdf=render
id doaj-9b5354622ec54c2997e553fb619bf84f
record_format Article
spelling doaj-9b5354622ec54c2997e553fb619bf84f; 2020-11-25T01:50:25Z; eng; Public Library of Science (PLoS); PLoS ONE; ISSN 1932-6203; 2016-01-01; 11(6): e0157484; doi:10.1371/journal.pone.0157484
collection DOAJ
language English
format Article
sources DOAJ
author Aurora Torrente
Margus Lukk
Vincent Xue
Helen Parkinson
Johan Rung
Alvis Brazma
title Identification of Cancer Related Genes Using a Comprehensive Map of Human Gene Expression.
publisher Public Library of Science (PLoS)
series PLoS ONE
issn 1932-6203
publishDate 2016-01-01
description Rapid accumulation and availability of gene expression datasets in public repositories have enabled large-scale meta-analyses of combined data. The richness of cross-experiment data has provided new biological insights, including the identification of new cancer genes. In this study, we compiled a human gene expression dataset from ∼40,000 publicly available Affymetrix HG-U133Plus2 arrays. After strict quality control and data normalisation, the data were quantified in an expression matrix of ∼20,000 genes and ∼28,000 samples. To enable different ways of grouping samples, existing annotations were subjected to systematic ontology-assisted categorisation and manual curation. Groups such as normal tissues, neoplasmic tissues, cell lines, homoeotic cells and incompletely differentiated cells were created. Unsupervised analysis of the data confirmed a global structure of expression consistent with earlier analyses, with more detail revealed owing to the increased resolution. A suitable mixed-effects linear model was used to further investigate gene expression in solid tissue tumours and to compare these with the respective healthy solid tissues. The analysis identified 1,285 genes with systematic expression change in cancer. The list is significantly enriched with known cancer genes from large, public, peer-reviewed databases, whereas the remaining genes are proposed as new cancer gene candidates. The compiled dataset is publicly available in the ArrayExpress Archive. It contains the most diverse collection of biological samples, making it the largest systematically annotated gene expression dataset of its kind in the public domain.
url http://europepmc.org/articles/PMC4913919?pdf=render