Partitioning of functional gene expression data using principal points

Abstract Background DNA microarrays offer motivation and hope for the simultaneous study of variations in multiple genes. Gene expression is a temporal process that allows variations in expression levels with a characterized gene function over a period of time. Temporal gene expression curves can be...

Bibliographic Details
Main Authors: Jaehee Kim, Haseong Kim
Format: Article
Language: English
Published: BMC 2017-10-01
Series: BMC Bioinformatics
Subjects:
Online Access: http://link.springer.com/article/10.1186/s12859-017-1860-0
id doaj-90cf00690cc14d8e9018f90eee2c7007
record_format Article
spelling doaj-90cf00690cc14d8e9018f90eee2c7007; 2020-11-24T21:17:08Z; eng; BMC; BMC Bioinformatics; 1471-2105; 2017-10-01; 181117; 10.1186/s12859-017-1860-0; Partitioning of functional gene expression data using principal points; Jaehee Kim (0), Haseong Kim (1); (0) Department of Statistics, Duksung Women’s University; (1) Korea Research Institute of Bioscience and Biotechnology (KRIBB)
collection DOAJ
language English
format Article
sources DOAJ
author Jaehee Kim
Haseong Kim
title Partitioning of functional gene expression data using principal points
publisher BMC
series BMC Bioinformatics
issn 1471-2105
publishDate 2017-10-01
description Abstract. Background: DNA microarrays offer motivation and hope for the simultaneous study of variations in multiple genes. Gene expression is a temporal process that allows variations in expression levels with a characterized gene function over a period of time. Temporal gene expression curves can be treated as functional data since they are considered independent realizations of a stochastic process. This process requires appropriate models to identify patterns of gene functions. Partitioning the functional data can find homogeneous subgroups among the large number of genes within the inherent biological networks. Therefore, it can be a useful technique for the analysis of time-course gene expression data. We propose a new self-consistent method for partitioning the functional coefficients of individual expression profiles, based on an orthonormal basis system. Results: A principal-points-based functional partitioning method is proposed for time-course gene expression data. The method explores the relationship between genes using Legendre coefficients as principal points to extract the features of gene functions. For simulated data, the proposed method provides high connectivity after clustering and finds significant subsets of genes with increased connectivity. Our approach has two comparative advantages: it uses fewer coefficients from the functional data, and the principal points used for partitioning are self-consistent. As real data applications, we find partitioned genes from the gene expression profiles in budding yeast data and Escherichia coli data. Conclusions: The proposed method benefits from the use of principal points, dimension reduction, and the choice of an orthogonal basis system, and it provides appropriately connected genes in the resulting subsets. We illustrate the method by applying it to sets of cell-cycle-regulated time-course yeast genes and E. coli genes. The proposed method is able to identify highly connected genes and to explore the complex dynamics of biological systems in functional genomics.
topic Fourier coefficients
Legendre polynomials
Escherichia coli Microarray expression data
K-means clustering
Principal points
Silhouette
url http://link.springer.com/article/10.1186/s12859-017-1860-0