Computational Analysis of Gene Expression Regulation from Cross Species Comparison to Single Cell Resolution

Bibliographic Details
Main Author: Lee, Jiyoung
Other Authors: Genetics, Bioinformatics, and Computational Biology
Format: Others
Published: Virginia Tech 2020
Online Access: http://hdl.handle.net/10919/99878
Description
Summary: Gene expression regulation is dynamic and specific to factors such as developmental stage, environmental conditions, and pathogen stimulation. A tremendous number of transcriptome data sets are now available from diverse species. This trend enables comparative transcriptome analysis, which identifies conserved or diverged gene expression responses across species. The goal of this dissertation is to develop and apply comparative transcriptomics approaches that transfer knowledge from model species to non-model species, with the hope that such approaches can contribute to the improvement of crop yield and human health. First, we presented a comprehensive method to identify cross-species modules between two plant species. We adapted an unsupervised network-based module-finding method to identify conserved patterns of co-expression and functional conservation between Arabidopsis, a model species, and soybean, a crop species. Second, we compared drought-responsive genes across Arabidopsis, soybean, rice, corn, and Populus to explore the genomic characteristics that are conserved under drought stress across species. We identified hundreds of common gene families and conserved regulatory motifs between monocots and dicots. We also presented a BLS-based clustering method that takes into account evolutionary relationships among species to identify conserved co-expressed genes. Last, we analyzed single-cell RNA-seq data from monocytes to understand the regulatory mechanism of the innate immune system under low-grade inflammation. We identified novel subpopulations of cells treated with lipopolysaccharide (LPS) that show distinct expression patterns from pro-inflammatory genes. The data revealed that a promising therapeutic reagent, sodium 4-phenylbutyrate, masked the effect of LPS.
We inferred the existence of specific cellular transitions under different treatments and prioritized important motifs that modulate the transitions using feature selection by a random forest method. Genomics research has been shifting from bulk RNA-seq to single-cell RNA-seq (scRNA-seq), which has become a widely used approach for transcriptome analysis. With the experience gained from analyzing scRNA-seq data, we plan to conduct comparative single-cell transcriptome analysis across multiple species.

=== Doctor of Philosophy ===

All cells in an organism have the same set of genes, yet different cell types, tissues, and organs with different functions arise as the organism ages or under different conditions. Gene expression regulation is one mechanism that modulates complex, dynamic, and specific changes in the tissues or cell types of any living organism. Understanding gene regulation is of fundamental importance in biology. With the rapid advancement of sequencing technologies, a tremendous amount of gene expression (transcriptome) data from individual species is now available in public repositories. However, major studies have come from several model species, and research on non-model species has relied on comparisons with a few model species. Comparative transcriptome analysis across species will help us transfer knowledge from model species to non-model species, and such knowledge transfer can contribute to the improvement of crop yields and human health. The focus of my dissertation is to develop and apply approaches for comparative transcriptome analysis that can help us better understand what makes each species unique, and which common functions across species have been passed down from ancestors (evolutionarily conserved functions). Three research chapters are presented in this dissertation. First, we developed a method to identify groups of genes that are commonly co-expressed in two species.
We chose seed development data from soybean in the hope of contributing to crop improvement. Second, we compared gene expression data across five plant species, including soybean, rice, and corn, to provide new perspectives on crop plants. We chose drought stress to identify conserved functions and regulatory factors across species because drought is one of the major stresses that negatively impact agricultural production. We also proposed a method that groups genes by their evolutionary relationships across an unlimited number of species. Third, we analyzed single-cell RNA-seq data from mouse monocytes to understand the regulatory mechanism of the innate immune system under low-grade inflammation. We observed how innate immune cells respond to inflammation that may cause no symptoms but persists for a long period of time. We also reported an effect of a promising therapeutic reagent (sodium 4-phenylbutyrate) on chronic inflammatory diseases. The third project will be extended to comparative single-cell transcriptome analysis across multiple species.