Summary: | The tumor suppressor protein p53, known as the ‘guardian of the genome’, transcriptionally regulates the expression of numerous genes, both coding and non-coding, in response to diverse forms of cellular stress. While numerous reports have characterized the protein-coding genes that are transcriptionally regulated by p53, the non-coding targets of p53 are less well studied. In this thesis, high-throughput transcriptome sequencing of cell lines was performed following treatment with different drugs to induce p53. Using a combination of de novo transcriptome discovery and mapping to a comprehensive annotation of transcripts named the MiTranscriptome, an extensive catalog of long non-coding RNAs (lncRNAs) was identified. This set of lncRNAs, called p53LTCC (p53 LncRNA Transcriptome from Cultured Cells), is derived from an integrative analysis of RNA-Seq and ChIP-Seq data.
It has been previously shown that while the mutation status of p53 may not be a significant predictor of cancer patient survival, a mutant p53 gene expression signature is associated with poor prognosis in many types of cancer. Moreover, the use of attractor metagenes has revealed that increased expression of metagenes associated with epithelial-mesenchymal transition (EMT), mitotic instability (chromosomal/genomic instability) and lymphocyte infiltration is associated with poor prognosis. Since the p53 pathway is impaired in one way or another in most tumors, a classifier based on a p53 metagene derived from our p53LTCC was developed that could differentiate between tumor and normal samples based on gene expression. Using machine learning approaches, diagnostic classifiers that could distinguish tumor and normal samples with a high degree of accuracy were developed. Also, while expression of individual long non-coding RNAs had low correlation with patient survival in different cancers, a lncRNA signature derived from the catalog of p53 targets had significant prognostic utility for cancer patient survival.
Since p53 plays a central role in cancer etiology and is mutated in over 50% of all cancers, we hypothesized that the lncRNA targets of p53 may have vital functions in effectuating the p53 pathway. Indeed, functional studies of two of the lncRNA targets of p53 showed that they play a role in p53-mediated regulation of cell cycle progression in response to DNA damage and are associated with the regulation of reactive oxygen species (ROS) levels in response to oxidative stress. Although the focus of the experimental studies was to elucidate the role of lncRNAs in the p53 pathway, careful analysis of the transcriptome sequencing results revealed insights into the role of different p53 targets (both coding and non-coding) in different contexts to enable a versatile response to diverse stresses. Not only were we able to identify novel targets of p53, but the data also showed that there are many p53 targets that are unique to each type of stress. There is also a core transcriptional lncRNA program that is activated by p53 regardless of the context.
Finally, during the course of my computational studies, I made numerous observations from bioinformatics analysis of high-throughput datasets from different sources that have allowed me to validate many of the experimental results derived by my colleagues (in cell-culture-based assays) using cancer patient-derived datasets. To streamline the workflow of such analyses, I have developed a tool for rapid exploratory data visualization of high-throughput datasets for cancer genomics (REDVis) that enables users with minimal programming skills to quickly visualize gene expression, mutation, survival or other clinical, demographic or molecular characterization data.
|