Identification of 11 candidate structured noncoding RNA motifs in humans by comparative genomics

Abstract Background Only 1.5% of the human genome encodes proteins, while a large part of the remainder encodes noncoding RNAs (ncRNAs). Many ncRNAs form structures and perform important functions. Accurately identifying structured ncRNAs in the human genome and discovering their biological functi...

Bibliographic Details
Main Authors: Lijuan Hou, Jin Xie, Yaoyao Wu, Jiaojiao Wang, Anqi Duan, Yaqi Ao, Xuejiao Liu, Xinmei Yu, Hui Yan, Jonathan Perreault, Sanshu Li
Format: Article
Language: English
Published: BMC 2021-03-01
Series: BMC Genomics
Subjects: Comparative genomics; Structured ncRNAs; Human genomes; Animal genomes; Pipeline
Online Access: https://doi.org/10.1186/s12864-021-07474-9
id doaj-7cf62bd1d4354d7498b0ba7a917d5c44
record_format Article
collection DOAJ
language English
format Article
sources DOAJ
author Lijuan Hou
Jin Xie
Yaoyao Wu
Jiaojiao Wang
Anqi Duan
Yaqi Ao
Xuejiao Liu
Xinmei Yu
Hui Yan
Jonathan Perreault
Sanshu Li
title Identification of 11 candidate structured noncoding RNA motifs in humans by comparative genomics
publisher BMC
series BMC Genomics
issn 1471-2164
publishDate 2021-03-01
description Abstract
Background: Only 1.5% of the human genome encodes proteins, while a large part of the remainder encodes noncoding RNAs (ncRNAs). Many ncRNAs form structures and perform important functions. Accurately identifying structured ncRNAs in the human genome and discovering their biological functions remains a major challenge.
Results: Here, we have established a pipeline (CM-line) with the following features for analyzing the large genomes of humans and other animals. First, we selected species with larger genetic distances to facilitate the discovery of covariations and compatible mutations. Second, we used CMfinder, which can generate useful alignments even with low sequence conservation. Third, we removed repetitive sequences and known structured ncRNAs to reduce the workload of CMfinder. Fourth, we used Infernal to find more representatives and refine the structure. We report 11 classes of structured ncRNA candidates with significant covariations in humans. Functional analysis showed that these ncRNAs may have diverse functions. Some may regulate circadian clock genes through poly(A) signals (PAS); some may regulate the elongation factor (EEF1A) and the T-cell receptor signaling pathway by cooperating with RNA-binding proteins.
Conclusions: By searching for important features of RNA structure in large genomes, the CM-line has revealed the existence of a variety of novel structured ncRNAs. Functional analysis suggests that some newly discovered ncRNA motifs may have biological functions. The pipeline we have established for the discovery of structured ncRNAs and the identification of their functions can also be applied to analyze other large genomes.
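The abstract relies on the distinction between covariations and compatible mutations as evidence for a conserved base pair. The following is a minimal sketch of that distinction, not the CM-line implementation: the function name `classify_pair`, the `CANONICAL` set, and the four labels are hypothetical, and treating any single non-pairing sequence as "not conserved" is a simplifying assumption (real tools typically tolerate a fraction of exceptions and apply statistical tests).

```python
# Illustrative sketch only; names and labels are assumptions, not from the paper.

# Watson-Crick pairs plus the G-U wobble pair.
CANONICAL = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def classify_pair(col_i: str, col_j: str) -> str:
    """Classify one proposed base pair from two alignment columns.

    col_i and col_j hold one character per aligned sequence (same order).
    Gapped sequences are ignored; DNA 'T' is read as RNA 'U'.
    """
    left = col_i.upper().replace("T", "U")
    right = col_j.upper().replace("T", "U")
    pairs = {(a, b) for a, b in zip(left, right) if a != "-" and b != "-"}

    # Simplification: a single non-pairing sequence rejects the pair.
    if not pairs or not pairs <= CANONICAL:
        return "not conserved"
    if len(pairs) == 1:
        return "invariant"

    # Covariation: two sequences differ at BOTH positions and both still pair.
    for p in pairs:
        for q in pairs:
            if p[0] != q[0] and p[1] != q[1]:
                return "covarying"

    # Otherwise only one side ever changes (e.g. G-C to G-U): compatible mutation.
    return "compatible"


if __name__ == "__main__":
    print(classify_pair("GGA", "CCT"))  # G-C, G-C, A-U
    print(classify_pair("GG", "CT"))    # G-C, G-U
```

Under these assumptions, aligning more distantly related species raises the chance that a true helix shows several distinct pairing combinations, which is why the abstract's first pipeline step favors larger genetic distances.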
topic Comparative genomics
Structured ncRNAs
Human genomes
Animal genomes
Pipeline
url https://doi.org/10.1186/s12864-021-07474-9
spelling doaj-7cf62bd1d4354d7498b0ba7a917d5c44
timestamp 2021-03-11T11:53:56Z
language eng
publisher BMC
series BMC Genomics
issn 1471-2164
publishDate 2021-03-01
volume 22, issue 1, pages 1-14
doi 10.1186/s12864-021-07474-9
title Identification of 11 candidate structured noncoding RNA motifs in humans by comparative genomics
affiliation Lijuan Hou, Jin Xie, Yaoyao Wu, Jiaojiao Wang, Anqi Duan, Yaqi Ao, Xuejiao Liu, Xinmei Yu, Hui Yan, Sanshu Li: Medical School, Molecular Medicine Engineering and Research Center of Ministry of Education, Key Laboratory of Precision Medicine and Molecular Diagnosis of Fujian Universities, Institute of Genomics, School of Biomedical Sciences, Huaqiao University
Jonathan Perreault: INRS - Institut Armand-Frappier