A Simple Guideline to Assess the Characteristics of RNA-Seq Data
Next-generation sequencing (NGS) techniques have been used to generate various molecular maps including genomes, epigenomes, and transcriptomes. Transcriptomes from a given cell population can be profiled via RNA-seq. However, there is no simple way to assess the characteristics of RNA-seq data syst...
Main Authors: | Keunhong Son, Sungryul Yu, Wonseok Shin, Kyudong Han, Keunsoo Kang |
---|---|
Format: | Article |
Language: | English |
Published: | Hindawi Limited, 2018-01-01 |
Series: | BioMed Research International |
Online Access: | http://dx.doi.org/10.1155/2018/2906292 |
id |
doaj-a48fd370db074128bea2976bbe5dae7a |
---|---|
record_format |
Article |
spelling |
doaj-a48fd370db074128bea2976bbe5dae7a; 2020-11-24T21:48:27Z; eng; Hindawi Limited; BioMed Research International; 2314-6133; 2314-6141; 2018-01-01; 2018; 10.1155/2018/2906292; 2906292; A Simple Guideline to Assess the Characteristics of RNA-Seq Data
Authors: Keunhong Son (0); Sungryul Yu (1); Wonseok Shin (2); Kyudong Han (3); Keunsoo Kang (4)
Affiliations: (0) Department of Microbiology, College of Natural Sciences, Dankook University, Cheonan 31116, Republic of Korea; (1) Department of Clinical Laboratory Science, Semyung University, Jecheon 27136, Republic of Korea; (2) Department of Nanobiomedical Science & BK21 PLUS NBM Global Research Center for Regenerative Medicine, Dankook University, Cheonan 31116, Republic of Korea; (3) Department of Nanobiomedical Science & BK21 PLUS NBM Global Research Center for Regenerative Medicine, Dankook University, Cheonan 31116, Republic of Korea; (4) Department of Microbiology, College of Natural Sciences, Dankook University, Cheonan 31116, Republic of Korea
http://dx.doi.org/10.1155/2018/2906292 |
collection |
DOAJ |
sources |
DOAJ |
author |
Keunhong Son, Sungryul Yu, Wonseok Shin, Kyudong Han, Keunsoo Kang |
issn |
2314-6133 2314-6141 |
description |
Next-generation sequencing (NGS) techniques have been used to generate various molecular maps including genomes, epigenomes, and transcriptomes. Transcriptomes from a given cell population can be profiled via RNA-seq. However, there is no simple way to assess the characteristics of RNA-seq data systematically. In this study, we provide a simple method that can intuitively evaluate RNA-seq data using two different principal component analysis (PCA) plots. The gene expression PCA plot provides insights into the association between samples, while the transcript integrity number (TIN) score plot provides a quality map of given RNA-seq data. With this approach, we found that RNA-seq datasets deposited in public repositories often contain a few low-quality RNA-seq data that can lead to misinterpretations. The effect of sampling errors for differentially expressed gene (DEG) analysis was evaluated with ten RNA-seq data from invasive ductal carcinoma tissues and three RNA-seq data from adjacent normal tissues taken from a Korean breast cancer patient. The evaluation demonstrated that sampling errors, which select samples that do not represent a given population, can lead to different interpretations when conducting the DEG analysis. Therefore, the proposed approach can be used to avoid sampling errors prior to RNA-seq data analysis. |