A Simple Guideline to Assess the Characteristics of RNA-Seq Data

Next-generation sequencing (NGS) techniques have been used to generate various molecular maps including genomes, epigenomes, and transcriptomes. Transcriptomes from a given cell population can be profiled via RNA-seq. However, there is no simple way to assess the characteristics of RNA-seq data syst...

Bibliographic Details
Main Authors: Keunhong Son, Sungryul Yu, Wonseok Shin, Kyudong Han, Keunsoo Kang
Format: Article
Language: English
Published: Hindawi Limited 2018-01-01
Series: BioMed Research International
Online Access: http://dx.doi.org/10.1155/2018/2906292
Author Affiliations:
Keunhong Son: Department of Microbiology, College of Natural Sciences, Dankook University, Cheonan 31116, Republic of Korea
Sungryul Yu: Department of Clinical Laboratory Science, Semyung University, Jecheon 27136, Republic of Korea
Wonseok Shin: Department of Nanobiomedical Science & BK21 PLUS NBM Global Research Center for Regenerative Medicine, Dankook University, Cheonan 31116, Republic of Korea
Kyudong Han: Department of Nanobiomedical Science & BK21 PLUS NBM Global Research Center for Regenerative Medicine, Dankook University, Cheonan 31116, Republic of Korea
Keunsoo Kang: Department of Microbiology, College of Natural Sciences, Dankook University, Cheonan 31116, Republic of Korea
Collection: DOAJ
ISSN: 2314-6133, 2314-6141
Description: Next-generation sequencing (NGS) techniques have been used to generate various molecular maps including genomes, epigenomes, and transcriptomes. Transcriptomes from a given cell population can be profiled via RNA-seq. However, there is no simple way to assess the characteristics of RNA-seq data systematically. In this study, we provide a simple method that can intuitively evaluate RNA-seq data using two different principal component analysis (PCA) plots. The gene expression PCA plot provides insights into the association between samples, while the transcript integrity number (TIN) score plot provides a quality map of given RNA-seq data. With this approach, we found that RNA-seq datasets deposited in public repositories often contain a few low-quality RNA-seq data that can lead to misinterpretations. The effect of sampling errors for differentially expressed gene (DEG) analysis was evaluated with ten RNA-seq data from invasive ductal carcinoma tissues and three RNA-seq data from adjacent normal tissues taken from a Korean breast cancer patient. The evaluation demonstrated that sampling errors, which select samples that do not represent a given population, can lead to different interpretations when conducting the DEG analysis. Therefore, the proposed approach can be used to avoid sampling errors prior to RNA-seq data analysis.
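The description above mentions PCA plots built from gene expression values and from TIN scores. As a rough illustration only, the sketch below projects samples onto their first two principal components with NumPy. The function name `pca_scores`, the samples-by-features layout, and the toy data are all assumptions made for this example; the record does not state which software or preprocessing the authors used, and TIN scores would need to be computed upstream by a separate tool.

```python
import numpy as np

def pca_scores(matrix, n_components=2):
    """Project samples (rows) onto their top principal components.

    `matrix` is assumed to be samples x features, e.g. log-scaled gene
    expression values or per-transcript TIN scores (hypothetical layout).
    """
    x = np.asarray(matrix, dtype=float)
    centered = x - x.mean(axis=0)  # center each feature at zero
    # SVD of the centered matrix; the rows of vt are the principal axes.
    u, s, vt = np.linalg.svd(centered, full_matrices=False)
    scores = u[:, :n_components] * s[:n_components]
    explained = (s ** 2) / np.sum(s ** 2)  # fraction of variance per component
    return scores, explained[:n_components]

# Hypothetical toy data: 6 samples x 50 genes, two groups of three samples.
rng = np.random.default_rng(0)
base = rng.normal(5.0, 1.0, size=50)
shift = np.zeros(50)
shift[:10] = 3.0  # 10 genes differ between the two groups
tumor = base + shift + rng.normal(0, 0.1, size=(3, 50))
normal = base + rng.normal(0, 0.1, size=(3, 50))
expr = np.vstack([tumor, normal])

scores, explained = pca_scores(expr)
for label, (pc1, pc2) in zip(["T1", "T2", "T3", "N1", "N2", "N3"], scores):
    print(f"{label}\tPC1={pc1:.2f}\tPC2={pc2:.2f}")
```

Running the same function on a matrix of TIN scores instead of expression values would give the second, quality-oriented view; a sample that sits far from its group in either projection would be a candidate for closer inspection before DEG analysis.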