StressGenePred: a twin prediction model architecture for classifying the stress types of samples and discovering stress-related genes in arabidopsis
Abstract Background: Recently, a number of studies have investigated how plants respond to stress at the cellular and molecular level by measuring gene expression profiles over time. As a result, time-series gene expression datasets for the stress response are available in databases...
Main Authors: | Dongwon Kang, Hongryul Ahn, Sangseon Lee, Chai-Jin Lee, Jihye Hur, Woosuk Jung, Sun Kim |
---|---|
Format: | Article |
Language: | English |
Published: | BMC 2019-12-01 |
Series: | BMC Genomics |
Subjects: | Arabidopsis, Stress, Transcriptome, Time-series, Machine learning |
Online Access: | https://doi.org/10.1186/s12864-019-6283-z |
id |
doaj-c0bbcc18c7c44cb09f65ae35735c06fe |
---|---|
record_format |
Article |
spelling |
doaj-c0bbcc18c7c44cb09f65ae35735c06fe; 2020-12-20T12:16:00Z; eng; BMC; BMC Genomics; ISSN 1471-2164; 2019-12-01; vol. 20, issue S11, pp. 1-13; 10.1186/s12864-019-6283-z; StressGenePred: a twin prediction model architecture for classifying the stress types of samples and discovering stress-related genes in arabidopsis; Dongwon Kang (0), Hongryul Ahn (1), Sangseon Lee (2), Chai-Jin Lee (3), Jihye Hur (4), Woosuk Jung (5), Sun Kim (6); (0) Department of Computer Science and Engineering, Seoul National University; (1) Department of Computer Science and Engineering, Seoul National University; (2) Department of Computer Science and Engineering, Seoul National University; (3) Interdisciplinary Program in Bioinformatics, Seoul National University; (4) Department of Crop Science, Konkuk University; (5) Department of Crop Science, Konkuk University; (6) Department of Computer Science and Engineering, Seoul National University; https://doi.org/10.1186/s12864-019-6283-z; Arabidopsis; Stress; Transcriptome; Time-series; Machine learning |
collection |
DOAJ |
language |
English |
format |
Article |
sources |
DOAJ |
author |
Dongwon Kang Hongryul Ahn Sangseon Lee Chai-Jin Lee Jihye Hur Woosuk Jung Sun Kim |
title |
StressGenePred: a twin prediction model architecture for classifying the stress types of samples and discovering stress-related genes in arabidopsis |
publisher |
BMC |
series |
BMC Genomics |
issn |
1471-2164 |
publishDate |
2019-12-01 |
description |
Abstract Background: Recently, a number of studies have investigated how plants respond to stress at the cellular and molecular level by measuring gene expression profiles over time. As a result, time-series gene expression datasets for the stress response are available in databases. With these data, an integrated analysis of multiple stresses is possible, which identifies stress-responsive genes with higher specificity because considering multiple stresses can capture the effect of interference between stresses. Analyzing such data requires a machine learning model. Results: In this study, we developed StressGenePred, a neural network-based machine learning method, to integrate time-series transcriptome data of multiple stress types. StressGenePred is designed to detect single stress-specific biomarker genes by using a simple feature embedding method, a twin neural network model, and Confident Multiple Choice Learning (CMCL) loss. The twin neural network model consists of a biomarker gene discovery model and a stress type prediction model that share the same logical layer to reduce training complexity. The CMCL loss is used to make the twin model select biomarker genes that respond specifically to a single stress. In experiments using Arabidopsis gene expression data for four major environmental stresses (heat, cold, salt, and drought), StressGenePred classified the types of stress more accurately than the limma feature embedding method and the support vector machine and random forest classification methods. In addition, StressGenePred discovered known stress-related genes with higher specificity than the Fisher method. Conclusions: StressGenePred is a machine learning method for identifying stress-related genes and predicting stress types in an integrated analysis of multiple-stress time-series transcriptome data. The method can be applied to other phenotype-gene association studies. |
topic |
Arabidopsis Stress Transcriptome Time-series Machine learning |
url |
https://doi.org/10.1186/s12864-019-6283-z |
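
The abstract describes two prediction models that share one layer. A minimal Python sketch of that idea follows. It is an illustration under stated assumptions, not the authors' implementation: the class name `TwinModelSketch`, the toy weight values, and the choice of softmax and sigmoid outputs are all hypothetical, and the feature embedding step and the CMCL loss are not shown.

```python
import math


def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sigmoid(value):
    """Squash a raw score into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-value))


class TwinModelSketch:
    """Two prediction heads that read one shared gene-by-stress weight matrix.

    Hypothetical illustration only: weights[g][k] stands for an assumed
    association strength between gene g and stress type k.
    """

    def __init__(self, weights, stress_names):
        self.weights = weights
        self.stress_names = stress_names

    def predict_stress(self, embedding):
        """Stress type prediction: per-gene features -> stress probabilities."""
        logits = [
            sum(x * row[k] for x, row in zip(embedding, self.weights))
            for k in range(len(self.stress_names))
        ]
        return dict(zip(self.stress_names, softmax(logits)))

    def gene_scores(self, stress):
        """Biomarker gene discovery: stress label -> per-gene response scores."""
        k = self.stress_names.index(stress)
        return [sigmoid(row[k]) for row in self.weights]


# Toy data (made up for illustration): three genes, four stress types.
stresses = ["heat", "cold", "salt", "drought"]
weights = [
    [2.0, -1.0, 0.0, 0.0],  # gene 0: associated mainly with heat
    [-1.0, 2.0, 0.0, 0.0],  # gene 1: associated mainly with cold
    [0.0, 0.0, 1.5, 1.5],   # gene 2: associated with both salt and drought
]
model = TwinModelSketch(weights, stresses)

probs = model.predict_stress([1.0, 0.0, 0.0])  # only gene 0 is active
predicted = max(probs, key=probs.get)
heat_scores = model.gene_scores("heat")
```

Because both heads read the same matrix, any update to the shared weights changes the stress prediction and the gene ranking together, which is one plausible reading of how a shared layer could reduce training complexity.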