StressGenePred: a twin prediction model architecture for classifying the stress types of samples and discovering stress-related genes in arabidopsis

Abstract Background: Recently, a number of studies have investigated how plants respond to stress at the cellular and molecular level by measuring gene expression profiles over time. As a result, time-series gene expression data sets for the stress response are available in databases...

Bibliographic Details
Main Authors: Dongwon Kang, Hongryul Ahn, Sangseon Lee, Chai-Jin Lee, Jihye Hur, Woosuk Jung, Sun Kim
Format: Article
Language: English
Published: BMC 2019-12-01
Series: BMC Genomics
Subjects: Arabidopsis; Stress; Transcriptome; Time-series; Machine learning
Online Access: https://doi.org/10.1186/s12864-019-6283-z
id doaj-c0bbcc18c7c44cb09f65ae35735c06fe
record_format Article
spelling doaj-c0bbcc18c7c44cb09f65ae35735c06fe 2020-12-20T12:16:00Z eng BMC BMC Genomics 1471-2164 2019-12-01 20 S11 1 13 10.1186/s12864-019-6283-z
Dongwon Kang (Department of Computer Science and Engineering, Seoul National University)
Hongryul Ahn (Department of Computer Science and Engineering, Seoul National University)
Sangseon Lee (Department of Computer Science and Engineering, Seoul National University)
Chai-Jin Lee (Interdisciplinary Program in Bioinformatics, Seoul National University)
Jihye Hur (Department of Crop Science, Konkuk University)
Woosuk Jung (Department of Crop Science, Konkuk University)
Sun Kim (Department of Computer Science and Engineering, Seoul National University)
collection DOAJ
language English
format Article
sources DOAJ
author Dongwon Kang
Hongryul Ahn
Sangseon Lee
Chai-Jin Lee
Jihye Hur
Woosuk Jung
Sun Kim
title StressGenePred: a twin prediction model architecture for classifying the stress types of samples and discovering stress-related genes in arabidopsis
publisher BMC
series BMC Genomics
issn 1471-2164
publishDate 2019-12-01
description Abstract Background: Recently, a number of studies have investigated how plants respond to stress at the cellular and molecular level by measuring gene expression profiles over time. As a result, time-series gene expression data sets for the stress response are available in databases. With these data, an integrated analysis of multiple stresses is possible; such an analysis identifies stress-responsive genes with higher specificity because considering multiple stresses can capture the effect of interference between stresses. Analyzing such data requires a machine learning model. Results: In this study, we developed StressGenePred, a neural network-based machine learning method, to integrate time-series transcriptome data of multiple stress types. StressGenePred is designed to detect single stress-specific biomarker genes by using a simple feature embedding method, a twin neural network model, and a Confident Multiple Choice Learning (CMCL) loss. The twin neural network model consists of a biomarker gene discovery model and a stress type prediction model that share the same logical layer to reduce training complexity. The CMCL loss is used to make the twin model select biomarker genes that respond specifically to a single stress. In experiments using Arabidopsis gene expression data for four major environmental stresses (heat, cold, salt, and drought), StressGenePred classified the types of stress more accurately than the limma feature embedding method and the support vector machine and random forest classification methods. In addition, StressGenePred discovered known stress-related genes with higher specificity than the Fisher method. Conclusions: StressGenePred is a machine learning method for identifying stress-related genes and predicting stress types in an integrated analysis of time-series transcriptome data from multiple stresses. The method can also be applied to other phenotype-gene association studies.
topic Arabidopsis
Stress
Transcriptome
Time-series
Machine learning
url https://doi.org/10.1186/s12864-019-6283-z