Prediction of drought-resistant genes in Arabidopsis thaliana using SVM-RFE.

BACKGROUND: Identifying genes with essential roles in resisting environmental stress rates high in agronomic importance. Although massive DNA microarray gene expression data have been generated for plants, current computational approaches underutilize these data for studying genotype-trait relations...

Bibliographic Details
Main Authors: Yanchun Liang, Fan Zhang, Juexin Wang, Trupti Joshi, Yan Wang, Dong Xu
Format: Article
Language: English
Published: Public Library of Science (PLoS) 2011-01-01
Series: PLoS ONE
Online Access: http://europepmc.org/articles/PMC3137602?pdf=render
id doaj-a16db63f85c049828a3a05e40a70f44a
record_format Article
doi 10.1371/journal.pone.0021750
citation PLoS ONE 6(7): e21750
collection DOAJ
language English
format Article
sources DOAJ
author Yanchun Liang
Fan Zhang
Juexin Wang
Trupti Joshi
Yan Wang
Dong Xu
publisher Public Library of Science (PLoS)
series PLoS ONE
issn 1932-6203
publishDate 2011-01-01
description BACKGROUND: Identifying genes with essential roles in resisting environmental stress rates high in agronomic importance. Although massive DNA microarray gene expression data have been generated for plants, current computational approaches underutilize these data for studying genotype-trait relationships. Some advanced gene identification methods have been explored for human diseases, but typically these methods have not been converted into publicly available software tools and cannot be applied to plants for identifying genes with agronomic traits. METHODOLOGY: In this study, we used 22 sets of Arabidopsis thaliana gene expression data from GEO to predict the key genes involved in water tolerance. We applied an SVM-RFE (Support Vector Machine-Recursive Feature Elimination) feature selection method for the prediction. To address small sample sizes, we developed a modified approach for SVM-RFE by using bootstrapping and leave-one-out cross-validation. We also expanded our study to predict genes involved in water susceptibility. CONCLUSIONS: We analyzed the top 10 genes predicted to be involved in water tolerance. Seven of them are connected to known biological processes in drought resistance. We also analyzed the top 100 genes in terms of their biological functions. Our study shows that the SVM-RFE method is a highly promising method in analyzing plant microarray data for studying genotype-phenotype relationships. The software is freely available with source code at http://ccst.jlu.edu.cn/JCSB/RFET/.
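The methodology summarized above (SVM-RFE stabilized by bootstrapping and checked with leave-one-out cross-validation) can be illustrated with a minimal sketch. This is not the authors' RFET software, and the record does not say how the paper combines the two resampling steps; here, as an assumption, bootstrap rounds are aggregated by summed rank and leave-one-out accuracy is computed on the top-ranked features. The gene names, the toy expression values, and the simple subgradient-descent SVM are all hypothetical stand-ins chosen so the example needs only the standard library.

```python
import random

def train_linear_svm(X, y, epochs=50, lam=0.01, lr=0.1):
    """Linear SVM (no bias term) fitted by subgradient descent on the hinge loss."""
    w = [0.0] * len(X[0])
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            margin = yi * sum(wj * xj for wj, xj in zip(w, xi))
            for j, xj in enumerate(xi):
                hinge_grad = -yi * xj if margin < 1 else 0.0
                w[j] -= lr * (lam * w[j] + hinge_grad)
    return w

def svm_rfe(X, y):
    """Return feature indices ranked best-first by recursive feature elimination."""
    active = list(range(len(X[0])))
    eliminated = []
    while active:
        sub = [[row[j] for j in active] for row in X]
        w = train_linear_svm(sub, y)
        # Drop the feature with the smallest squared weight, then retrain.
        worst = min(range(len(active)), key=lambda k: w[k] ** 2)
        eliminated.append(active.pop(worst))
    return eliminated[::-1]

def bootstrap_rfe(X, y, rounds=20, seed=0):
    """Aggregate SVM-RFE rankings over bootstrap resamples (assumed scheme: summed rank)."""
    rng = random.Random(seed)
    n = len(X)
    rank_sum = [0] * len(X[0])
    done = 0
    while done < rounds:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [y[i] for i in idx]
        if len(set(ys)) < 2:  # skip resamples that contain a single class
            continue
        for pos, j in enumerate(svm_rfe([X[i] for i in idx], ys)):
            rank_sum[j] += pos
        done += 1
    return sorted(range(len(rank_sum)), key=lambda j: rank_sum[j])

def loo_accuracy(X, y, features):
    """Leave-one-out accuracy of a linear SVM restricted to the given features."""
    hits = 0
    for held in range(len(X)):
        train_X = [[row[j] for j in features] for i, row in enumerate(X) if i != held]
        train_y = [label for i, label in enumerate(y) if i != held]
        w = train_linear_svm(train_X, train_y)
        score = sum(wj * X[held][j] for wj, j in zip(w, features))
        hits += (1 if score > 0 else -1) == y[held]
    return hits / len(X)

# Hypothetical toy data: 8 samples (+1 = stress, -1 = control), 4 made-up genes.
rng = random.Random(1)
labels = [1, 1, 1, 1, -1, -1, -1, -1]
genes = ["gene_signal", "gene_noise_1", "gene_noise_2", "gene_noise_3"]
data = [[2.0 * lab] + [rng.uniform(-0.3, 0.3) for _ in range(3)] for lab in labels]

order = bootstrap_rfe(data, labels)
ranking = [genes[j] for j in order]
accuracy = loo_accuracy(data, labels, order[:2])
print(ranking, accuracy)
```

In this toy setup only one gene carries class information, so it should survive every elimination round; real microarray data would need normalization and far more features, which the sketch does not attempt.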