Discriminative motif discovery via simulated evolution and random under-sampling.

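Conserved motifs in biological sequences are closely related to the structure and function of those sequences. Discriminative motif discovery methods have recently attracted increasing attention. However, little attention has been paid to the data imbalance problem, which is one of the main factors affecting the performance of discriminative models. In this article, a simulated evolution method is applied to address the multi-class imbalance problem at the data preprocessing stage. At the Hidden Markov Model (HMM) training stage, a random under-sampling method is introduced to address the imbalance between the positive and negative datasets. The task is to discover targeting motifs for nine subcellular compartments. In this task, the motifs found by our method are shown to be more conserved than those found by methods that do not account for data imbalance, and they recover most of the known targeting motifs from Minimotif Miner and InterPro. We also use the discovered motifs to predict protein subcellular localization and achieve higher prediction precision and recall for the minority classes.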
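The random under-sampling step mentioned above can be illustrated with a minimal, generic sketch. The sketch assumes that the negative set is the majority class. It keeps every positive and discards randomly chosen negatives, without replacement, until at most `ratio` times as many negatives as positives remain. The helper name `random_undersample`, its `ratio` and `seed` parameters, and the toy sequences are all hypothetical. This is the textbook technique, not the authors' implementation, and it models neither the simulated evolution step nor HMM training.

```python
import random
from typing import List, Optional, Sequence, Tuple, TypeVar

T = TypeVar("T")


def random_undersample(
    positives: Sequence[T],
    negatives: Sequence[T],
    ratio: float = 1.0,
    seed: Optional[int] = None,
) -> Tuple[List[T], List[T]]:
    """Hypothetical helper: generic random under-sampling of the negative set.

    Assumes negatives are the majority class. Keeps every positive and a random
    subset (drawn without replacement) of at most ratio * len(positives) negatives.
    """
    if ratio <= 0:
        raise ValueError("ratio must be positive")
    rng = random.Random(seed)  # seeded RNG so a given draw is reproducible
    # Never ask for more negatives than exist.
    target = min(len(negatives), round(ratio * len(positives)))
    kept_negatives = rng.sample(list(negatives), target)
    return list(positives), kept_negatives


if __name__ == "__main__":
    # Toy, made-up sequences purely for illustration.
    pos = ["MKTAYIAK", "MLSRAVCG"]
    neg = [f"NEG{i}" for i in range(10)]
    p, n = random_undersample(pos, neg, seed=0)
    print(len(p), len(n))  # 2 2
```

Sampling without replacement avoids duplicating any negative sequence. Repeating the draw with different seeds would expose a model to different negatives, but the record does not say whether the authors do this.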

Bibliographic Details
Main Authors: Tao Song, Hong Gu
Format: Article
Language: English
Published: Public Library of Science (PLoS) 2014-01-01
Series: PLoS ONE
Online Access:http://europepmc.org/articles/PMC3923751?pdf=render
DOI: 10.1371/journal.pone.0087670
Citation: PLoS ONE 9(2): e87670
ISSN: 1932-6203