Data Mining Methods for Omics and Knowledge of Crude Medicinal Plants toward Big Data Biology

Molecular biological data have increased rapidly with recent progress in the omics fields (genomics, transcriptomics, proteomics and metabolomics), which necessitates the development of databases and methods for efficient storage, retrieval, integration and analysis of massive data. The presen...


Bibliographic Details
Main Authors: Farit M. Afendi, Naoaki Ono, Latifah K. Darusman, Kensuke Nakamura, Yukiko Nakamura, Nelson Kibinge, Aki Hirai Morita, Hisayuki Horai, Md. Altaf-Ul-Amin, Shigehiko Kanaya, Ken Tanaka
Format: Article
Language: English
Published: Elsevier 2013-01-01
Series: Computational and Structural Biotechnology Journal
Online Access: http://journals.sfu.ca/rncsb/index.php/csbj/article/view/csbj.201301010
id doaj-8d329981f26a46abb8454ec1288c1c4b
record_format Article
spelling doaj-8d329981f26a46abb8454ec1288c1c4b | 2020-11-24T20:54:25Z | eng | Elsevier | Computational and Structural Biotechnology Journal | 2001-0370 | 2013-01-01 | 4 | 5 | e201301010
collection DOAJ
language English
format Article
sources DOAJ
author Farit M. Afendi
Naoaki Ono
Latifah K. Darusman
Kensuke Nakamura
Yukiko Nakamura
Nelson Kibinge
Aki Hirai Morita
Hisayuki Horai
Md. Altaf-Ul-Amin
Shigehiko Kanaya
Ken Tanaka
author_sort Farit M. Afendi
title Data Mining Methods for Omics and Knowledge of Crude Medicinal Plants toward Big Data Biology
publisher Elsevier
series Computational and Structural Biotechnology Journal
issn 2001-0370
publishDate 2013-01-01
description Molecular biological data have increased rapidly with recent progress in the omics fields (genomics, transcriptomics, proteomics and metabolomics), which necessitates the development of databases and methods for efficient storage, retrieval, integration and analysis of massive data. The present study reviews the use of the KNApSAcK Family DB in metabolomics and related areas, discusses several statistical methods for handling multivariate data, and shows their application to Indonesian blended herbal medicines (Jamu) as a case study. Exploration using a biplot reveals that many plants are rarely used, while some plants are heavily used for a specific efficacy. Furthermore, the ingredients of Jamu formulas are modeled using Partial Least Squares Discriminant Analysis (PLS-DA) in order to predict their efficacy. The plants used in each Jamu medicine serve as the predictors, and the efficacy of each Jamu provides the responses. This model classifies efficacy correctly in 71.6% of cases. A permutation test is then used to determine which plants serve as main ingredients in Jamu formulas by evaluating the significance of the PLS-DA coefficients. Next, to explain the role of the plants that serve as main ingredients in Jamu medicines, information on the pharmacological activity of the plants is added to the predictor block. An N-PLS-DA model, the multiway version of PLS-DA, is then used to handle the resulting three-dimensional predictor array. The resulting N-PLS-DA model reveals that the effects of some pharmacological activities are specific to certain efficacies, whereas other activities act across many efficacies. The mathematical modeling introduced in the present study can be used in the global analysis of big data aimed at revealing the underlying biology.
url http://journals.sfu.ca/rncsb/index.php/csbj/article/view/csbj.201301010
_version_ 1716794605559611392
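
The permutation-test step mentioned in the description can be illustrated with a minimal sketch. It does not reproduce the article's method: instead of a PLS-DA coefficient it uses a simpler stand-in statistic (how much more often a plant appears in formulas of one efficacy class than in the others), and the function names and toy data are hypothetical. The idea it shows is the same: shuffle the efficacy labels many times and ask how often chance alone yields a statistic at least as large as the observed one.

```python
import random


def usage_gap(uses_plant, has_efficacy):
    """Share of formulas containing the plant inside the class minus outside it.

    Stand-in statistic (assumption); the article tests PLS-DA coefficients.
    """
    inside = [u for u, e in zip(uses_plant, has_efficacy) if e]
    outside = [u for u, e in zip(uses_plant, has_efficacy) if not e]
    return sum(inside) / len(inside) - sum(outside) / len(outside)


def permutation_p_value(uses_plant, has_efficacy, n_perm=2000, seed=0):
    """One-sided permutation p-value for the usage gap of a single plant."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    observed = usage_gap(uses_plant, has_efficacy)
    labels = list(has_efficacy)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(labels)  # break any link between plant and efficacy
        if usage_gap(uses_plant, labels) >= observed:
            hits += 1
    # The +1 terms count the observed labelling itself, so p is never zero.
    return (hits + 1) / (n_perm + 1)


# Hypothetical toy data: 20 formulas, the first 10 share one efficacy class.
efficacy = [1] * 10 + [0] * 10
# Plant A appears in 9 of 10 formulas in the class and 1 of 10 outside it.
plant_a = [1] * 9 + [0] + [1] + [0] * 9
# Plant B appears in 5 of 10 formulas in each group.
plant_b = [1] * 5 + [0] * 5 + [1] * 5 + [0] * 5

p_a = permutation_p_value(plant_a, efficacy)
p_b = permutation_p_value(plant_b, efficacy)
print("plant A p-value:", p_a)
print("plant B p-value:", p_b)
```

A plant with a small p-value would be a candidate main ingredient for that efficacy under this simplified statistic. In a PLS-DA setting the same shuffling loop would refit the model on each permuted response and compare coefficients instead.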