Prediction of Probable Major Depressive Disorder in the Taiwan Biobank: An Integrated Machine Learning and Genome-Wide Analysis Approach

In light of recent advancements in machine learning, personalized medicine using predictive algorithms serves as an essential paradigmatic methodology. Our goal was to explore an integrated machine learning and genome-wide analysis approach which targets the prediction of probable major depressive d...

Bibliographic Details
Main Authors: Eugene Lin, Po-Hsiu Kuo, Wan-Yu Lin, Yu-Li Liu, Albert C. Yang, Shih-Jen Tsai
Format: Article
Language: English
Published: MDPI AG 2021-06-01
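Series: Journal of Personalized Medicine
Subjects: genome-wide association study; machine learning; major depressive disorder; personalized medicine; single nucleotide polymorphisms
Online Access: https://www.mdpi.com/2075-4426/11/7/597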
id doaj-d67ff0c2e5ad46bba226af6c6b463a81
record_format Article
spelling doaj-d67ff0c2e5ad46bba226af6c6b463a81
2021-07-23T13:49:21Z
eng
MDPI AG
Journal of Personalized Medicine
2075-4426
2021-06-01
11
597
597
10.3390/jpm11070597
Prediction of Probable Major Depressive Disorder in the Taiwan Biobank: An Integrated Machine Learning and Genome-Wide Analysis Approach
Eugene Lin (Department of Biostatistics, University of Washington, Seattle, WA 98195, USA)
Po-Hsiu Kuo (Department of Public Health, Institute of Epidemiology and Preventive Medicine, National Taiwan University, Taipei 10617, Taiwan)
Wan-Yu Lin (Department of Public Health, Institute of Epidemiology and Preventive Medicine, National Taiwan University, Taipei 10617, Taiwan)
Yu-Li Liu (Center for Neuropsychiatric Research, National Health Research Institutes, Miaoli County 35053, Taiwan)
Albert C. Yang (Division of Interdisciplinary Medicine and Biotechnology, Beth Israel Deaconess Medical Center/Harvard Medical School, Boston, MA 02215, USA)
Shih-Jen Tsai (Department of Psychiatry, Taipei Veterans General Hospital, Taipei 11217, Taiwan)
In light of recent advancements in machine learning, personalized medicine using predictive algorithms serves as an essential paradigmatic methodology. Our goal was to explore an integrated machine learning and genome-wide analysis approach which targets the prediction of probable major depressive disorder (MDD) using 9828 individuals in the Taiwan Biobank. In our analysis, we reported a genome-wide significant association with probable MDD that has not been previously identified: FBN1 on chromosome 15. Furthermore, we pinpointed 17 single nucleotide polymorphisms (SNPs) which show evidence of both associations with probable MDD and potential roles as expression quantitative trait loci (eQTLs). To predict the status of probable MDD, we established prediction models with random undersampling and synthetic minority oversampling using 17 eQTL SNPs and eight clinical variables. We utilized five state-of-the-art models: logistic ridge regression, support vector machine, C4.5 decision tree, LogitBoost, and random forests. Our data revealed that random forests had the highest performance (area under curve = 0.8905 ± 0.0088; repeated 10-fold cross-validation) among the predictive algorithms to infer complex correlations between biomarkers and probable MDD. Our study suggests that an integrated machine learning and genome-wide analysis approach may offer an advantageous method to establish bioinformatics tools for discriminating MDD patients from healthy controls.
https://www.mdpi.com/2075-4426/11/7/597
genome-wide association study
machine learning
major depressive disorder
personalized medicine
single nucleotide polymorphisms
collection DOAJ
language English
format Article
sources DOAJ
author Eugene Lin
Po-Hsiu Kuo
Wan-Yu Lin
Yu-Li Liu
Albert C. Yang
Shih-Jen Tsai
spellingShingle Eugene Lin
Po-Hsiu Kuo
Wan-Yu Lin
Yu-Li Liu
Albert C. Yang
Shih-Jen Tsai
Prediction of Probable Major Depressive Disorder in the Taiwan Biobank: An Integrated Machine Learning and Genome-Wide Analysis Approach
Journal of Personalized Medicine
genome-wide association study
machine learning
major depressive disorder
personalized medicine
single nucleotide polymorphisms
author_facet Eugene Lin
Po-Hsiu Kuo
Wan-Yu Lin
Yu-Li Liu
Albert C. Yang
Shih-Jen Tsai
author_sort Eugene Lin
title Prediction of Probable Major Depressive Disorder in the Taiwan Biobank: An Integrated Machine Learning and Genome-Wide Analysis Approach
title_short Prediction of Probable Major Depressive Disorder in the Taiwan Biobank: An Integrated Machine Learning and Genome-Wide Analysis Approach
title_full Prediction of Probable Major Depressive Disorder in the Taiwan Biobank: An Integrated Machine Learning and Genome-Wide Analysis Approach
title_fullStr Prediction of Probable Major Depressive Disorder in the Taiwan Biobank: An Integrated Machine Learning and Genome-Wide Analysis Approach
title_full_unstemmed Prediction of Probable Major Depressive Disorder in the Taiwan Biobank: An Integrated Machine Learning and Genome-Wide Analysis Approach
title_sort prediction of probable major depressive disorder in the taiwan biobank: an integrated machine learning and genome-wide analysis approach
publisher MDPI AG
series Journal of Personalized Medicine
issn 2075-4426
publishDate 2021-06-01
description In light of recent advancements in machine learning, personalized medicine using predictive algorithms serves as an essential paradigmatic methodology. Our goal was to explore an integrated machine learning and genome-wide analysis approach which targets the prediction of probable major depressive disorder (MDD) using 9828 individuals in the Taiwan Biobank. In our analysis, we reported a genome-wide significant association with probable MDD that has not been previously identified: FBN1 on chromosome 15. Furthermore, we pinpointed 17 single nucleotide polymorphisms (SNPs) which show evidence of both associations with probable MDD and potential roles as expression quantitative trait loci (eQTLs). To predict the status of probable MDD, we established prediction models with random undersampling and synthetic minority oversampling using 17 eQTL SNPs and eight clinical variables. We utilized five state-of-the-art models: logistic ridge regression, support vector machine, C4.5 decision tree, LogitBoost, and random forests. Our data revealed that random forests had the highest performance (area under curve = 0.8905 ± 0.0088; repeated 10-fold cross-validation) among the predictive algorithms to infer complex correlations between biomarkers and probable MDD. Our study suggests that an integrated machine learning and genome-wide analysis approach may offer an advantageous method to establish bioinformatics tools for discriminating MDD patients from healthy controls.
topic genome-wide association study
machine learning
major depressive disorder
personalized medicine
single nucleotide polymorphisms
url https://www.mdpi.com/2075-4426/11/7/597
work_keys_str_mv AT eugenelin predictionofprobablemajordepressivedisorderinthetaiwanbiobankanintegratedmachinelearningandgenomewideanalysisapproach
AT pohsiukuo predictionofprobablemajordepressivedisorderinthetaiwanbiobankanintegratedmachinelearningandgenomewideanalysisapproach
AT wanyulin predictionofprobablemajordepressivedisorderinthetaiwanbiobankanintegratedmachinelearningandgenomewideanalysisapproach
AT yuliliu predictionofprobablemajordepressivedisorderinthetaiwanbiobankanintegratedmachinelearningandgenomewideanalysisapproach
AT albertcyang predictionofprobablemajordepressivedisorderinthetaiwanbiobankanintegratedmachinelearningandgenomewideanalysisapproach
AT shihjentsai predictionofprobablemajordepressivedisorderinthetaiwanbiobankanintegratedmachinelearningandgenomewideanalysisapproach
_version_ 1721287617402634240
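
Illustrative note: the abstract in this record reports an area under the curve as a mean ± spread over repeated 10-fold cross-validation, with random undersampling used against class imbalance. The sketch below shows one way such an evaluation loop could be arranged; it is not the authors' code. Everything in it is an assumption made for illustration: the synthetic data from `make_toy_data`, all function names, the number of repeats, and in particular `fit_centroid_scorer`, a deliberately simple stand-in for the random forest named in the abstract. Synthetic minority oversampling is not shown. Undersampling is applied to the training folds only, so each held-out fold keeps its original class ratio.

```python
import random
from statistics import mean, stdev


def auc(scores, labels):
    """Area under the ROC curve via the rank-sum (Mann-Whitney) identity."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def undersample(indices, labels, rng):
    """Randomly drop majority-class rows until both classes have equal size."""
    pos = [i for i in indices if labels[i] == 1]
    neg = [i for i in indices if labels[i] == 0]
    small, large = sorted((pos, neg), key=len)
    return small + rng.sample(large, len(small))


def stratified_folds(labels, k, rng):
    """Split row indices into k folds that roughly preserve the class ratio."""
    folds = [[] for _ in range(k)]
    for cls in (0, 1):
        rows = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(rows)
        for j, i in enumerate(rows):
            folds[j % k].append(i)
    return folds


def fit_centroid_scorer(X, y):
    """Hypothetical stand-in model (NOT a random forest): score each row by
    its projection onto the difference of the two class means."""
    d = len(X[0])

    def centroid(cls):
        rows = [x for x, t in zip(X, y) if t == cls]
        return [mean(r[j] for r in rows) for j in range(d)]

    w = [a - b for a, b in zip(centroid(1), centroid(0))]
    return lambda x: sum(wj * xj for wj, xj in zip(w, x))


def repeated_cv_auc(X, y, k=10, repeats=3, seed=0):
    """Mean and standard deviation of the per-fold AUC over repeated k-fold CV."""
    rng = random.Random(seed)
    aucs = []
    for _ in range(repeats):
        folds = stratified_folds(y, k, rng)
        for f in range(k):
            test = folds[f]
            train = [i for g in range(k) if g != f for i in folds[g]]
            train = undersample(train, y, rng)  # resample training rows only
            score = fit_centroid_scorer([X[i] for i in train],
                                        [y[i] for i in train])
            aucs.append(auc([score(X[i]) for i in test],
                            [y[i] for i in test]))
    return mean(aucs), stdev(aucs)


def make_toy_data(n=600, d=5, pos_rate=0.1, seed=1):
    """Synthetic, imbalanced two-class data; unrelated to the Taiwan Biobank."""
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n):
        t = 1 if rng.random() < pos_rate else 0
        X.append([rng.gauss(0.8 * t, 1.0) for _ in range(d)])
        y.append(t)
    return X, y


if __name__ == "__main__":
    X, y = make_toy_data()
    m, s = repeated_cv_auc(X, y)
    print(f"AUC = {m:.4f} ± {s:.4f}")
```

Resampling inside the loop rather than before splitting is a design choice: balancing the full data set first would let the resampling step see the rows later used for testing and could inflate the reported AUC.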