Machine learning methods for seasonal allergic rhinitis studies

Bibliographic Details
Main Author: Feng, Zijie
Format: Others
Language: English
Published: Linköpings universitet, Statistik och maskininlärning 2021
Subjects:
Online Access: http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-173090
id ndltd-UPSALLA1-oai-DiVA.org-liu-173090
record_format oai_dc
spelling ndltd-UPSALLA1-oai-DiVA.org-liu-173090
2021-02-23T05:28:41Z
Machine learning methods for seasonal allergic rhinitis studies
eng
Feng, Zijie
Linköpings universitet, Statistik och maskininlärning
2021
Machine learning
Seasonal allergic rhinitis
Random forest
Partial least squares discriminant analysis
Bootstrap
Cubic smoothing splines
Single-cell RNA sequencing
Probability Theory and Statistics
Sannolikhetsteori och statistik
Student thesis
info:eu-repo/semantics/bachelorThesis
text
http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-173090
application/pdf
info:eu-repo/semantics/openAccess
collection NDLTD
language English
format Others
sources NDLTD
topic Machine learning
Seasonal allergic rhinitis
Random forest
Partial least squares discriminant analysis
Bootstrap
Cubic smoothing splines
Single-cell RNA sequencing
Probability Theory and Statistics
Sannolikhetsteori och statistik
description Seasonal allergic rhinitis (SAR) is an allergen-triggered disease shaped by both environmental and genetic factors. Earlier work has studied SAR with traditional genetic methodologies. A newer technique, single-cell RNA sequencing (scRNA-seq), generates high-dimensional data. We apply two machine learning (ML) algorithms, random forest (RF) and partial least squares discriminant analysis (PLS-DA), to cell-source classification and gene selection on SAR scRNA-seq time-series data from three allergic patients and four healthy controls, denoised with single-cell variational inference (scVI). We also propose a new fitting method that combines the bootstrap with cubic smoothing splines to fit the average gene expression per cell in different populations. We find that both RF and PLS-DA achieve high classification accuracy, and that RF is preferable given its stable performance and strong gene-selection ability. Our analysis identifies 10 genes with the discriminatory power to classify cells from allergic patients and healthy controls at any time point. Although we found no literature showing a direct connection between these 10 genes and SAR, some studies indirectly support potential associations. This suggests that it may be possible to alert allergic patients before a disease outbreak based on their genetic information. Our experimental results also indicate that ML algorithms may reveal relationships between genes and SAR that traditional techniques have not, which remain to be examined from a genetics perspective in future work.
author Feng, Zijie
title Machine learning methods for seasonal allergic rhinitis studies
publisher Linköpings universitet, Statistik och maskininlärning
publishDate 2021
url http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-173090