Machine learning methods for seasonal allergic rhinitis studies
Seasonal allergic rhinitis (SAR) is an allergen-triggered disease influenced by both environmental and genetic factors. Some researchers have studied SAR using traditional genetic methodologies. As technology has advanced, a new technique called single-cell RNA sequencing (scRNA-seq) has been developed, which c...
Main Author: | Feng, Zijie |
---|---|
Format: | Others |
Language: | English |
Published: | Linköpings universitet, Statistik och maskininlärning, 2021 |
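Subjects: | Machine learning; Seasonal allergic rhinitis; Random forest; Partial least squares discriminant analysis; Bootstrap; Cubic smoothing splines; Single-cell RNA sequencing; Probability Theory and Statistics; Sannolikhetsteori och statistik |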
Online Access: | http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-173090 |
id |
ndltd-UPSALLA1-oai-DiVA.org-liu-173090 |
---|---|
record_format |
oai_dc |
spelling |
ndltd-UPSALLA1-oai-DiVA.org-liu-173090; 2021-02-23T05:28:41Z; Machine learning methods for seasonal allergic rhinitis studies; eng; Feng, Zijie; Linköpings universitet, Statistik och maskininlärning; 2021; Machine learning; Seasonal allergic rhinitis; Random forest; Partial least squares discriminant analysis; Bootstrap; Cubic smoothing splines; Single-cell RNA sequencing; Probability Theory and Statistics; Sannolikhetsteori och statistik; Student thesis; info:eu-repo/semantics/bachelorThesis; text; http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-173090; application/pdf; info:eu-repo/semantics/openAccess |
collection |
NDLTD |
language |
English |
format |
Others |
sources |
NDLTD |
topic |
Machine learning; Seasonal allergic rhinitis; Random forest; Partial least squares discriminant analysis; Bootstrap; Cubic smoothing splines; Single-cell RNA sequencing; Probability Theory and Statistics; Sannolikhetsteori och statistik |
spellingShingle |
Machine learning; Seasonal allergic rhinitis; Random forest; Partial least squares discriminant analysis; Bootstrap; Cubic smoothing splines; Single-cell RNA sequencing; Probability Theory and Statistics; Sannolikhetsteori och statistik; Feng, Zijie; Machine learning methods for seasonal allergic rhinitis studies |
description |
Seasonal allergic rhinitis (SAR) is an allergen-triggered disease influenced by both environmental and genetic factors. Some researchers have studied SAR using traditional genetic methodologies. As technology has advanced, a new technique called single-cell RNA sequencing (scRNA-seq) has been developed, which can generate high-dimensional data. We apply two machine learning (ML) algorithms, random forest (RF) and partial least squares discriminant analysis (PLS-DA), to cell source classification and gene selection on SAR scRNA-seq time-series data from three allergic patients and four healthy controls, denoised by single-cell variational inference (scVI). We additionally propose a new fitting method that combines the bootstrap with cubic smoothing splines to fit the average gene expression per cell in different populations. In summary, we find that both RF and PLS-DA provide high classification accuracy, and that RF is preferable given its stable performance and strong gene-selection ability. Based on our analysis, 10 genes have the discriminatory power to separate cells of allergic patients from those of healthy controls at any time point. Although we found no literature showing a direct connection between these 10 genes and SAR, some studies indirectly support the potential associations. This suggests that it may be possible to alert allergic patients before a disease outbreak based on their genetic information. Meanwhile, our experimental results indicate that ML algorithms may uncover relationships between genes and SAR that traditional techniques may miss, which need to be analyzed from a genetics perspective in the future. |
author |
Feng, Zijie |
author_facet |
Feng, Zijie |
author_sort |
Feng, Zijie |
title |
Machine learning methods for seasonal allergic rhinitis studies |
title_short |
Machine learning methods for seasonal allergic rhinitis studies |
title_full |
Machine learning methods for seasonal allergic rhinitis studies |
title_fullStr |
Machine learning methods for seasonal allergic rhinitis studies |
title_full_unstemmed |
Machine learning methods for seasonal allergic rhinitis studies |
title_sort |
machine learning methods for seasonal allergic rhinitis studies |
publisher |
Linköpings universitet, Statistik och maskininlärning |
publishDate |
2021 |
url |
http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-173090 |
work_keys_str_mv |
AT fengzijie machinelearningmethodsforseasonalallergicrhinitisstudies |
_version_ |
1719377975175020544 |
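
The abstract describes a fitting method that combines the bootstrap with cubic smoothing splines to fit average gene expression per cell over time. The record does not give the details, so the following is only a minimal sketch of one way such a procedure could look, not the thesis's implementation. The function names, the penalty value `lam`, the number of replicates, and the synthetic data are all hypothetical. The spline is a natural cubic smoothing spline computed with NumPy in its standard penalized least-squares (Reinsch) form.

```python
import numpy as np


def smoothing_spline_fit(t, y, lam):
    """Fitted values of a natural cubic smoothing spline at the knots t.

    Minimises sum((y - g(t))**2) + lam * integral(g''(u)**2 du) via the
    Reinsch form g = (I + lam * Q R^-1 Q^T)^-1 y. Needs at least 3
    strictly increasing time points.
    """
    t = np.asarray(t, dtype=float)
    y = np.asarray(y, dtype=float)
    n = t.size
    h = np.diff(t)  # spacings between consecutive time points
    Q = np.zeros((n, n - 2))
    R = np.zeros((n - 2, n - 2))
    for j in range(n - 2):
        # Column j encodes the second difference around interior knot j + 1.
        Q[j, j] = 1.0 / h[j]
        Q[j + 1, j] = -1.0 / h[j] - 1.0 / h[j + 1]
        Q[j + 2, j] = 1.0 / h[j + 1]
        R[j, j] = (h[j] + h[j + 1]) / 3.0
        if j + 1 < n - 2:
            R[j, j + 1] = R[j + 1, j] = h[j + 1] / 6.0
    K = Q @ np.linalg.solve(R, Q.T)  # roughness penalty matrix
    return np.linalg.solve(np.eye(n) + lam * K, y)


def bootstrap_spline_band(times, cells_by_time, lam=1.0, n_boot=200, seed=0):
    """Bootstrap the per-time-point mean expression and smooth each replicate.

    cells_by_time[i] holds one gene's per-cell expression values at times[i]
    (hypothetical input layout). Returns the mean curve and a 95% percentile
    band evaluated at the observed time points.
    """
    rng = np.random.default_rng(seed)
    cells_by_time = [np.asarray(c, dtype=float) for c in cells_by_time]
    curves = np.empty((n_boot, len(times)))
    for b in range(n_boot):
        # Resample cells with replacement within each time point, then average.
        means = [rng.choice(c, size=c.size, replace=True).mean()
                 for c in cells_by_time]
        curves[b] = smoothing_spline_fit(times, means, lam)
    lower, upper = np.percentile(curves, [2.5, 97.5], axis=0)
    return curves.mean(axis=0), lower, upper


# Synthetic example: one gene, five time points, 300 cells per time point.
times = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
demo_rng = np.random.default_rng(1)
cells_by_time = [demo_rng.normal(loc=mu, scale=0.5, size=300)
                 for mu in (1.0, 1.4, 2.1, 1.8, 1.2)]
mean_curve, lower, upper = bootstrap_spline_band(times, cells_by_time, lam=0.5)
print(np.round(mean_curve, 2))
```

Running the same procedure separately for cells from allergic patients and from healthy controls would give two curves with bands that can be compared over time. In practice the penalty `lam` would be chosen by cross-validation or a similar criterion rather than fixed by hand.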