A comparative validation of the human variant simulator SIMdrom
The past decade’s progress in next generation sequencing has drastically decreased the price of whole genome and exome sequencing, making it available as a clinical tool for diagnosing patients with genetic disease. However, finding a disease-causing mutation among millions of non-pathogenic variant...
Main Author: | Ånäs, Sofia |
---|---|
Format: | Others |
Language: | English |
Published: | 2017 |
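Subjects: | Bioinformatics (Computational Biology); Bioinformatik (beräkningsbiologi); Engineering and Technology; Teknik och teknologier |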
Online Access: | http://urn.kb.se/resolve?urn=urn:nbn:se:uu:diva-328745 |
id |
ndltd-UPSALLA1-oai-DiVA.org-uu-328745 |
---|---|
record_format |
oai_dc |
spelling |
ndltd-UPSALLA1-oai-DiVA.org-uu-328745 2018-01-14T05:11:18Z A comparative validation of the human variant simulator SIMdrom eng Ånäs, Sofia 2017 Bioinformatics (Computational Biology) Bioinformatik (beräkningsbiologi) Engineering and Technology Teknik och teknologier Student thesis info:eu-repo/semantics/bachelorThesis text http://urn.kb.se/resolve?urn=urn:nbn:se:uu:diva-328745 UPTEC X ; 17 024 application/pdf info:eu-repo/semantics/openAccess |
collection |
NDLTD |
language |
English |
format |
Others |
sources |
NDLTD |
topic |
Bioinformatics (Computational Biology) Bioinformatik (beräkningsbiologi) Engineering and Technology Teknik och teknologier |
description |
The past decade’s progress in next-generation sequencing has drastically decreased the price of whole-genome and exome sequencing, making it available as a clinical tool for diagnosing patients with genetic disease. However, finding a disease-causing mutation among millions of non-pathogenic variants in a patient’s genome is not an easy task. Algorithms that identify the variants most relevant for clinicians to investigate are therefore needed and are under constant development. To test these algorithms, a software tool called SIMdrom has been developed to simulate test data. In this project, the simulated data is validated through comparison with real genetic data to ensure that it is suitable for use as test data. Ensuring the data’s reliability and identifying possible improvements can facilitate the development of algorithms for finding disease-causing mutations, which in turn could lead to better diagnostic possibilities for clinicians. When simulated data is visualized together with real genomes using principal component analysis, it clusters near its real counterpart, showing that the simulated data resembles the real genomes. Simulated exomes also performed well when used as part of one of three training sets for the classifier in the Prioritization of Exome Data by Image Analysis study, where they performed second best after an in-house data set consisting of real exomes. To conclude, the SIMdrom simulated data performs well in both parts of this project. Additional tests of its validity should include testing against larger real data sets, and a possible improvement would be to implement a simulation option for spiking in noise. |
author |
Ånäs, Sofia |
title |
A comparative validation of the human variant simulator SIMdrom |
publishDate |
2017 |
url |
http://urn.kb.se/resolve?urn=urn:nbn:se:uu:diva-328745 |