A comparative validation of the human variant simulator SIMdrom

The past decade’s progress in next generation sequencing has drastically decreased the price of whole genome and exome sequencing, making it available as a clinical tool for diagnosing patients with genetic disease. However, finding a disease-causing mutation among millions of non-pathogenic variant...

Bibliographic Details
Main Author: Ånäs, Sofia
Format: Others
Language: English
Published: 2017
Subjects: Bioinformatics (Computational Biology); Engineering and Technology
Online Access: http://urn.kb.se/resolve?urn=urn:nbn:se:uu:diva-328745
id ndltd-UPSALLA1-oai-DiVA.org-uu-328745
record_format oai_dc
spelling ndltd-UPSALLA1-oai-DiVA.org-uu-328745
2018-01-14T05:11:18Z
A comparative validation of the human variant simulator SIMdrom
eng
Ånäs, Sofia
2017
Bioinformatics (Computational Biology)
Bioinformatik (beräkningsbiologi)
Engineering and Technology
Teknik och teknologier
Student thesis
info:eu-repo/semantics/bachelorThesis
text
http://urn.kb.se/resolve?urn=urn:nbn:se:uu:diva-328745
UPTEC X ; 17 024
application/pdf
info:eu-repo/semantics/openAccess
collection NDLTD
language English
format Others
sources NDLTD
topic Bioinformatics (Computational Biology)
Bioinformatik (beräkningsbiologi)
Engineering and Technology
Teknik och teknologier
description The past decade’s progress in next-generation sequencing has drastically reduced the price of whole-genome and exome sequencing, making it available as a clinical tool for diagnosing patients with genetic disease. However, finding a disease-causing mutation among millions of non-pathogenic variants in a patient’s genome is not an easy task. Algorithms that identify the variants most relevant for clinicians to investigate are therefore needed and are under constant development. To test these algorithms, a software tool called SIMdrom has been developed to simulate test data. In this project, the simulated data are validated by comparison with real genetic data to ensure that they are suitable as test data. Ensuring the data’s reliability and identifying possible improvements facilitates the development of algorithms for finding disease-causing mutations, which in turn could lead to better diagnostic possibilities for clinicians. When simulated data are visualized together with real genomes using principal component analysis, they cluster near their real counterparts, which indicates that the simulated data resemble the real genomes. Simulated exomes also performed well when used as part of one of three training sets for the classifier in the Prioritization of Exome Data by Image Analysis study, where they performed second best after an in-house data set consisting of real exomes. In conclusion, the SIMdrom-simulated data performed well in both parts of this project. Additional validation should include testing against larger real data sets, and a possible improvement would be a simulation option for spiking in noise.
author Ånäs, Sofia
title A comparative validation of the human variant simulator SIMdrom
publishDate 2017
url http://urn.kb.se/resolve?urn=urn:nbn:se:uu:diva-328745