Machine Learning Models for Error Detection in Metagenomics and Polyploid Sequencing Data
Metagenomics studies, as well as genomics studies of polyploid species such as wheat, deal with the analysis of high-variation data. Such data contain sequences from similar but distinct genetic chains, which presents an obstacle to analysis and research. In particular, the detection of instrum...
Main Authors: | Milko Krachunov, Maria Nisheva, Dimitar Vassilev |
---|---|
Format: | Article |
Language: | English |
Published: | MDPI AG, 2019-03-01 |
Series: | Information |
Subjects: | machine learning; neural network; NGS errors; metagenomics; polyploid genomes |
Online Access: | http://www.mdpi.com/2078-2489/10/3/110 |
id |
doaj-f92df43a78bb48ef84e956aa5c10f5a0 |
---|---|
record_format |
Article |
spelling |
doaj-f92df43a78bb48ef84e956aa5c10f5a0; 2020-11-25T00:32:57Z; eng; MDPI AG; Information; 2078-2489; 2019-03-01; 10; 3; 110; 10.3390/info10030110; info10030110; Machine Learning Models for Error Detection in Metagenomics and Polyploid Sequencing Data; Milko Krachunov (0); Maria Nisheva (1); Dimitar Vassilev (2); Faculty of Mathematics and Informatics, University of Sofia “St. Kliment Ohridski”, 5 James Bourchier Blvd., 1164 Sofia, Bulgaria; Faculty of Mathematics and Informatics, University of Sofia “St. Kliment Ohridski”, 5 James Bourchier Blvd., 1164 Sofia, Bulgaria; Faculty of Mathematics and Informatics, University of Sofia “St. Kliment Ohridski”, 5 James Bourchier Blvd., 1164 Sofia, Bulgaria; Metagenomics studies, as well as genomics studies of polyploid species such as wheat, deal with the analysis of high-variation data. Such data contain sequences from similar but distinct genetic chains, which presents an obstacle to analysis and research. In particular, the detection of instrumentation errors introduced during the digitalization of the sequences may be hindered, as such errors can be indistinguishable from the real biological variation inside the digital data. This can prevent the determination of the correct sequences while also making variant studies significantly more difficult. This paper details a collection of ML-based models used to distinguish a real variant from an erroneous one. The focus is on using these models directly, but experiments are also done in combination with other predictors that isolate a pool of error candidates.; http://www.mdpi.com/2078-2489/10/3/110; machine learning; neural network; NGS errors; metagenomics; polyploid genomes |
collection |
DOAJ |
language |
English |
format |
Article |
sources |
DOAJ |
author |
Milko Krachunov; Maria Nisheva; Dimitar Vassilev |
title |
Machine Learning Models for Error Detection in Metagenomics and Polyploid Sequencing Data |
publisher |
MDPI AG |
series |
Information |
issn |
2078-2489 |
publishDate |
2019-03-01 |
description |
Metagenomics studies, as well as genomics studies of polyploid species such as wheat, deal with the analysis of high-variation data. Such data contain sequences from similar but distinct genetic chains, which presents an obstacle to analysis and research. In particular, the detection of instrumentation errors introduced during the digitalization of the sequences may be hindered, as such errors can be indistinguishable from the real biological variation inside the digital data. This can prevent the determination of the correct sequences while also making variant studies significantly more difficult. This paper details a collection of ML-based models used to distinguish a real variant from an erroneous one. The focus is on using these models directly, but experiments are also done in combination with other predictors that isolate a pool of error candidates. |
topic |
machine learning; neural network; NGS errors; metagenomics; polyploid genomes |
url |
http://www.mdpi.com/2078-2489/10/3/110 |
_version_ |
1725318068184285184 |