Machine Learning Models for Error Detection in Metagenomics and Polyploid Sequencing Data

Metagenomics studies, as well as genomics studies of polyploid species such as wheat, deal with the analysis of high variation data. Such data contain sequences from similar, but distinct genetic chains. This fact presents an obstacle to analysis and research. In particular, the detection of instrum...


Bibliographic Details
Main Authors: Milko Krachunov, Maria Nisheva, Dimitar Vassilev
Format: Article
Language: English
Published: MDPI AG 2019-03-01
Series: Information
Subjects: machine learning; neural network; NGS errors; metagenomics; polyploid genomes
Online Access: http://www.mdpi.com/2078-2489/10/3/110
id doaj-f92df43a78bb48ef84e956aa5c10f5a0
record_format Article
spelling doaj-f92df43a78bb48ef84e956aa5c10f5a0 | 2020-11-25T00:32:57Z | eng | MDPI AG | Information | 2078-2489 | 2019-03-01 | 10 | 3 | 110 | 10.3390/info10030110 | info10030110 | Machine Learning Models for Error Detection in Metagenomics and Polyploid Sequencing Data | Milko Krachunov (0) | Maria Nisheva (1) | Dimitar Vassilev (2) | Faculty of Mathematics and Informatics, University of Sofia “St. Kliment Ohridski”, 5 James Bourchier Blvd., 1164 Sofia, Bulgaria | Faculty of Mathematics and Informatics, University of Sofia “St. Kliment Ohridski”, 5 James Bourchier Blvd., 1164 Sofia, Bulgaria | Faculty of Mathematics and Informatics, University of Sofia “St. Kliment Ohridski”, 5 James Bourchier Blvd., 1164 Sofia, Bulgaria | Metagenomics studies, as well as genomics studies of polyploid species such as wheat, deal with the analysis of high variation data. Such data contain sequences from similar, but distinct genetic chains. This fact presents an obstacle to analysis and research. In particular, the detection of instrumentation errors during the digitalization of the sequences may be hindered, as they can be indistinguishable from the real biological variation inside the digital data. This can prevent the determination of the correct sequences, while at the same time make variant studies significantly more difficult. This paper details a collection of ML-based models used to distinguish a real variant from an erroneous one. The focus is on using this model directly, but experiments are also done in combination with other predictors that isolate a pool of error candidates. | http://www.mdpi.com/2078-2489/10/3/110 | machine learning | neural network | NGS errors | metagenomics | polyploid genomes
collection DOAJ
language English
format Article
sources DOAJ
author Milko Krachunov
Maria Nisheva
Dimitar Vassilev
title Machine Learning Models for Error Detection in Metagenomics and Polyploid Sequencing Data
publisher MDPI AG
series Information
issn 2078-2489
publishDate 2019-03-01
description Metagenomics studies, as well as genomics studies of polyploid species such as wheat, deal with the analysis of high variation data. Such data contain sequences from similar, but distinct genetic chains. This fact presents an obstacle to analysis and research. In particular, the detection of instrumentation errors during the digitalization of the sequences may be hindered, as they can be indistinguishable from the real biological variation inside the digital data. This can prevent the determination of the correct sequences, while at the same time make variant studies significantly more difficult. This paper details a collection of ML-based models used to distinguish a real variant from an erroneous one. The focus is on using this model directly, but experiments are also done in combination with other predictors that isolate a pool of error candidates.
topic machine learning
neural network
NGS errors
metagenomics
polyploid genomes
url http://www.mdpi.com/2078-2489/10/3/110