MAFCO: a compression tool for MAF files.

In the last decade, the cost of genomic sequencing has been decreasing so much that researchers all over the world accumulate huge amounts of data for present and future use. These genomic data need to be efficiently stored, because storage cost is not decreasing as fast as the cost of sequencing. I...

Bibliographic Details
Main Authors: Luís M O Matos, António J R Neves, Diogo Pratas, Armando J Pinho
Format: Article
Language: English
Published: Public Library of Science (PLoS) 2015-01-01
Series: PLoS ONE
Online Access: http://europepmc.org/articles/PMC4376647?pdf=render
id doaj-8b15a1505a8c45a3a67c1461234ec208
record_format Article
spelling doaj-8b15a1505a8c45a3a67c1461234ec208 | 2020-11-24T21:27:22Z | eng | Public Library of Science (PLoS) | PLoS ONE | 1932-6203 | 2015-01-01 | 10 | 3 | e0116082 | 10.1371/journal.pone.0116082 | MAFCO: a compression tool for MAF files. | Luís M O Matos; António J R Neves; Diogo Pratas; Armando J Pinho | http://europepmc.org/articles/PMC4376647?pdf=render
collection DOAJ
language English
format Article
sources DOAJ
author Luís M O Matos
António J R Neves
Diogo Pratas
Armando J Pinho
title MAFCO: a compression tool for MAF files.
publisher Public Library of Science (PLoS)
series PLoS ONE
issn 1932-6203
publishDate 2015-01-01
description In the last decade, the cost of genomic sequencing has been decreasing so much that researchers all over the world accumulate huge amounts of data for present and future use. These genomic data need to be efficiently stored, because storage cost is not decreasing as fast as the cost of sequencing. To overcome this problem, the most popular general-purpose compression tool, gzip, is usually used. However, general-purpose tools such as gzip were not specifically designed to compress this kind of data, and often fall short when the intention is to reduce the data size as much as possible. There are several compression algorithms available, even for genomic data, but very few have been designed to deal with Whole Genome Alignments, which contain alignments between entire genomes of several species. In this paper, we present a lossless compression tool, MAFCO, specifically designed to compress MAF (Multiple Alignment Format) files. Compared to gzip, the proposed tool attains a compression gain from 34% to 57%, depending on the data set. When compared to a recent dedicated method, which is not compatible with some data sets, the compression gain of MAFCO is about 9%. Both source code and binaries for several operating systems are freely available for non-commercial use at: http://bioinformatics.ua.pt/software/mafco.
url http://europepmc.org/articles/PMC4376647?pdf=render