TY - JOUR
T1 - Assembling metagenomes, one community at a time
AU - van der Walt, Andries Johannes
AU - van Goethem, Marc Warwick
AU - Ramond, Jean-Baptiste
AU - Makhalanyane, Thulani Peter
AU - Reva, Oleg
AU - Cowan, Don Arthur
N1 - Generated from Scopus record by KAUST IRTS on 2023-10-23
PY - 2017/7/10
Y1 - 2017/7/10
AB - Background: Metagenomics allows unprecedented access to uncultured environmental microorganisms. The analysis of metagenomic sequences facilitates gene prediction and annotation, and enables the assembly of draft genomes, including those of uncultured members of a community. However, while several platforms have been developed for this critical step, there is currently no clear framework for the assembly of metagenomic sequence data. Results: To assist with the selection of an appropriate metagenome assembler, we evaluated the capabilities of nine prominent assembly tools on nine publicly available environmental metagenomes, as well as three simulated datasets. Overall, we found that SPAdes provided the largest contigs and highest N50 values across 6 of the 9 environmental datasets, followed by MEGAHIT and metaSPAdes. MEGAHIT emerged as a computationally inexpensive alternative to SPAdes, assembling the most complex dataset using less than 500 GB of RAM and within 10 hours. Conclusions: We found that assembler choice ultimately depends on the scientific question, the available resources and the bioinformatic competence of the researcher. We provide a concise workflow for the selection of the best assembly tool.
UR - http://bmcgenomics.biomedcentral.com/articles/10.1186/s12864-017-3918-9
UR - http://www.scopus.com/inward/record.url?scp=85021978550&partnerID=8YFLogxK
U2 - 10.1186/s12864-017-3918-9
DO - 10.1186/s12864-017-3918-9
M3 - Article
SN - 1471-2164
VL - 18
JO - BMC Genomics
JF - BMC Genomics
IS - 1
ER -