TY - JOUR
T1 - Karect: accurate correction of substitution, insertion and deletion errors for next-generation sequencing data
AU - Allam, Amin
AU - Kalnis, Panos
AU - Solovyev, Victor
N1 - KAUST Repository Item: Exported on 2020-10-01
PY - 2015/7/14
Y1 - 2015/7/14
N2 - Motivation: Next-generation sequencing generates large amounts of data affected by errors in the form of substitutions, insertions or deletions of bases. Error correction based on high-coverage information typically improves de novo assembly. Most existing tools can correct substitution errors only; some support insertions and deletions, but accuracy in many cases is low.
Results: We present Karect, a novel error correction technique based on multiple alignment. Our approach supports substitution, insertion and deletion errors. It can handle non-uniform coverage as well as moderately covered areas of the sequenced genome. Experiments with data from Illumina, 454 FLX and Ion Torrent sequencing machines demonstrate that Karect is more accurate than previous methods, both in terms of correcting individual-base errors (up to 10% increase in accuracy gain) and post de novo assembly quality (up to 10% increase in NGA50). We also introduce an improved framework for evaluating the quality of error correction.
AB - Motivation: Next-generation sequencing generates large amounts of data affected by errors in the form of substitutions, insertions or deletions of bases. Error correction based on high-coverage information typically improves de novo assembly. Most existing tools can correct substitution errors only; some support insertions and deletions, but accuracy in many cases is low.
Results: We present Karect, a novel error correction technique based on multiple alignment. Our approach supports substitution, insertion and deletion errors. It can handle non-uniform coverage as well as moderately covered areas of the sequenced genome. Experiments with data from Illumina, 454 FLX and Ion Torrent sequencing machines demonstrate that Karect is more accurate than previous methods, both in terms of correcting individual-base errors (up to 10% increase in accuracy gain) and post de novo assembly quality (up to 10% increase in NGA50). We also introduce an improved framework for evaluating the quality of error correction.
UR - http://hdl.handle.net/10754/567063
UR - http://bioinformatics.oxfordjournals.org/lookup/doi/10.1093/bioinformatics/btv415
UR - http://www.scopus.com/inward/record.url?scp=84947574515&partnerID=8YFLogxK
U2 - 10.1093/bioinformatics/btv415
DO - 10.1093/bioinformatics/btv415
M3 - Article
C2 - 26177965
SN - 1367-4803
VL - 31
SP - 3421
EP - 3428
JO - Bioinformatics
JF - Bioinformatics
IS - 21
ER -