An Optimized Scaffolding Algorithm for Unbalanced Sequencing

Wufei Zhu, Ying Liu, Ya Zhao, Xingyu Liao, Mingxu Tong, Xingyu Liao

Research output: Contribution to journal > Article > peer-review


Genome assembly from NGS reads usually proceeds in two steps. (1) The reads are split into k-mers of a fixed length, a de Bruijn graph is built from the overlaps between the k-mers, and paths in the graph are traversed to generate contigs. (2) The linear order of the contigs is determined from the alignment of paired-end reads to the contigs produced in the first step, and scaffolds are constructed from the order and relative positions of the contigs. The second step is usually called scaffolding. ScaffMatch is a classic scaffolding algorithm. It weights the edge between two contigs in the scaffold graph by the number of paired-end reads that align to both contigs and whose pairwise distance equals the insert size, and it removes low-weight edges by applying a minimum weight threshold. However, because of sequencing bias, regions of low sequencing depth contain few paired-end reads, and ScaffMatch often ignores the connections supported by those reads, which significantly affects the effective size of the final assemblies. To overcome these shortcomings, we propose ScaffMatch-ud, an optimized scaffolding algorithm that builds on ScaffMatch and adds strategies for weight adjustment and a new gap estimation. The experimental results show that the proposed algorithm achieves better scaffolding performance than other similar methods and is better suited to unbalanced sequencing.
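The read-count edge weighting and threshold filtering described in the abstract can be illustrated with a minimal Python sketch. It is not the ScaffMatch or ScaffMatch-ud implementation: the function name `build_scaffold_edges`, the parameter `min_weight`, and the input format are hypothetical. The sketch also assumes that each input pair has already passed the insert-size check, and it ignores contig orientation.

```python
from collections import Counter

def build_scaffold_edges(read_pairs, min_weight):
    """Count supporting read pairs per contig pair and drop weak edges.

    read_pairs: iterable of (contig_a, contig_b) tuples, one per paired-end
    read whose two mates align to those contigs (assumed pre-filtered so
    that the pairwise distance is consistent with the insert size).
    min_weight: hypothetical minimum number of supporting pairs for an edge.
    """
    weights = Counter()
    for a, b in read_pairs:
        if a == b:
            continue  # both mates on the same contig: no inter-contig link
        # Treat the edge as undirected by storing the pair in sorted order.
        weights[tuple(sorted((a, b)))] += 1
    # Keep only edges whose weight reaches the threshold.
    return {edge: w for edge, w in weights.items() if w >= min_weight}

# Toy input: c1-c2 lies in a well-covered region, c2-c3 in a low-depth one.
pairs = [("c1", "c2")] * 5 + [("c2", "c3"), ("c3", "c2")]
edges = build_scaffold_edges(pairs, min_weight=3)
```

In this toy input the c2-c3 link has only two supporting pairs, so a fixed threshold of 3 removes it even though the link may be real. That is the low-depth situation the abstract says motivates weight adjustment.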
Original language: English (US)
Journal: New Generation Computing
State: Published - May 28 2023


