TY - JOUR
T1 - An Optimized Scaffolding Algorithm for Unbalanced Sequencing
AU - Zhu, Wufei
AU - Liu, Ying
AU - Zhao, Ya
AU - Liao, Xingyu
AU - Tong, Mingxu
AU - Liao, Xingyu
N1 - KAUST Repository Item: Exported on 2023-06-06
Acknowledgements: This work was supported by the National Natural Science Foundation of China under Grant No. 62002388 and by the Health Technology and Development Research Plan of Yichang under Grants No. A12301-07 and No. A13301-10.
PY - 2023/5/28
Y1 - 2023/5/28
N2 - Genome assembly from NGS reads usually involves two steps: (1) convert the reads into k-mers of a fixed length, construct a de Bruijn graph from the overlaps between the k-mers, and traverse paths in the graph to generate contigs; (2) determine the linear order of the contigs from the alignments of paired-end reads to those contigs, and construct scaffolds based on the order and relative positions of the contigs. The second step is usually called scaffolding. ScaffMatch is a classic scaffolding algorithm that weights each edge of the scaffold graph by the number of paired-end reads that align to the two contigs with a pairwise distance equal to the insert size, and removes low-weight edges by applying a minimum weight threshold. However, because of sequencing bias, regions of low sequencing depth contain few paired-end reads, and ScaffMatch often ignores the connections between those reads, which significantly affects the effective size of the final assemblies. To overcome this shortcoming, we propose ScaffMatch-ud, an optimized scaffolding algorithm that builds on ScaffMatch and adds strategies for weight adjustment and a new gap estimation. The experimental results show that the proposed algorithm outperforms other similar methods in scaffolding and is better suited to unbalanced sequencing.
UR - http://hdl.handle.net/10754/692375
UR - https://link.springer.com/10.1007/s00354-023-00221-6
UR - http://www.scopus.com/inward/record.url?scp=85160402510&partnerID=8YFLogxK
U2 - 10.1007/s00354-023-00221-6
DO - 10.1007/s00354-023-00221-6
M3 - Article
SN - 0288-3635
JO - New Generation Computing
JF - New Generation Computing
ER -