TY - JOUR
T1 - Reconstruction of microbial haplotypes by integration of statistical and physical linkage in scaffolding
AU - Cao, Chen
AU - He, Jingni
AU - Mak, Lauren
AU - Perera, Deshan
AU - Kwok, Devin
AU - Wang, Jia
AU - Li, Minghao
AU - Mourier, Tobias
AU - Gavriliuc, Stefan
AU - Greenberg, Matthew
AU - Morrissy, A Sorana
AU - Sycuro, Laura K
AU - Yang, Guang
AU - Jeffares, Daniel C
AU - Long, Quan
N1 - KAUST Repository Item: Exported on 2021-02-11
N1 - Acknowledgements: Q.L. is supported by an NSERC Discovery Grant (RGPIN-2017-04860), a Canada Foundation for Innovation JELF grant (36605), and an ACHRI Startup grant. C.C., M.L. and L.M. are supported by ACHRI scholarship. L.M. is supported by a QEII award. G.Y. is supported by an NSERC Discovery Grant (RGPIN/04246-2018). Funding for open access charge: NSERC Discovery Grant (RGPIN-2017-04860).
PY - 2021/2/6
Y1 - 2021/2/6
N2 - DNA sequencing technologies provide unprecedented opportunities to analyze within-host evolution of microorganism populations. Often, within-host populations are analyzed via pooled sequencing of the population, which contains multiple individuals or ‘haplotypes’. However, current next-generation sequencing instruments, in conjunction with single-molecule barcoded linked-reads, cannot distinguish long haplotypes directly. Computational reconstruction of haplotypes from pooled sequencing has been attempted in virology, bacterial genomics, metagenomics and human genetics, using algorithms based on either cross-host genetic sharing or within-host genomic reads. Here we describe PoolHapX, a flexible computational approach that integrates information from both genetic sharing and genomic sequencing. We demonstrated that PoolHapX outperforms state-of-the-art tools tailored to specific organismal systems, and is robust to within-host evolution. Importantly, together with barcoded linked-reads, PoolHapX can infer whole-chromosome-scale haplotypes from 50 pools each containing 12 different haplotypes. By analyzing real data, we uncovered dynamic variations in the evolutionary processes of within-patient HIV populations previously unobserved in single position-based analysis.
UR - http://hdl.handle.net/10754/667320
UR - https://academic.oup.com/mbe/advance-article/doi/10.1093/molbev/msab037/6129778
U2 - 10.1093/molbev/msab037
DO - 10.1093/molbev/msab037
M3 - Article
C2 - 33547786
SN - 0737-4038
JO - Molecular Biology and Evolution
JF - Molecular Biology and Evolution
ER -