TY - JOUR
T1 - Online single-cell data integration through projecting heterogeneous datasets into a common cell-embedding space
AU - Xiong, Lei
AU - Tian, Kang
AU - Li, Yuzhe
AU - Ning, Weixi
AU - Gao, Xin
AU - Zhang, Qiangfeng Cliff
N1 - KAUST Repository Item: Exported on 2022-10-19
Acknowledged KAUST grant number(s): FCC/1/1976-44-01, FCC/1/1976-45-01, URF/1/4352-01-01, URF/1/4663-01-01
Acknowledgements: We thank Jianbin Wang, Jin Gu and Fuchou Tang for helpful comments and advice. This work is supported by the State Key Research Development Program of China (Grant No. 2019YFA0110002, Q.C.Z.), the National Natural Science Foundation of China (Grants No. 32125007 and 91940306, Q.C.Z.), the Beijing Advanced Innovation Center for Structural Biology, and the Tsinghua-Peking Joint Center for Life Sciences. We thank the Tsinghua University Branch of China National Center for Protein Sciences (Beijing) for computational facility support. This work is also supported by the King Abdullah University of Science and Technology (KAUST) Office of Research Administration (ORA) under Award No. FCC/1/1976-44-01, FCC/1/1976-45-01, URF/1/4352-01-01, and URF/1/4663-01-01 (X.G.).
PY - 2022/10/17
Y1 - 2022/10/17
AB - Computational tools for integrative analyses of diverse single-cell experiments are facing formidable new challenges, including dramatic increases in data scale, sample heterogeneity, and the need to informatively cross-reference new data with foundational datasets. Here, we present SCALEX, a deep-learning method that integrates single-cell data by projecting cells into a batch-invariant, common cell-embedding space in a truly online manner (i.e., without retraining the model). SCALEX substantially outperforms online iNMF and other state-of-the-art non-online integration methods on benchmark single-cell datasets of diverse modalities (e.g., single-cell RNA sequencing (scRNA-seq) and single-cell assay for transposase-accessible chromatin using sequencing (scATAC-seq)), especially for datasets with partial overlaps, accurately aligning similar cell populations while retaining true biological differences. We showcase SCALEX’s advantages by constructing continuously expandable single-cell atlases for human, mouse, and COVID-19 patients, each assembled from diverse data sources and growing with every new dataset. Its online data integration capacity and superior performance make SCALEX particularly appropriate for large-scale single-cell applications that build upon previous scientific insights.
UR - http://hdl.handle.net/10754/683830
UR - https://www.nature.com/articles/s41467-022-33758-z
U2 - 10.1038/s41467-022-33758-z
DO - 10.1038/s41467-022-33758-z
M3 - Article
C2 - 36253379
SN - 2041-1723
VL - 13
JO - Nature Communications
JF - Nature Communications
IS - 1
ER -