TY - JOUR
T1 - A universal framework for single-cell multi-omics data integration with graph convolutional networks
AU - Gao, Hongli
AU - Zhang, Bin
AU - Liu, Long
AU - Li, Shan
AU - Gao, Xin
AU - Yu, Bin
N1 - KAUST Repository Item: Exported on 2023-03-20
Acknowledged KAUST grant number(s): FCC/1/1976-44-01, FCC/1/1976-45-01, REI/1/4742-01-01, URF/1/4379-01-01
Acknowledgements: National Natural Science Foundation of China (62172248); Natural Science Foundation of Shandong Province of China (ZR2021MF098); King Abdullah University of Science and Technology (FCC/1/1976-44-01, FCC/1/1976-45-01, URF/1/4379-01-01 and REI/1/4742-01-01).
PY - 2023/3/17
Y1 - 2023/3/17
N2 - Single-cell omics data are growing at an unprecedented rate, yet integrating them effectively remains challenging because each omics dataset differs in sequencing method, data quality and expression pattern. In this study, we propose a universal framework for the integration of single-cell multi-omics data based on graph convolutional network (GCN-SC). Among multiple single-cell datasets, GCN-SC typically selects the one with the largest number of cells as the reference and treats the rest as query datasets. It uses a mutual nearest neighbor algorithm to identify cell pairs, which provide connections between cells both within and across the reference and query datasets. A GCN then takes the mixed graph constructed from these cell pairs and uses it to adjust the count matrices of the query datasets. Finally, dimension reduction is performed with non-negative matrix factorization before visualization. By applying GCN-SC to six datasets, we show that GCN-SC can effectively integrate sequencing data from multiple single-cell sequencing technologies, species or omics types, and that it outperforms state-of-the-art methods including Seurat, LIGER, GLUER and Pamona.
AB - Single-cell omics data are growing at an unprecedented rate, yet integrating them effectively remains challenging because each omics dataset differs in sequencing method, data quality and expression pattern. In this study, we propose a universal framework for the integration of single-cell multi-omics data based on graph convolutional network (GCN-SC). Among multiple single-cell datasets, GCN-SC typically selects the one with the largest number of cells as the reference and treats the rest as query datasets. It uses a mutual nearest neighbor algorithm to identify cell pairs, which provide connections between cells both within and across the reference and query datasets. A GCN then takes the mixed graph constructed from these cell pairs and uses it to adjust the count matrices of the query datasets. Finally, dimension reduction is performed with non-negative matrix factorization before visualization. By applying GCN-SC to six datasets, we show that GCN-SC can effectively integrate sequencing data from multiple single-cell sequencing technologies, species or omics types, and that it outperforms state-of-the-art methods including Seurat, LIGER, GLUER and Pamona.
UR - http://hdl.handle.net/10754/690408
UR - https://academic.oup.com/bib/advance-article/doi/10.1093/bib/bbad081/7079707
U2 - 10.1093/bib/bbad081
DO - 10.1093/bib/bbad081
M3 - Article
C2 - 36929841
SN - 1467-5463
JO - Briefings in Bioinformatics
JF - Briefings in Bioinformatics
ER -