TY - JOUR
T1 - scBKAP: a clustering model for single-cell RNA-seq data based on bisecting K-means
AU - Wang, Xiaolin
AU - Gao, Hongli
AU - Qi, Ren
AU - Zheng, Ruiqing
AU - Gao, Xin
AU - Yu, Bin
N1 - KAUST Repository Item: Exported on 2022-12-21
Acknowledged KAUST grant number(s): FCC/1/1976-17, FCC/1/1976-23, FCC/1/1976-26, REI/1/0018-01-01, REI/1/4473-01-01, URF/1/3412-01, URF/1/3450-01, URF/1/4098-01-01
Acknowledgements: We thank anonymous reviewers for valuable suggestions and comments. This work was supported by the National Natural Science Foundation of China (No. 62172248), the Natural Science Foundation of Shandong Province of China (No. ZR2021MF098), and the King Abdullah University of Science and Technology (KAUST) Office of Sponsored Research (OSR) under Award No. FCC/1/1976-17, FCC/1/1976-23, FCC/1/1976-26, URF/1/3450-01, URF/1/3412-01, URF/1/4098-01-01, REI/1/0018-01-01, and REI/1/4473-01-01.
PY - 2022/12/19
Y1 - 2022/12/19
N2 - Advances in single-cell RNA sequencing (scRNA-seq) technologies allow researchers to analyze genome-wide transcription profiles and to solve biological problems at individual-cell resolution. However, existing clustering methods for scRNA-seq data suffer from the high dropout rate and the curse of dimensionality. Here, we propose a novel pipeline, scBKAP, the cornerstone of which is a single-cell bisecting K-means clustering method based on an autoencoder network and a dimensionality reduction model, MPDR. Specifically, scBKAP uses an autoencoder network to reconstruct gene expression values from scRNA-seq data to alleviate the dropout issue, and it applies the MPDR model, composed of the M3Drop feature selection algorithm and the PHATE dimensionality reduction algorithm, to reduce the dimensions of the reconstructed data. The dimensionality-reduced data are then fed into the bisecting K-means clustering algorithm to identify the clusters of cells. Comprehensive experiments demonstrate scBKAP's superior performance over nine state-of-the-art single-cell clustering methods on 21 public scRNA-seq datasets and on simulated datasets.
AB - Advances in single-cell RNA sequencing (scRNA-seq) technologies allow researchers to analyze genome-wide transcription profiles and to solve biological problems at individual-cell resolution. However, existing clustering methods for scRNA-seq data suffer from the high dropout rate and the curse of dimensionality. Here, we propose a novel pipeline, scBKAP, the cornerstone of which is a single-cell bisecting K-means clustering method based on an autoencoder network and a dimensionality reduction model, MPDR. Specifically, scBKAP uses an autoencoder network to reconstruct gene expression values from scRNA-seq data to alleviate the dropout issue, and it applies the MPDR model, composed of the M3Drop feature selection algorithm and the PHATE dimensionality reduction algorithm, to reduce the dimensions of the reconstructed data. The dimensionality-reduced data are then fed into the bisecting K-means clustering algorithm to identify the clusters of cells. Comprehensive experiments demonstrate scBKAP's superior performance over nine state-of-the-art single-cell clustering methods on 21 public scRNA-seq datasets and on simulated datasets.
UR - http://hdl.handle.net/10754/686568
UR - https://ieeexplore.ieee.org/document/9991252/
U2 - 10.1109/TCBB.2022.3230098
DO - 10.1109/TCBB.2022.3230098
M3 - Article
C2 - 37015596
SN - 2374-0043
SP - 1
EP - 10
JO - IEEE/ACM Transactions on Computational Biology and Bioinformatics
JF - IEEE/ACM Transactions on Computational Biology and Bioinformatics
ER -