TY - JOUR
T1 - scAEGAN
T2 - Unification of single-cell genomics data by adversarial learning of latent space correspondences
AU - Khan, Sumeer Ahmad
AU - Lehmann, Robert
AU - Martinez-De-Morentin, Xabier
AU - Maillo, Alberto
AU - Lagani, Vincenzo
AU - Kiani, Narsis A.
AU - Gomez-Cabrero, David
AU - Tegner, Jesper
N1 - Funding Information:
This work was supported by the King Abdullah University of Science and Technology. The funders had no role in study design, data collection and analysis decision to publish, or preparation of the manuscript.
Publisher Copyright:
© 2023 Khan et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
PY - 2023/2
Y1 - 2023/2
N2 - Recent progress in Single-Cell Genomics has produced different library protocols and techniques for molecular profiling. We formulate a unifying, data-driven, integrative, and predictive methodology for different libraries, samples, and paired-unpaired data modalities. Our design of scAEGAN includes an autoencoder (AE) network integrated with adversarial learning by a cycleGAN (cGAN) network. The AE learns a low-dimensional embedding of each condition, whereas the cGAN learns a non-linear mapping between the AE representations. We evaluate scAEGAN using simulated data and real scRNA-seq datasets, different library preparations (Fluidigm C1, CelSeq, CelSeq2, SmartSeq), and several data modalities as paired scRNA-seq and scATAC-seq. The scAEGAN outperforms Seurat3 in library integration, is more robust against data sparsity, and beats Seurat 4 in integrating paired data from the same cell. Furthermore, in predicting one data modality from another, scAEGAN outperforms Babel. We conclude that scAEGAN surpasses current state-of-the-art methods and unifies integration and prediction challenges.
AB - Recent progress in Single-Cell Genomics has produced different library protocols and techniques for molecular profiling. We formulate a unifying, data-driven, integrative, and predictive methodology for different libraries, samples, and paired-unpaired data modalities. Our design of scAEGAN includes an autoencoder (AE) network integrated with adversarial learning by a cycleGAN (cGAN) network. The AE learns a low-dimensional embedding of each condition, whereas the cGAN learns a non-linear mapping between the AE representations. We evaluate scAEGAN using simulated data and real scRNA-seq datasets, different library preparations (Fluidigm C1, CelSeq, CelSeq2, SmartSeq), and several data modalities as paired scRNA-seq and scATAC-seq. The scAEGAN outperforms Seurat3 in library integration, is more robust against data sparsity, and beats Seurat 4 in integrating paired data from the same cell. Furthermore, in predicting one data modality from another, scAEGAN outperforms Babel. We conclude that scAEGAN surpasses current state-of-the-art methods and unifies integration and prediction challenges.
UR - http://www.scopus.com/inward/record.url?scp=85147457825&partnerID=8YFLogxK
U2 - 10.1371/journal.pone.0281315
DO - 10.1371/journal.pone.0281315
M3 - Article
C2 - 36735690
AN - SCOPUS:85147457825
SN - 1932-6203
VL - 18
JO - PloS one
JF - PloS one
IS - 2 February
M1 - e0281315
ER -