TY - JOUR
T1 - Challenges in the Multivariate Analysis of Mass Cytometry Data: The Effect of Randomization
AU - Papoutsoglou, Georgios
AU - Lagani, Vincenzo
AU - Schmidt, Angelika
AU - Tsirlis, Konstantinos
AU - Gomez-Cabrero, David
AU - Tegner, Jesper
AU - Tsamardinos, Ioannis
N1 - KAUST Repository Item: Exported on 2020-10-01
Acknowledgements: The authors acknowledge the SciLifeLab National Mass Cytometry Facility services in Stockholm for performing mass cytometry, and particularly its members Drs. Tadepally Lakshmikanth, Yang Chen, Jaromir Mikes, and Petter Brodin for the many helpful discussions and feedback regarding sample preparation protocols and CyTOF data generation settings. In addition, the authors would like to thank the anonymous referees for their insightful comments and key suggestions during the review of this manuscript. The research leading to these results has received funding from the European Research Council under the European Union's Seventh Framework Programme (FP/2007-2013)/ERC Grant Agreement no. 617393; CAUSALPATH – Next Generation Causal Analysis project. Funding for open access charge: ERC.
PY - 2019/11/6
Y1 - 2019/11/6
AB - Cytometry by time-of-flight (CyTOF) has emerged as a high-throughput single-cell technology able to provide large samples of protein readouts. Already, there exists a large pool of advanced high-dimensional analysis algorithms that explore the observed heterogeneous distributions, making intriguing biological inferences. A fact largely overlooked by these methods, however, is the effect of the established data preprocessing pipeline on the distributions of the measured quantities. In this article, we focus on randomization, a transformation used for improving data visualization, which can negatively affect multivariate data analysis methods such as dimensionality reduction, clustering, and network reconstruction algorithms. Our results indicate that randomization should be used only for visualization purposes, but not in conjunction with high-dimensional analytical tools.
UR - http://hdl.handle.net/10754/659980
UR - https://onlinelibrary.wiley.com/doi/abs/10.1002/cyto.a.23908
UR - http://www.scopus.com/inward/record.url?scp=85074780294&partnerID=8YFLogxK
U2 - 10.1002/cyto.a.23908
DO - 10.1002/cyto.a.23908
M3 - Article
C2 - 31692248
SN - 1552-4922
VL - 95
SP - 1178
EP - 1190
JO - Cytometry Part A
JF - Cytometry Part A
IS - 11
ER -