TY - JOUR
T1 - A trainable clustering algorithm based on shortest paths from density peaks
AU - Pizzagalli, Diego Ulisse
AU - Gonzalez, Santiago Fernandez
AU - Krause, Rolf
N1 - Publisher Copyright:
Copyright © 2019 The Authors, some rights reserved;
PY - 2019/10/30
Y1 - 2019/10/30
N2 - Clustering is a technique to analyze empirical data, with a major application for biomedical research. Essentially, clustering finds groups of related points in a dataset. However, results depend on both metrics for point-to-point similarity and rules for point-to-group association. Non-appropriate metrics and rules can lead to artifacts, especially in case of multiple groups with heterogeneous structure. In this work, we propose a clustering algorithm that evaluates the properties of paths between points (rather than point-to-point similarity) and solves a global optimization problem, finding solutions not obtainable by methods relying on local choices. Moreover, our algorithm is trainable. Hence, it can be adapted and adopted for specific datasets and applications by providing examples of valid and invalid paths to train a path classifier. We demonstrate its applicability to identify heterogeneous groups in challenging synthetic datasets, segment highly nonconvex immune cells in confocal microscopy images, and classify arrhythmic heartbeats in electrocardiographic signals.
AB - Clustering is a technique to analyze empirical data, with a major application for biomedical research. Essentially, clustering finds groups of related points in a dataset. However, results depend on both metrics for point-to-point similarity and rules for point-to-group association. Non-appropriate metrics and rules can lead to artifacts, especially in case of multiple groups with heterogeneous structure. In this work, we propose a clustering algorithm that evaluates the properties of paths between points (rather than point-to-point similarity) and solves a global optimization problem, finding solutions not obtainable by methods relying on local choices. Moreover, our algorithm is trainable. Hence, it can be adapted and adopted for specific datasets and applications by providing examples of valid and invalid paths to train a path classifier. We demonstrate its applicability to identify heterogeneous groups in challenging synthetic datasets, segment highly nonconvex immune cells in confocal microscopy images, and classify arrhythmic heartbeats in electrocardiographic signals.
UR - http://www.scopus.com/inward/record.url?scp=85074604937&partnerID=8YFLogxK
U2 - 10.1126/sciadv.aax3770
DO - 10.1126/sciadv.aax3770
M3 - Article
C2 - 32195334
AN - SCOPUS:85074604937
SN - 2375-2548
VL - 5
JO - SCIENCE ADVANCES
JF - SCIENCE ADVANCES
IS - 10
M1 - eaax3770
ER -