TY - GEN
T1 - K-AP: Generating specified K clusters by efficient Affinity Propagation
AU - Zhang, Xiangliang
AU - Wang, Wei
AU - Nørvåg, Kjetil
AU - Sebag, Michèle
N1 - KAUST Repository Item: Exported on 2020-10-01
PY - 2010/12
Y1 - 2010/12
N2 - The Affinity Propagation (AP) clustering algorithm proposed by Frey and Dueck (2007) provides an understandable, nearly optimal summary of a data set. However, it suffers two major shortcomings: i) the number of clusters is vague with the user-defined parameter called self-confidence, and ii) the quadratic computational complexity. When aiming at a given number of clusters due to prior knowledge, AP has to be launched many times until an appropriate setting of self-confidence is found. The re-launched AP increases the computational cost by one order of magnitude. In this paper, we propose an algorithm, called K-AP, to exploit the immediate results of K clusters by introducing a constraint in the process of message passing. Through theoretical analysis and experimental validation, K-AP was shown to be able to directly generate K clusters as user defined, with a negligible increase of computational cost compared to AP. In the meanwhile, K-AP preserves the clustering quality as AP in terms of the distortion. K-AP is more effective than k-medoids w.r.t. the distortion minimization and higher clustering purity. © 2010 IEEE.
AB - The Affinity Propagation (AP) clustering algorithm proposed by Frey and Dueck (2007) provides an understandable, nearly optimal summary of a data set. However, it suffers two major shortcomings: i) the number of clusters is vague with the user-defined parameter called self-confidence, and ii) the quadratic computational complexity. When aiming at a given number of clusters due to prior knowledge, AP has to be launched many times until an appropriate setting of self-confidence is found. The re-launched AP increases the computational cost by one order of magnitude. In this paper, we propose an algorithm, called K-AP, to exploit the immediate results of K clusters by introducing a constraint in the process of message passing. Through theoretical analysis and experimental validation, K-AP was shown to be able to directly generate K clusters as user defined, with a negligible increase of computational cost compared to AP. In the meanwhile, K-AP preserves the clustering quality as AP in terms of the distortion. K-AP is more effective than k-medoids w.r.t. the distortion minimization and higher clustering purity. © 2010 IEEE.
UR - http://hdl.handle.net/10754/564322
UR - https://ieeexplore.ieee.org/document/5694106/
UR - http://www.scopus.com/inward/record.url?scp=79951743719&partnerID=8YFLogxK
U2 - 10.1109/ICDM.2010.107
DO - 10.1109/ICDM.2010.107
M3 - Conference contribution
SN - 9780769542560
SP - 1187
EP - 1192
BT - 2010 IEEE International Conference on Data Mining
PB - Institute of Electrical and Electronics Engineers (IEEE)
ER -