Semantically-aware statistical metrics via weighting kernels

Stefano Cresci, Roberto Di Pietro, Maurizio Tesconi

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

4 Scopus citations

Abstract

Distance metrics between statistical distributions are widely used as an efficient means to aggregate and simplify the underlying probabilities, thus enabling high-level analyses. In this paper we investigate the collisions that can arise with such metrics, and a mitigation technique rooted in kernels. In detail, we first show that the existence of colliding functions (so-called iso-curves) is widespread across metrics and families of functions (e.g., Gaussian, heavy-tailed). We then propose a solution based on kernels for augmenting distance metrics and summary statistics, thus avoiding collisions and highlighting semantically relevant phenomena. This study is supported by a thorough theoretical evaluation of our solution against a large number of functions and metrics, complemented by a real-world evaluation carried out by applying our solution to an existing problem. Some further research avenues are also discussed. The theoretical construction and the achieved results show the soundness, viability, and quality of our proposal, which, besides being interesting in its own right, also paves the way for further research in the highlighted directions.
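
The collision idea can be illustrated with a minimal sketch. This is not the construction from the paper, whose metrics and kernels the abstract does not specify. It assumes a plain L1 distance over discrete bins and a Gaussian weighting kernel, and every name in it (`l1_distance`, `gaussian_kernel`, the bin layout) is hypothetical. Two perturbations of a uniform reference move the same amount of mass in different regions: the unweighted distance cannot tell them apart, while the kernel-weighted one can.

```python
import math


def l1_distance(p, q, weights=None):
    """L1 distance between two discrete distributions, optionally weighted per bin."""
    if weights is None:
        weights = [1.0] * len(p)
    return sum(w * abs(a - b) for w, a, b in zip(weights, p, q))


def gaussian_kernel(n_bins, center, width):
    """Per-bin weights peaking at `center` (an assumed, illustrative kernel)."""
    return [math.exp(-((i - center) ** 2) / (2 * width ** 2)) for i in range(n_bins)]


# Reference distribution: uniform over 10 bins.
p = [0.1] * 10

# Perturbation confined to the two leftmost bins.
q_left = list(p)
q_left[0] += 0.05
q_left[1] -= 0.05

# Perturbation of the same size confined to the two rightmost bins.
q_right = list(p)
q_right[9] += 0.05
q_right[8] -= 0.05

# Unweighted distances collide: both perturbations look identical.
unweighted_left = l1_distance(p, q_left)
unweighted_right = l1_distance(p, q_right)

# A kernel centered on the right tail marks that region as the relevant one.
kernel = gaussian_kernel(len(p), center=9, width=1.5)
weighted_left = l1_distance(p, q_left, kernel)
weighted_right = l1_distance(p, q_right, kernel)

print(unweighted_left, unweighted_right)
print(weighted_left, weighted_right)
```

Here the kernel's center and width encode which part of the domain matters for the analysis. Choosing them is a modeling decision, and other kernel shapes would serve equally well in this sketch.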
Original language: English (US)
Title of host publication: Proceedings - 2019 IEEE International Conference on Data Science and Advanced Analytics, DSAA 2019
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 51-60
Number of pages: 10
ISBN (Print): 9781728144931
DOIs
State: Published - Oct 1 2019
Externally published: Yes
