We suggest a new approach to assessing the influence that individual data have on the value of a statistic, and to ranking the data in terms of that influence. The approach is applicable to general statistics computed from random samples of univariate, vector-valued or functional data. Our method is based on, first, perturbing the value of the statistic by ‘tilting’, or reweighting, each data value, where the total amount of tilt is constrained to be the least possible subject to achieving a given small perturbation of the statistic, and, second, taking the ranking of the influence of data values to be that which corresponds to ranking the changes in data weights. It is shown, both theoretically and numerically, that this ranking does not depend on the size of the perturbation, provided that the perturbation is sufficiently small. That simple result leads directly to an elegant geometric interpretation of the ranks: they are the ranks of the lengths of projections of the weights onto a ‘line’ determined by the first empirical principal component function in a generalized measure of covariance. To illustrate the generality of the method we introduce and explore it in the case of functional data, where (for example) it leads to generalized boxplots. The method has the advantage of providing an interpretable ranking that depends on the statistic under consideration. For example, the ranking of data in terms of their influence on the value of a statistic is different for a measure of location and for a measure of scale. This is as it should be: a ranking of data in terms of their influence should depend on the manner in which the data are used. Additionally, the ranking recognizes, rather than ignores, sign, and in particular can identify left- and right-hand ‘tails’ of the distribution of a random function or vector.
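The tilting idea can be illustrated with a minimal univariate sketch. It is not the authors' implementation. It assumes that the 'amount of tilt' is measured by squared Euclidean distance from the uniform weights, and that the statistic is linearized around those weights, which is reasonable only for a small perturbation. The function names (`weighted_mean`, `weighted_variance`, `min_tilt`, `influence_order`) and the toy data are hypothetical.

```python
# Sketch only. Assumptions not taken from the source: tilt is measured by
# squared Euclidean distance, and the statistic is linearized at uniform weights.

def weighted_mean(x, p):
    """Location statistic written as a function of the data weights p."""
    return sum(pi * xi for pi, xi in zip(p, x))

def weighted_variance(x, p):
    """Scale statistic written as a function of the data weights p."""
    m = weighted_mean(x, p)
    return sum(pi * (xi - m) ** 2 for pi, xi in zip(p, x))

def empirical_gradient(stat, x, eps=1e-6):
    """Forward-difference derivative of stat in each weight, at uniform weights."""
    n = len(x)
    uniform = [1.0 / n] * n
    base = stat(x, uniform)
    grad = []
    for i in range(n):
        p = list(uniform)
        p[i] += eps
        grad.append((stat(x, p) - base) / eps)
    return grad

def min_tilt(stat, x, delta):
    """Smallest (squared Euclidean) change in weights that moves the
    linearized statistic by delta while keeping the weights summing to 1."""
    g = empirical_gradient(stat, x)
    g_bar = sum(g) / len(g)
    centred = [gi - g_bar for gi in g]          # enforce sum of changes = 0
    norm2 = sum(v * v for v in centred)
    return [delta * v / norm2 for v in centred]

def influence_order(stat, x, delta=1e-3):
    """Indices of the data sorted by signed change in weight (delta > 0)."""
    dp = min_tilt(stat, x, delta)
    return sorted(range(len(x)), key=lambda i: dp[i])

x = [1.0, 2.0, 3.0, 4.0, 10.0]
print(influence_order(weighted_mean, x))      # [0, 1, 2, 3, 4]
print(influence_order(weighted_variance, x))  # [3, 2, 1, 0, 4]
```

Under these assumptions the change in each weight is proportional to the centred gradient, so the ordering is the same for every small positive `delta`. The two printed orderings differ, which mirrors the point that a measure of location and a measure of scale rank the same data differently, and the signed ordering for the mean separates the left-hand from the right-hand tail.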
Journal of the Royal Statistical Society: Series B (Statistical Methodology)
Published - Dec 26 2014
ASJC Scopus subject areas
- Statistics and Probability
- Statistics, Probability and Uncertainty