Estimation of a probability density function using interval aggregated data

Jianhua Z. Huang, Xueying Wang, Ximing Wu, Lan Zhou

Research output: Contribution to journal › Article › peer-review

3 Scopus citations


In economics and government statistics, aggregated data rather than individual-level data are usually reported, for reasons of confidentiality and simplicity. In this paper we develop a method for flexibly estimating the probability density function of a population from aggregated data obtained as group averages, where the individual-level data are grouped according to quantile limits. The kernel density estimator has commonly been applied to such data without accounting for the aggregation process, and it has been shown to perform poorly. Our method models the quantile function as the integral of the exponential of a spline function and deduces the density function from the quantile function. We match the aggregated data to their theoretical counterparts using least squares and regularize the estimation by using the squared second derivative of the density function as the penalty. A computational algorithm is developed to implement the method. Applications to simulated data and to US household income survey data show that our penalized spline estimator can accurately recover the density function of the underlying population, while the common practice of kernel density estimation is severely biased. The method is applied to study the dynamics of China's urban income distribution using published interval aggregated data from 1985–2010.
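The estimator described in the abstract can be illustrated with a short sketch. This is a minimal sketch under our own assumptions, not the authors' algorithm. The quantile function is modelled as Q(u) = a + ∫_0^u exp(s(t)) dt with s a cubic B-spline on [0, 1], so Q is increasing and the density at x = Q(u) is f(x) = 1/Q'(u) = exp(-s(u)). For a group with quantile limits u_{k-1} < u_k, the model average is the mean of Q(u) over [u_{k-1}, u_k]. The squared-second-derivative penalty ∫ f''(x)^2 dx is rewritten in u by the chain rule as ∫_0^1 (2 s'(u)^2 - s''(u))^2 exp(-5 s(u)) du. The helper names `make_basis` and `fit_quantile_spline`, the number of basis functions, the trapezoid integration on a fixed grid, the starting values, and the use of SciPy's BFGS optimizer are all hypothetical choices.

```python
import numpy as np
from scipy.interpolate import BSpline
from scipy.optimize import minimize


def make_basis(n_basis=8, degree=3):
    """Hypothetical helper: clamped cubic B-spline basis on [0, 1]."""
    n_inner = n_basis - degree - 1
    inner = np.linspace(0.0, 1.0, n_inner + 2)[1:-1]
    knots = np.concatenate([np.zeros(degree + 1), inner, np.ones(degree + 1)])
    # Identity coefficients: evaluating returns one column per basis function.
    return BSpline(knots, np.eye(n_basis), degree)


def fit_quantile_spline(limits, group_means, lam=1.0, n_basis=8, n_grid=2001):
    """Hypothetical sketch: penalized least-squares fit of
    Q(u) = a + int_0^u exp(s(t)) dt to interval-aggregated group means.

    limits: quantile limits 0 = u_0 < ... < u_K = 1 (e.g. quintiles).
    group_means: observed average of the individuals in each group.
    """
    limits = np.asarray(limits, dtype=float)
    ybar = np.asarray(group_means, dtype=float)
    u = np.linspace(0.0, 1.0, n_grid)
    du = np.diff(u)
    spl = make_basis(n_basis)
    B, B1, B2 = spl(u), spl(u, nu=1), spl(u, nu=2)  # s, s', s'' design matrices

    def cumtrapz(y):
        # Cumulative trapezoid integral from 0 to each grid point.
        return np.concatenate([[0.0], np.cumsum(0.5 * (y[1:] + y[:-1]) * du)])

    def model(theta):
        a, c = theta[0], theta[1:]
        s = np.clip(B @ c, -50.0, 50.0)          # guard exp() during line searches
        Q = a + cumtrapz(np.exp(s))              # Q' = exp(s) > 0, so Q is increasing
        IQ = np.interp(limits, u, cumtrapz(Q))   # int_0^{u_k} Q(u) du at each limit
        means = np.diff(IQ) / np.diff(limits)    # model group averages
        return s, Q, means

    def objective(theta):
        s, _, means = model(theta)
        c = theta[1:]
        s1, s2 = B1 @ c, B2 @ c
        # int f''(x)^2 dx in u-coordinates:
        # f''(Q(u)) = (2 s'^2 - s'') exp(-3 s) and dx = exp(s) du.
        g = (2.0 * s1**2 - s2) ** 2 * np.exp(-5.0 * s)
        penalty = np.sum(0.5 * (g[1:] + g[:-1]) * du)
        return np.sum((ybar - means) ** 2) + lam * penalty

    # Start from a linear quantile function spanning the observed group means.
    spread = max(ybar[-1] - ybar[0], 1e-8)
    theta0 = np.concatenate([[ybar[0]], np.full(n_basis, np.log(spread))])
    res = minimize(objective, theta0, method="BFGS")
    s, Q, means = model(res.x)
    density = np.exp(-s)  # f(Q(u)) = 1 / Q'(u)
    return u, Q, density, means
```

As a sanity check, the quintile group means of a uniform distribution on [2, 5] (2.3, 2.9, 3.5, 4.1, 4.7) should be reproduced closely, with a nearly flat fitted density close to 1/3. The smoothing parameter `lam` is left fixed here; its appropriate size depends on the scale of the data.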
Original language: English (US)
Pages (from-to): 3093-3105
Number of pages: 13
Journal: Journal of Statistical Computation and Simulation
Issue number: 15
State: Published - Feb 18 2016
Externally published: Yes

