TY - JOUR
T1 - Bayesian site selection for fast Gaussian process regression
AU - Pourhabib, Arash
AU - Liang, Faming
AU - Ding, Yu
N1 - KAUST Repository Item: Exported on 2020-10-01
N1 - Acknowledged KAUST grant number(s): KUS-C1-016-04
N1 - Acknowledgements: Arash Pourhabib and Yu Ding were supported in part by NSF grants CMMI-0926803 and CMMI-1000088; Yu Ding was also supported by the NSF grant CMMI-0726939; Faming Liang's research was partially supported by NSF grants CMMI-0926803, DMS-1007457, and DMS-1106494 and an award (KUS-C1-016-04) made by King Abdullah University of Science and Technology.
N1 - This publication acknowledges KAUST support, but has no KAUST affiliated authors.
PY - 2014/2/5
Y1 - 2014/2/5
AB - Gaussian Process (GP) regression is a popular method in the field of machine learning and computer experiment designs; however, its ability to handle large data sets is hindered by the computational difficulty in inverting a large covariance matrix. Likelihood approximation methods were developed as a fast GP approximation, thereby reducing the computation cost of GP regression by utilizing a much smaller set of unobserved latent variables called pseudo points. This article reports a further improvement to the likelihood approximation methods by simultaneously deciding both the number and locations of the pseudo points. The proposed approach is a Bayesian site selection method where both the number and locations of the pseudo inputs are parameters in the model, and the Bayesian model is solved using a reversible jump Markov chain Monte Carlo technique. Through a number of simulated and real data sets, it is demonstrated that with appropriate priors chosen, the Bayesian site selection method can produce a good balance between computation time and prediction accuracy: it is fast enough to handle large data sets that a full GP is unable to handle, and it improves, quite often remarkably, the prediction accuracy, compared with the existing likelihood approximations. © 2014 Taylor and Francis Group, LLC.
UR - http://hdl.handle.net/10754/597658
UR - http://www.tandfonline.com/doi/abs/10.1080/0740817X.2013.849833
UR - http://www.scopus.com/inward/record.url?scp=84893951100&partnerID=8YFLogxK
U2 - 10.1080/0740817X.2013.849833
DO - 10.1080/0740817X.2013.849833
M3 - Article
SN - 0740-817X
VL - 46
SP - 543
EP - 555
JO - IIE Transactions
JF - IIE Transactions
IS - 5
ER -