A combined statistical and machine learning approach for spatial prediction of extreme wildfire frequencies and sizes

Daniela Cisneros, Yan Gong, Rishikesh Yadav, Arnab Hazra*, Raphaël Huser

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

2 Scopus citations


Motivated by the Extreme Value Analysis 2021 (EVA 2021) data challenge, we propose a method based on statistics and machine learning for the spatial prediction of extreme wildfire frequencies and sizes. This method is tailored to handle large datasets, including missing observations. Our approach relies on a four-stage, bivariate, sparse spatial model for high-dimensional zero-inflated data that we develop using stochastic partial differential equations (SPDE), which allow sparse precision matrices for the latent processes. In Stage 1, the observations are separated into zero and nonzero categories and modeled using a two-layered hierarchical Bayesian sparse spatial model to estimate the probabilities of these two categories. In Stage 2, we first obtain empirical estimates of the spatially-varying mean and variance profiles across the spatial locations for the positive observations and smooth those estimates using fixed rank kriging. This approximate Bayesian inference method is employed to avoid the high computational burden of large spatial data modeling using spatially-varying coefficients. In Stage 3, we further model the standardized log-transformed positive observations from the second stage using a sparse bivariate spatial Gaussian process. The Gaussian distribution assumption for wildfire counts made in the third stage is computationally efficient but misspecified. Thus, in Stage 4, the predicted exceedance probabilities are post-processed using Random Forests. We draw posterior inference for Stages 1 and 3 using Markov chain Monte Carlo (MCMC) sampling. We then create a cross-validation scheme for the artificially generated gaps and compare the EVA 2021 prediction scores of the proposed model with those obtained using several competing methods.
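As a rough illustration of how the first three stages could combine into an exceedance probability at a single location, the sketch below multiplies a nonzero probability (the kind of quantity Stage 1 estimates) by a Gaussian tail probability for the standardized log-transformed positive value (Stages 2 and 3). The function name, its parameters, and the numeric inputs are hypothetical and are not taken from the paper; the spatial dependence, the bivariate structure, and the Random Forest post-processing of Stage 4 are all omitted.

```python
from math import log
from statistics import NormalDist


def exceedance_probability(u, p_nonzero, mu_log, sigma_log):
    """Return P(Y > u) for a zero-inflated variable Y and a threshold u > 0.

    Illustrative assumption (not the paper's full model):
    log(Y) | Y > 0 ~ Normal(mu_log, sigma_log**2), independently per location.
    """
    if u <= 0:
        raise ValueError("threshold u must be positive")
    if sigma_log <= 0:
        raise ValueError("sigma_log must be positive")
    # Standardize the log-threshold with the local mean and standard deviation.
    z = (log(u) - mu_log) / sigma_log
    # Gaussian tail probability: P(Y > u | Y > 0).
    tail = 1.0 - NormalDist().cdf(z)
    # Hurdle decomposition: P(Y > u) = P(Y > 0) * P(Y > u | Y > 0).
    return p_nonzero * tail


# Hypothetical inputs: 30% chance of a nonzero value, threshold at the
# conditional median of the positive part.
prob = exceedance_probability(u=10.0, p_nonzero=0.3, mu_log=log(10.0), sigma_log=1.0)
print(prob)
```

With the threshold placed at the conditional median, the tail probability is one half, so the result is half of the nonzero probability.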

Original language: English (US)
Pages (from-to): 301-330
Number of pages: 30
Issue number: 2
State: Published - Jun 2023


Keywords

  • 62G32
  • 62H11
  • 62J05
  • 62J12
  • 62P12
  • Approximate Bayesian inference
  • Extreme wildfire frequencies and sizes
  • Gaussian Markov random field
  • Random Forests
  • Stochastic partial differential equation

ASJC Scopus subject areas

  • Statistics and Probability
  • Engineering (miscellaneous)
  • Economics, Econometrics and Finance (miscellaneous)

