Related papers: Prediction & Model Evaluation for Space-Time Data

Estimating the Prediction Performance of Spatial Models via Spatial k-Fold Cross Validation

In machine learning one often assumes the data are independent when evaluating model performance. However, this rarely holds in practice. Geographic information data sets are an example where the data points have stronger dependencies among…

Applications · Statistics 2020-06-01 Jonne Pohjankukka, Tapio Pahikkala, Paavo Nevalainen, Jukka Heikkonen

Assessing the performance of spatial cross-validation approaches for models of spatially structured data

Evaluating models fit to data with internal spatial structure requires specific cross-validation (CV) approaches, because randomly selecting assessment data may produce assessment sets that are not truly independent of data used to train…

Computation · Statistics 2023-03-14 Michael J Mahoney, Lucas K Johnson, Julia Silge, Hannah Frick, Max Kuhn, Colin M Beier

Aligning Validation with Deployment in Spatial Prediction: Target-Weighted Cross-Validation

Reliable estimation of predictive performance is essential for spatial environmental modeling, where machine-learning models are used to generate maps from unevenly distributed observations. Standard cross-validation (CV) assumes that…

Machine Learning · Computer Science 2026-05-22 Alexander Brenning, Thomas Suesse

Cross validation for model selection: a primer with examples from ecology

The growing use of model-selection principles in ecology for statistical inference is underpinned by information criteria (IC) and cross-validation (CV) techniques. Although IC techniques, such as Akaike's Information Criterion, have been…

Methodology · Statistics 2022-03-10 Luke Yates, Zach Aandahl, Shane A. Richards, Barry W. Brook

Importance of spatial predictor variable selection in machine learning applications -- Moving from data reproduction to spatial prediction

Machine learning algorithms find frequent application in spatial prediction of biotic and abiotic environmental variables. However, the characteristics of spatial data, especially spatial autocorrelation, are widely ignored. We hypothesize…

Applications · Statistics 2019-12-11 Hanna Meyer, Christoph Reudenbach, Stephan Wöllauer, Thomas Nauss

Cross-validation: what does it estimate and how well does it do it?

Cross-validation is a widely-used technique to estimate prediction error, but its behavior is complex and not fully understood. Ideally, one would like to think that cross-validation estimates the prediction error for the model at hand, fit…

Methodology · Statistics 2024-03-12 Stephen Bates, Trevor Hastie, Robert Tibshirani

Moving beyond spatial and random cross-validation in environmental modelling: a call for prediction-domain adaptive evaluation

With the growing application of spatial predictive modeling in ecology, the question of how to appropriately evaluate the resulting maps has gained increasing attention. While there is consensus that map accuracy is ideally estimated using…

Methodology · Statistics 2026-05-14 Jan Linnenbrink, Jakub Nowosad, Hanna Meyer

Cross-Validation for Correlated Data

K-fold cross-validation (CV) with squared error loss is widely used for evaluating predictive models, especially when strong distributional assumptions cannot be taken. However, CV with squared error loss is not free from distributional…

Methodology · Statistics 2021-08-10 Assaf Rabinowicz, Saharon Rosset

Model selection by cross-validation in an expectile linear regression

For linear models that may have asymmetric errors, we study variable selection by cross-validation. The data are split into training and validation sets, with the number of observations in the validation set much larger than in the training…

Methodology · Statistics 2026-01-16 Bilel Bousselmi, Gabriela Ciuperca

A Novel Framework for Spatio-Temporal Prediction of Environmental Data Using Deep Learning

As the role played by statistical and computational sciences in climate and environmental modelling and prediction becomes more important, Machine Learning researchers are becoming more aware of the relevance of their work to help tackle…

Machine Learning · Statistics 2020-12-23 Federico Amato, Fabian Guignard, Sylvain Robert, Mikhail Kanevski

The use of cross validation in the analysis of designed experiments

Cross-validation (CV) is a common method to tune machine learning methods and can be used for model selection in regression as well. Because of the structured nature of small, traditional experimental designs, the literature has warned…

Applications · Statistics 2025-06-18 Maria L. Weese, Byran J. Smucker, David J. Edwards

Statistical downscaling with spatial misalignment: Application to wildland fire PM$_{2.5}$ concentration forecasting

Fine particulate matter, PM$_{2.5}$, has been documented to have adverse health effects and wildland fires are a major contributor to PM$_{2.5}$ air pollution in the US. Forecasters use numerical models to predict PM$_{2.5}$ concentrations…

Applications · Statistics 2019-09-30 Suman Majumder, Yawen Guan, Brian J. Reich, Susan O'Neill, Ana G. Rappold

Spatial interpolation of high-frequency monitoring data

Climate modelers generally require meteorological information on regular grids, but monitoring stations are, in practice, sited irregularly. Thus, there is a need to produce public data records that interpolate available data to a high…

Applications · Statistics 2009-06-08 Michael L. Stein

When to Impute? Imputation before and during cross-validation

Cross-validation (CV) is a technique used to estimate generalization error for prediction models. For pipeline modeling algorithms (i.e. modeling procedures with multiple steps), it has been recommended the entire sequence of steps be…

Machine Learning · Statistics 2020-10-05 Byron C. Jaeger, Nicholas J. Tierney, Noah R. Simon

Reconstructing Spatiotemporal Data with C-VAEs

The continuous representation of spatiotemporal data commonly relies on using abstract data types, such as "moving regions", to represent entities whose shape and position continuously change over time. Creating this representation…

Databases · Computer Science 2023-08-29 Tiago F. R. Ribeiro, Fernando Silva, Rogério Luís de C. Costa

Foundation for unbiased cross-validation of spatio-temporal models for species distribution modeling

Evaluating the predictive performance of species distribution models (SDMs) under realistic deployment scenarios requires careful handling of spatial and temporal dependencies in the data. Cross-validation (CV) is the standard approach for…

Applications · Statistics 2025-12-22 Diana Koldasbayeva, Alexey Zaytsev

Cross Validation for Correlated Data in Regression and Classification Models, with Applications to Deep Learning

We present a methodology for model evaluation and selection where the sampling mechanism violates the i.i.d. assumption. Our methodology involves a formulation of the bias between the standard Cross-Validation (CV) estimator and the mean…

Methodology · Statistics 2025-03-14 Oren Yuval, Saharon Rosset

Approximate leave-future-out cross-validation for Bayesian time series models

One of the common goals of time series analysis is to use the observed series to inform predictions for future observations. In the absence of any actual new data to predict, cross-validation can be used to estimate a model's future…

Methodology · Statistics 2020-07-02 Paul-Christian Bürkner, Jonah Gabry, Aki Vehtari

Spatial Interpolation of Extreme Values

This paper introduces a method for spatial interpolation of extreme values, and in particular targets the case in which conventional data, resulting from a measurement for example, are available at only a few locations. To overcome this the…

Methodology · Statistics 2012-03-13 B. D. Youngman

Approximate Cross-Validation for Structured Models

Many modern data analyses benefit from explicitly modeling dependence structure in data -- such as measurements across time or space, ordered words in a sentence, or genes in a genome. A gold standard evaluation technique is structured…

Machine Learning · Statistics 2020-12-02 Soumya Ghosh, William T. Stephenson, Tin D. Nguyen, Sameer K. Deshpande, Tamara Broderick