Related papers: A Criterion for Aggregation Error for Multivariate…
The modifiable areal unit problem and the ecological fallacy are known problems that occur when modeling multiscale spatial processes. We investigate how these forms of spatial aggregation error can guide a regionalization over a spatial…
In the latest advancements in multimodal learning, effectively addressing the spatial and semantic losses of visual data after encoding remains a critical challenge. This is because the performance of large multimodal models is positively…
Despite significant work on low-bit quantization-aware training (QAT), there is still an accuracy gap between such techniques and native training. To address this, we introduce CAGE (Curvature-Aware Gradient Estimation), a new QAT method…
We consider the matrix composite materials (CM) of either random (statistically homogeneous or inhomogeneous), periodic, or deterministic (neither random nor periodic) structures. CMs exhibit linear or nonlinear behavior, coupled or…
Earth Observation (EO) data are increasingly used in policy analysis by enabling granular estimation of conditional average treatment effects (CATE). However, a challenge in EO-based causal inference is determining the scale of the input…
We propose the Variation Calibration Error (VCE) metric for assessing the calibration of machine learning classifiers. The metric can be viewed as an extension of the well-known Expected Calibration Error (ECE) which assesses the…
Treatment non-compliance, where individuals deviate from their assigned experimental conditions, frequently complicates the estimation of causal effects. To address this, we introduce a novel learning framework based on a mixture of experts…
We introduce a goal-oriented strategy for multiscale computations performed using the Multiscale Finite Element Method (MsFEM). In a previous work, we have shown how to use, in the MsFEM framework, the concept of Constitutive Relation Error…
LLM-as-a-judge ensembles are the standard paradigm for scalable evaluation, but their aggregation mechanisms suffer from a fundamental flaw: they implicitly assume that judges provide independent estimates of true quality. However, in…
Mutual information is a measure of the dependence between random variables that has been used successfully in myriad applications in many fields. Generalized mutual information measures that go beyond classical Shannon mutual information…
Accumulated Local Effect (ALE) is a method for accurately estimating feature effects, overcoming fundamental failure modes of previously-existed methods, such as Partial Dependence Plots. However, ALE's approximation, i.e. the method for…
Uncertainty is an inherent characteristic of biological and geospatial data which is almost made by measurement error in the observed values of the quantity of interest. Ignoring measurement error can lead to biased estimates and inflated…
Knowledge graphs store a large number of factual triples while they are still incomplete, inevitably. The previous knowledge graph completion (KGC) models predict missing links between entities merely relying on fact-view data, ignoring the…
Metric validation in Grammatical Error Correction (GEC) is currently done by observing the correlation between human and metric-induced rankings. However, such correlation studies are costly, methodologically troublesome, and suffer from…
In regression models for spatial data, it is often assumed that the marginal effects of covariates on the response are constant over space. In practice, this assumption might often be questionable. In this article, we show how a Gaussian…
We study in this paper the consequences of using the Mean Absolute Percentage Error (MAPE) as a measure of quality for regression models. We show that finding the best model under the MAPE is equivalent to doing weighted Mean Absolute Error…
Accumulated Local Effects (ALE) is a widely-used explainability method for isolating the average effect of a feature on the output, because it handles cases with correlated features well. However, it has two limitations. First, it does not…
This study uses a Variational Autoencoder method to enhance the efficiency and applicability of Markov Chain Monte Carlo (McMC) methods by generating broader-spectrum prior proposals. Traditional approaches, such as the Karhunen-Lo\`eve…
High-dimensional compositional covariates, often derived from count data, are subject to measurement error and are frequently analyzed after aggregation along a prespecified tree to improve interpretability in applications such as…
Spatial misalignment arises when datasets are aggregated or collected at different spatial scales, leading to information loss. We develop a Bayesian disaggregation framework that links misaligned data to a continuous-domain model through…