机器学习
We introduce a supervised dimensionality reduction methodology for categorical (and discretized mixed-type) data based on a density-matrix construction induced by class-conditional frequencies. Given a labeled dataset encoded in a one-hot…
Modern machine learning models can be accurate on average yet still make mistakes that dominate deployment cost. We introduce Locus, a distribution-free wrapper that produces a per-input loss-scale reliability score for a fixed prediction…
Conformal prediction (CP) provides finite-sample, distribution-free marginal coverage, but standard conformal regression intervals can be inefficient under heteroscedasticity and skewness. In particular, popular constructions such as…
We study experiments on interacting populations of humans and AI agents, where both unit types and the interaction network remain unobserved. Although causal effects propagate throughout the system, the goal is to estimate effects on…
We study adaptive estimation and inference in ill-posed linear inverse problems defined by conditional moment restrictions. Existing regularized estimators such as Regularized DeepIV (RDIV) require prior knowledge of the smoothness of the…
In this work, we investigate the generalization properties of random feature methods. Our analysis extends prior results for Tikhonov regularization to a broad class of spectral regularization techniques and further generalizes the setting…
The Nash-Sutcliffe efficiency ($\text{NSE}$) is a widely used, positively oriented relative measure for evaluating forecasts across multiple time series. However, it lacks a decision-theoretic foundation for this purpose. To address this,…
Latent-space Bayesian optimization (LSBO) extends Bayesian optimization to structured domains, such as molecular design, by searching in the continuous latent space of a generative model. However, most LSBO methods assume a fixed objective,…
We propose a Multivariate Spatio-Temporal Neural Hawkes Process for modeling complex multivariate event data with spatio-temporal dynamics. The proposed model extends continuous-time neural Hawkes processes by integrating spatial…
Smooth activation functions are ubiquitous in modern deep learning, yet their theoretical advantages over non-smooth counterparts remain poorly understood. In this work, we study both approximation and statistical properties of neural…
We develop a finite-sample, design-based theory for random forests in which each tree is a randomized conditional predictor acting on fixed covariates and the forest is their Monte Carlo average. An exact variance identity separates Monte…
The Kullback-Leibler (KL) divergence is not a proper distance metric and does not satisfy the triangle inequality, posing theoretical challenges in certain practical applications. Existing work has demonstrated that KL divergence between…
Time-series forecasting increasingly demands not only accurate observational predictions but also causal forecasting under interventional and counterfactual queries in multivariate systems. We present DoFlow, a flow-based generative model…
We identify and analyze a surprising phenomenon of Latent Diffusion Models (LDMs) where the final steps of the diffusion can degrade sample quality. In contrast to conventional arguments that justify early stopping for numerical stability,…
Fourier analysis on the Boolean hypercube is fundamentally defined as the orthogonal decomposition of the space of pseudo-Boolean functions with respect to the uniform probability measure. In this work, we propose an ANOVA-based…
When training large-scale models, the performance typically scales with the number of parameters and the dataset size according to a slow power law. A fundamental theoretical and practical question is whether comparable performance can be…
The global dimensionality of a neural representation manifold provides rich insight into the computational process underlying both artificial and biological neural networks. However, all existing measures of global dimensionality are…
We introduce Galerkin-ARIMA and Galerkin-SARIMA, a projection-based extension of classical ARIMA/SARIMA that replaces rigid linear lag operators with low-dimensional Galerkin basis expansions while preserving the familiar AR-MA…
The goal in thinning is to summarize a dataset using a small set of representative points. Remarkably, sub-Gaussian thinning algorithms like Kernel Halving and Compress can match the quality of uniform subsampling while substantially…
We investigate the problem of weight uncertainty originally proposed by [Blundell et al. (2015). Weight uncertainty in neural networks. In International conference on machine learning, 1613-1622, PMLR.] in the context of neural networks…