Statistics
We study the problem of multiclass PAC learning with bandit feedback in the realizable setting. In this framework, there is an unknown data distribution over an instance space $\mathcal{X}$ and a label space $\mathcal{Y}$, as in classical…
Lagrangian Relaxation (LR) is a powerful technique for solving large-scale Mixed Integer Linear Programming (MILP), particularly those with decomposable structures, such as vehicle routing or unit commitment problems. By relaxing the…
This paper studies approximation by shallow ReLU$^s$ networks, $\sigma_s(t)=\max\{0,t\}^s$, together with their generalization behavior under $\ell_1$ path-norm control. For the $L^p$-type integral spaces…
Conformal prediction provides prediction sets with finite-sample marginal coverage, but many applications require coverage guarantees that adapt to individual test points, a subpopulation, or a structural component of the data. Existing…
We study long-horizon deployment of a frozen predictor under dynamic covariate shift. A time-domain Poincare inequality first reduces temporal risk volatility to derivative energy. A Jacobian-velocity theorem then supplies the corresponding…
We consider debiased inference on finite-dimensional functionals of infinite-dimensional least-squares solutions to inverse problems as a way to avoid having to assume exact solutions exist. Such assumptions are substantive and not…
Recent work in the privacy literature shows that sample-targeted membership inference attacks (MIAs) significantly outperform untargeted approaches by a wide margin. Motivated by this observation, we address the following question: can the…
Climate change is expected to significantly affect the physical, financial, and economic environments over the long term, posing risks to the financial health of general insurers. While general insurers typically use Dynamic Financial…
Quantizing machine learning models has demonstrated its effectiveness in lowering memory and inference costs while maintaining performance levels comparable to those of the original models. In this work, we investigate the impact of…
In data science, individual observations are often assumed to come independently from an underlying probability space. Kernel matrices formed from large sets of such observations arise frequently, for example during classification tasks. It…
Classification of high-dimensional low sample size (HDLSS) data poses a challenge in a variety of real-world situations, such as gene expression studies, cancer research, and medical imaging. This article presents the development and…
Frontier LLMs now perform strongly across a wide range of physics evaluations, but it is hard to disentangle genuine reasoning from recall of established science. We introduce DiscoverPhysics, an interactive benchmark that asks a LLM agent…
This paper proposes the quantile unit-log-symmetric autoregressive moving average (QULS--ARMA) model for bounded time series on the open unit interval $(0,1)$. The model extends the unit-log-symmetric family by introducing a quantile-based…
Molecular signatures derived from omics data are increasingly used in epidemiological studies to characterize lifestyle exposures, either as proxies of exposure or to provide insight into disease mechanisms. These signatures are typically…
Stochastic gradient descent (SGD) is a foundational algorithm for large-scale statistical learning and stochastic optimization. However, statistical inference based on SGD iterates remains challenging when stochastic gradients have infinite…
Regression modeling of recurrent and terminal events continues to present methodological challenges in survival analysis. Existing approaches either make unverifiable assumptions about the dependency structure between the two event types or…
The empirical distribution function assigns mass $1/n$ to each of the $n$ observations in a sample. As these are highly variable, estimation error may be reduced by replacing them with estimated observations that are asymptotically less…
Exponential random graph models (ERGMs) are a widely used framework for network data, enabling hypothesis testing on the structural mechanisms underlying observed networks. Bayesian ERGMs provide principled uncertainty quantification and…
We study change-point detection for high-dimensional data in regimes where inference must be performed from small batches of observations. Our primary focus is the high-dimensional, low sample size (HDLSS) regime, where the sequence length…
We study counterfactual distribution learning for high-dimensional outcomes whose counterfactual law may concentrate near lower-dimensional structure. Standard isotropic smoothing treats all ambient directions equally, leading to…