Machine Learning
We study a nonlinear factor model in which observed responses depend on low-rank latent factors through an unknown monotone link function. This setting is challenging and largely underexplored due to severe nonconvexity and identifiability…
We study the problem of multiclass PAC learning with bandit feedback in the realizable setting. In this framework, there is an unknown data distribution over an instance space $\mathcal{X}$ and a label space $\mathcal{Y}$, as in classical…
Lagrangian Relaxation (LR) is a powerful technique for solving large-scale Mixed Integer Linear Programming (MILP) problems, particularly those with decomposable structures, such as vehicle routing or unit commitment problems. By relaxing the…
This paper studies approximation by shallow ReLU$^s$ networks, $\sigma_s(t)=\max\{0,t\}^s$, together with their generalization behavior under $\ell_1$ path-norm control. For the $L^p$-type integral spaces…
We study long-horizon deployment of a frozen predictor under dynamic covariate shift. A time-domain Poincaré inequality first reduces temporal risk volatility to derivative energy. A Jacobian-velocity theorem then supplies the corresponding…
We consider debiased inference on finite-dimensional functionals of infinite-dimensional least-squares solutions to inverse problems as a way to avoid having to assume exact solutions exist. Such assumptions are substantive and not…
Recent work in the privacy literature shows that sample-targeted membership inference attacks (MIAs) outperform untargeted approaches by a wide margin. Motivated by this observation, we address the following question: can the…
Quantizing machine learning models has demonstrated its effectiveness in lowering memory and inference costs while maintaining performance levels comparable to those of the original models. In this work, we investigate the impact of…
In data science, individual observations are often assumed to come independently from an underlying probability space. Kernel matrices formed from large sets of such observations arise frequently, for example during classification tasks. It…
Classification of high-dimensional low sample size (HDLSS) data poses a challenge in a variety of real-world situations, such as gene expression studies, cancer research, and medical imaging. This article presents the development and…
Frontier LLMs now perform strongly across a wide range of physics evaluations, but it is hard to disentangle genuine reasoning from recall of established science. We introduce DiscoverPhysics, an interactive benchmark that asks an LLM agent…
Stochastic gradient descent (SGD) is a foundational algorithm for large-scale statistical learning and stochastic optimization. However, statistical inference based on SGD iterates remains challenging when stochastic gradients have infinite…
Efficient benchmarking techniques aim to lower the computational cost of evaluating LLMs by predicting full benchmark scores using only a subset of a benchmark's questions. By reframing this problem as an instance of multiple regression…
This paper proposes StrTransformer, a source-wise structured Transformer framework for blind source recovery and branch-wise latent modeling. Instead of using an encoder to infer latent variables, StrTransformer directly optimizes the…
The ability of deep neural networks to learn hierarchical features is widely regarded as a key mechanism underlying their success in high-dimensional learning. Existing theory partially supports this view by establishing approximation rates…
We study optimal experimental design for multinomial logit (MNL) bandits, where an agent repeatedly selects a subset of $K$ items from a ground set of size $N$ and observes single-choice feedback. Unlike linear or generalized linear…
We study nonstationary generalized linear bandits (GLBs), where the expected reward is modeled through a nonlinear link function with an unknown time-varying parameter. This framework encompasses a broad class of reward models, including…
We study the geometry of determinantal point processes (DPPs) through the spectral decomposition $L=U\Lambda U^{\top}$. The spectrum $\Lambda$ governs the cardinality distribution via elementary symmetric polynomials, while the eigenspace…
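As a rough illustration of the statement that the spectrum governs the cardinality distribution, the sketch below assumes $L$ is a symmetric positive semidefinite L-ensemble kernel (the truncated abstract does not confirm this), in which case the standard identity is $P(|Y|=k) = e_k(\lambda)/\prod_i (1+\lambda_i)$, with $e_k$ the $k$-th elementary symmetric polynomial of the eigenvalues. The function names are hypothetical and are not taken from the paper.

```python
import numpy as np

def elementary_symmetric(eigvals):
    """Return [e_0, e_1, ..., e_n] for the given eigenvalues."""
    e = np.zeros(len(eigvals) + 1)
    e[0] = 1.0
    for i, lam in enumerate(eigvals, start=1):
        # Update from high degree to low so e[k - 1] still excludes lam.
        for k in range(i, 0, -1):
            e[k] += lam * e[k - 1]
    return e

def dpp_cardinality_pmf(L):
    """Cardinality pmf of an L-ensemble DPP (assumed setting, see lead-in)."""
    eigvals = np.linalg.eigvalsh(L)        # L is assumed symmetric
    eigvals = np.clip(eigvals, 0.0, None)  # guard against tiny negative round-off
    e = elementary_symmetric(eigvals)
    # sum_k e_k(lambda) = prod_i (1 + lambda_i), so this normalizes to 1.
    return e / np.prod(1.0 + eigvals)

# Usage: a 2x2 identity kernel, where each item is included independently.
pmf = dpp_cardinality_pmf(np.eye(2))
```

Only the eigenvalues enter this computation; the eigenvectors $U$ play no role in the cardinality distribution, which matches the spectrum/eigenspace split described in the abstract.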
Reconstructing PDE solutions from sparse observations is a core challenge in scientific computing. We present FM4PDE, a flow-matching generative framework that learns the joint distribution of PDE coefficients (or initial states) and…
Removing noise is difficult, but adding noise is easy. In this work, we show how to eliminate mean-shift noisy components from PCA by deliberately introducing knockoff mean-shift perturbation. Standard PCA is highly sensitive to shifts in…