机器学习
We propose a method for evaluating the robustness of widely used LLM ranking systems -- variants of a Bradley--Terry model -- to dropping a worst-case very small fraction of preference data. Our approach is computationally fast and easy to…
This paper provides a formulation of the log-homotopy particle flow from the perspective of variational inference. We show that the transient density used to derive the particle flow follows a time-scaled trajectory of the Fisher-Rao…
Many learning algorithms can be represented as Markov processes, and understanding their generalization error is a central topic in learning theory. For specific continuous-time noisy algorithms, a prominent analysis technique relies on…
We introduce Latent Space Distribution Matching (LSDM), a novel framework for semi-supervised generative modeling of conditional distributions. LSDM operates in two stages: (i) learning a low-dimensional latent space from both paired and…
Density aggregation is a central problem in machine learning, for instance when combining predictions from a Deep Ensemble. The choice of aggregation remains an open question with two commonly proposed approaches being linear pooling…
Sparse autoencoders (SAEs) are widely used to extract human-interpretable features from neural network activations, but their learned features can vary substantially across random seeds and training choices. To improve stability, we studied…
Deep Neural Networks are powerful tools for solving machine learning problems, but their training often involves dense and costly parameter updates. In this work, we use a novel Max-Plus neural architecture in which classical addition and…
We consider stochastic non-stationary linear bandits where the linear parameter connecting contexts to the reward changes over time. Existing algorithms in this setting localize the policy by gradually discarding or down-weighting past…
Randomized Controlled Trials (RCTs) represent the gold standard for causal inference yet remain a scarce resource. While large-scale observational data is often available, it is utilized only for retrospective fusion, and remains discarded…
Low-dimensional structure in real-world data plays an important role in the success of generative models, which motivates diffusion models defined on intrinsic data manifolds. Such models are driven by stochastic differential equations…
The No Free Lunch (NFL) theorem guarantees equal average performance only under uniform sampling of a function space closed under permutation (c.u.p.). We ask when this averaging ceases to reflect what benchmarking actually reports. We…
Observational causal discovery is only identifiable up to the Markov equivalence class. While interventions can reduce this ambiguity, in practice interventions are often soft with multiple unknown targets. In many realistic scenarios, only…
The forward and reverse Kullback-Leibler (KL) divergences arise as limiting objectives in learning and inference yet induce markedly different inductive biases that cannot be explained at the level of expectations alone. In this work, we…
This paper proposes a novel parameter selection strategy for kernel-based gradient descent (KGD) algorithms, integrating bias-variance analysis with the splitting method. We introduce the concept of empirical effective dimension to quantify…
Clustering is a fundamental approach to understanding data patterns, wherein the intuitive Euclidean distance space is commonly adopted. However, this is not the case for implicit cluster distributions reflected by qualitative attribute…
In 2018, McInnes et al. introduced a dimensionality reduction algorithm called UMAP, which enjoys wide popularity among data scientists. Their work introduces a finite variant of a functor called the metric realization, based on an…
Wasserstein gradient flow provides a general framework for minimizing an energy functional $J$ over the space of probability measures on a Riemannian manifold $(M,g)$. Its canonical time-discretization, the Jordan-Kinderlehrer-Otto (JKO)…
We study best-of-$N$ for large language models (LLMs) where the selection is based on majority voting. In particular, we analyze the limit $N \to \infty$, which we denote as \boinflower. While this approach achieves impressive performance…
We study the Finite-Dimensional Distributions (FDDs) of deep neural networks with randomly initialized weights that have finite-order moments. Specifically, we establish Gaussian approximation bounds in the Wasserstein-$1$ norm between the…
In-Context Learning (ICL) allows Large Language Models (LLMs) to adapt to new tasks with just a few examples, but their predictions often suffer from systematic biases, leading to unstable performance in classification. While calibration…