机器学习
Distributional treatment effects can be invisible to means: a treatment may preserve average outcomes while changing tails, modes, dispersion, or rare-event probabilities. Kernel tests can detect discrepancies between interventional outcome…
Vision-Language Latent Diffusion Models (LDMs) (Rombach et al., 2022) provide powerful generative priors for inverse problems. However, existing LDM-based inverse solvers typically require a large number of neural function evaluations…
Online learning from a stream of data is a defining feature of intelligence, yet modern machine learning systems often struggle in this setting, especially under distributional shift. To understand its basic properties, we study the…
The expectation--maximization (EM) algorithm combines global monotonicity, local linear convergence, and strong practical robustness, but these features are usually analyzed separately. Global descent is nonlinear, whereas local convergence…
High-dimensional count data arise in applications such as single-cell RNA sequencing and neural spike trains, where mapping between distributions across successive batches or time points form critical components of data analysis. The recent…
Persistence diagrams provide stable, interpretable summaries of geometric and topological structure and are useful for simulation-based inference when low-order statistics miss key information. Yet persistence-based pipelines require…
Estimating counterfactual distributions under interventions is central to treatment risk assessment and counterfactual generation tasks. Existing approaches model the counterfactual distribution as a standalone generative target, without…
Large Language Models often improve accuracy on reasoning tasks by sampling multiple Chain-of-Thought (CoT) traces and aggregating them with majority voting (MV), a test-time technique called self-consistency. When we truncate a CoT partway…
Understanding why trained Transformers generalize well is a fundamental problem in modern machine learning theory, and complexity-based generalization bounds provide a principled way to study this question. While existing norm-based bounds…
Classical clustering methods usually return either a finite partition of the observed data or a finite dendrogram over it. This finite-sample view is inadequate when the hierarchy of interest is a recursive geometric object with fine-scale…
Constructing valid and informative conformal prediction regions for multi-dimensional outputs remains a fundamental challenge. While conformal prediction provides finite-sample, distribution-free coverage guarantees, its practical…
We show that, in a precise sense, a broad class of feedforward neural networks learn (have finite sample complexity) in the PAC model: every fixed finite feedforward architecture whose layers are definable in an o-minimal structure has…
Individual treatment effects are not point-identified from data. The Probability of Necessity and Sufficiency (PNS) circumvents this limitation by characterizing individual-level causality through intersection bounds derived from combined…
Evaluation of large language models (LLMs) is increasingly critical, yet standard benchmarking methods rely on average accuracy, overlooking both the inherent stochasticity of LLM outputs and the heterogeneity of benchmark items. Item…
Instrumental-variable (IV) regression enables causal estimation under endogeneity, but modern IV problems often involve nonlinear structural effects and high-dimensional covariates. Existing nonlinear IV methods directly learn the causal…
Many ranking and agent trace datasets are recorded as linear orders even though their latent structure is only partially ordered. This is especially common in agent and workflow traces, where observed order may reflect arbitrary…
This paper presents a parametric solution to piecewise linear regression through the Adaptive Block Gradient Descent (ABGD) algorithm. The heart of the method is the parametrization of piecewise linear functions as the difference of…
The Maximum Mean Discrepancy (MMD) is a cornerstone statistic for nonparametric two-sample testing, but its test power is dictated entirely by the chosen kernel. Because any fixed kernel inherently fails to distinguish certain…
We study the spectral properties of sample covariance matrices constructed from pooled sequence representations, where token embeddings are drawn from a fixed two-class Gaussian mixture table and pooled via (fixed) attention weights.…
We study finite-horizon continuous-time policy evaluation from discrete closed-loop trajectories under time-inhomogeneous dynamics. The target value surface solves a backward parabolic equation, but the Bellman baseline obtained from…