机器学习
This paper addresses the nonparametric estimation of the drift function over a compact domain for a time-homogeneous diffusion process, based on high-frequency discrete observations from $N$ independent trajectories. We propose a neural…
Causal discovery methods can identify valid adjustment sets for causal effect estimation for a pair of target variables, even when the underlying causal graph is unknown. Global causal discovery methods focus on learning the whole causal…
Shampoo and its efficient variant, SOAP, employ structured second-moment estimations and have shown strong performance for training neural networks (NNs). In practice, however, Shampoo typically requires step-size grafting with Adam to be…
Bayesian Additive Regression Trees (BART) is a powerful statistical model that leverages the strengths of Bayesian inference and regression trees. It has received significant attention for capturing complex non-linear relationships and…
Higher-order $U$-statistics abound in fields such as statistics, machine learning, and computer science, but are known to be highly time-consuming to compute in practice. Despite their widespread appearance, a comprehensive study of their…
The identification of Linear Time-Varying (LTV) systems from input-output data is a fundamental yet challenging ill-posed inverse problem. This work introduces a unified Bayesian framework that models the system's impulse response, $h(t,…
Tree ensembles are non-parametric methods widely recognized for their accuracy and ability to capture complex interactions. While these models excel at prediction, they are difficult to interpret and may fail to uncover useful relationships…
Gaussian Processes (GPs) provide a flexible and statistically principled foundation for modelling spatiotemporal phenomena, but their $O(N^3)$ scaling makes them intractable for large datasets. Approximate methods such as variational…
Stochastic gradient descent (SGD) or stochastic approximation has been widely used in model training and stochastic optimization. While there is a huge literature on analyzing its convergence, inference on the obtained solutions from SGD…
We introduce a novel conditional stochastic interpolant framework for generative modeling of three-dimensional shapes. The method builds on a recent LDDMM-based registration approach to learn the conditional drift between geometries. By…
We prove that the persistence diagram of the sublevel set filtration of the quadratic form f(x) = x^T M x restricted to the unit sphere S^{n-1} is analytically determined by the eigenvalues of the symmetric matrix M. By Morse theory, the…
We study finite-sample statistical performance guarantees for distributionally robust optimization (DRO) with optimal transport (OT) and OT-regularized divergence model neighborhoods. Specifically, we derive concentration inequalities for…
Quantifying predictive uncertainty is essential for real world machine learning applications, especially in scenarios requiring reliable and interpretable predictions. Many common parametric approaches rely on neural networks to estimate…
Weight decay is ubiquitous in training deep neural network architectures. Its empirical success is often attributed to capacity control; nonetheless, our theoretical understanding of its effect on the loss landscape and the set of…
AI-assisted task delegation is increasingly common, yet human effort in such systems is costly and typically unobserved. Recent work by Bastani and Cachon (2025); Sambasivan et al. (2021) shows that accuracy-based payment schemes suffer…
This study addresses the inverse problem of parameter estimation for Stochastic Differential Equations (SDEs) by minimizing a regularized discrepancy functional via Stochastic Gradient Descent (SGD). To achieve computational efficiency, we…
Reinforcement learning algorithms have been widely used for decision-making tasks in various domains. However, the performance of these algorithms can be impacted by high variance and instability, particularly in environments with noise or…
The optimal transportation problem defines a geometry of probability measures which leads to a definition for weighted averages (barycenters) of measures, finding application in the machine learning and computer vision communities as a…
Parametric models deployed in non-stationary environments degrade as the underlying data distribution evolves over time (a phenomenon known as temporal domain drift). In the current work, we present KOMET (Koopman Operator identification of…
Explainable boosting machines (EBMs) are popular "glass-box" models that learn a set of univariate functions using boosting trees. These achieve explainability through visualizations of each feature's effect. However, unlike linear model…