机器学习
Physics-informed statistical learning (PISL) integrates empirical data with physical knowledge to enhance the statistical performance of estimators. While PISL methods are widely used in practice, a comprehensive theoretical understanding…
We introduce Coupled Flow Matching (CPFM), a framework that integrates controllable dimensionality reduction and high-fidelity reconstruction. CPFM learns coupled continuous flows for both the high-dimensional data x and the low-dimensional…
Online learning algorithms continually update their models as data arrive, making it essential to accurately estimate the expected loss at the current time step. The prequential method is an effective estimation approach which can be…
In this paper, we consider a block coordinate descent (BCD) algorithm for training deep neural networks and provide a new global convergence guarantee under strictly monotonically increasing activation functions. While existing works…
Semi-supervised learning (SSL) aims to train a machine learning model using both labelled and unlabelled data. While the unlabelled data have been used in various ways to improve the prediction accuracy, the reason why unlabelled data could…
The Schr\"odinger bridge problem seeks the optimal stochastic process that connects two given probability distributions with minimal energy modification. While the Sinkhorn algorithm is widely used to solve the static optimal transport…
Uncovering the underlying causal mechanisms of complex real-world systems remains a significant challenge, as these systems often entail high data collection costs and involve unknown interventions. We introduce MetaCaDI, the first…
Sparse variable selection improves interpretability and generalization in high-dimensional learning by selecting a small subset of informative features. Recent advances in Mixed Integer Programming (MIP) have enabled solving large-scale…
This paper studies prediction with multiple candidate models, where the goal is to combine their outputs. This task is especially challenging in heterogeneous settings, where different models may be better suited to different inputs. We…
Causal inference identifies cause-and-effect relationships between variables. While traditional approaches rely on data to reveal causal links, a recently developed method, assimilative causal inference (ACI), integrates observations with…
Probabilistic models must be well calibrated to support reliable decision-making. While calibration in single-output regression is well studied, defining and achieving multivariate calibration in multi-output regression remains considerably…
In many scientific applications, uncertainty of estimates from an earlier (upstream) analysis needs to be propagated in subsequent (downstream) Bayesian analysis, without feedback. Cutting feedback methods, also termed cut-Bayes, achieve…
Variational Learning (VL) has recently gained popularity for training deep neural networks. Part of its empirical success can be explained by theories such as PAC-Bayes bounds, minimum description length and marginal likelihood, but little…
Integral equations are widely used in fields such as applied modeling, medical imaging, and system identification, providing a powerful framework for solving deterministic problems. While parameter identification for differential equations…
We present the Deep Copula Classifier (DCC), a class-conditional generative model that separates marginal estimation from dependence modeling using neural copula densities. DCC is interpretable, Bayes-consistent, and achieves excess-risk…
Performativity of predictions refers to the phenomenon where prediction-informed decisions influence the very targets they aim to predict -- a dynamic commonly observed in policy-making, social sciences, and economics. In this paper, we…
The quality of foundation models depends heavily on their training data. Consequently, great efforts have been put into dataset curation. Yet most approaches rely on manual tuning of coarse-grained mixtures of large buckets of data, or…
We study the contextual continuum bandits problem, where the learner sequentially receives a side information vector and has to choose an action in a convex set, minimizing a function associated with the context. The goal is to minimize all…
We introduce a highly expressive yet distinctly tractable family for black-box variational inference (BBVI). Each member of this family is a weighted product of experts (PoE), and each weighted expert in the product is proportional to a…
Flow and diffusion-based models have emerged as powerful tools for scientific applications, particularly for sampling non-normalized probability distributions, as exemplified by Boltzmann Generators (BGs). A critical challenge in deploying…