机器学习
The Mahalanobis distance is a classical tool used to measure the covariance-adjusted distance between points in $\bbR^d$. In this work, we extend the concept of Mahalanobis distance to separable Banach spaces by reinterpreting it as a…
The Rasch model, a classical model in the item response theory, is widely used in psychometrics to model the relationship between individuals' latent traits and their binary responses to assessments or questionnaires. In this paper, we…
Machine learning approaches for image classification have led to impressive advances in that field. For example, convolutional neural networks are able to achieve remarkable image classification accuracy across a wide range of applications…
Tree ensembles have demonstrated state-of-the-art predictive performance across a wide range of problems involving tabular data. Nevertheless, the black-box nature of tree ensembles is a strong limitation, especially for applications with…
Uncertain knowledge graph embedding (UnKGE) methods learn vector representations that capture both structural and uncertainty information to predict scores of unseen triples. However, existing methods produce only point estimates, without…
We introduce the \emph{Symplectic Generative Network (SGN)}, a deep generative model that leverages Hamiltonian mechanics to construct an invertible, volume-preserving mapping between a latent space and the data space. By endowing the…
Real-world data distributions often shift continuously across multiple latent factors such as time, geography, and socioeconomic contexts. However, existing domain generalization approaches typically treat domains as discrete or as evolving…
Network pruning is used to reduce inference latency and power consumption in large neural networks. However, most current pruning methods rely on ad-hoc heuristics that are poorly understood. We introduce Hyperflux, a conceptually-grounded…
Understanding the inductive bias and generalization properties of large overparametrized machine learning models requires to characterize the dynamics of the training algorithm. We study the learning dynamics of large two-layer neural…
There are several applications of stochastic optimization where one can benefit from a robust estimate of the gradient. For example, domains such as distributed learning with corrupted nodes, the presence of large outliers in the training…
In recent years, transfer learning has garnered significant attention. Its ability to leverage knowledge from related studies to improve generalization performance in a target study has made it highly appealing. This paper focuses on…
Offline reinforcement learning is important in domains such as medicine, economics, and e-commerce where online experimentation is costly, dangerous or unethical, and where the true model is unknown. However, most methods assume all…
Score-based generative models based on stochastic differential equations (SDEs) achieve impressive performance in sampling from unknown distributions, but often fail to satisfy underlying constraints. We propose a constrained generative…
Variational mean field approximations tend to struggle with contemporary overparametrized deep neural networks. Where a Bayesian treatment is usually associated with high-quality predictions and uncertainties, the practical reality has been…
A/B testing has become the cornerstone of decision-making in online markets, guiding how platforms launch new features, optimize pricing strategies, and improve user experience. In practice, we typically employ the pairwise $t$-test to…
Estimating the distribution of outcomes under counterfactual policies is critical for decision-making in domains such as recommendation, advertising, and healthcare. We propose and analyze a novel framework-Counterfactual Policy Mean…
Modern machine learning models are often trained in a setting where the number of parameters exceeds the number of training samples. To understand the implicit bias of gradient descent in such overparameterized models, prior work has…
Transformers have emerged as a powerful neural network architecture capable of tackling a wide range of learning tasks. In this work, we provide a theoretical analysis of their ability to automatically extract structure from data in an…
In this paper, we leverage stochastic projection and lossy compression to establish new conditional mutual information (CMI) bounds on the generalization error of statistical learning algorithms. It is shown that these bounds are generally…
Calibration has emerged as a foundational goal in ``trustworthy machine learning'', in part because of its strong decision theoretic semantics. Independent of the underlying distribution, and independent of the decision maker's utility…