机器学习
Who should we prioritize for treatment when causal effects cannot be estimated? In practice, organizations often rely on predictive proxies: ads are targeted using purchase probabilities, and retention incentives are allocated using…
This paper introduces torchsom, an open-source Python library that provides a reference implementation of the Self-Organizing Map (SOM) in PyTorch. This package offers three main features: (i) dimensionality reduction, (ii) clustering, and…
Sequential data - ranging from financial time series to natural language - has driven the growing adoption of autoregressive models. However, these algorithms rely on the presence of underlying patterns in the data, and their identification…
Missing data in online reinforcement learning (RL) poses challenges compared to missing data in standard tabular data or in offline policy learning. The need to impute and act at each time step means that imputation cannot be put off until…
Quantized neural network training optimizes a discrete, non-differentiable objective. The straight-through estimator (STE) enables backpropagation through surrogate gradients and is widely used. While previous studies have primarily focused…
Generative models form the backbone of modern machine learning, underpinning state-of-the-art systems in text, vision, and multimodal applications. While Maximum Likelihood Estimation has traditionally served as the dominant training…
The scientific process is a means to turn the results of experiments into knowledge about the world in which we live. Much research effort has been directed toward automating this process. To do this, one needs to formulate the scientific…
We introduce the first one-stage Top-$k$ Learning-to-Defer framework, which unifies prediction and deferral by learning a shared score-based model that selects the $k$ most cost-effective entities-labels or experts-per input. While existing…
The gradient descent (GD) has been one of the most common optimizer in machine learning. In particular, the loss landscape of a neural network is typically sharpened during the initial phase of training, making the training dynamics hover…
In recent years, information-theoretic generalization bounds have gained increasing attention for analyzing the generalization capabilities of meta-learning algorithms. However, existing results are confined to two-step bounds, failing to…
This paper studies the problem of learning Bayesian networks from continuous observational data, generated according to a linear Gaussian structural equation model. We consider an $\ell_0$-penalized maximum likelihood estimator for this…
We introduce an ordinary differential equation (ODE) based deep generative method for learning conditional distributions, named Conditional F\"ollmer Flow. Starting from a standard Gaussian distribution, the proposed flow could approximate…
Many leading classification algorithms output a classifier that is a weighted average of kernel evaluations. Optimizing these weights is a nontrivial problem that still attracts much research effort. Furthermore, explaining these methods to…
Real-world clinical problems are often characterized by multimodal data, usually associated with incomplete views and limited sample sizes in their cohorts, posing significant limitations for machine learning algorithms. In this work, we…
The vulnerability of machine learning models to adversarial attacks remains a critical security challenge. Traditional defenses, such as adversarial training, typically robustify models by minimizing a worst-case loss. However, these…
The universal approximation property uniformly with respect to weakly compact families of measures is established for several classes of neural networks. To that end, we prove that these neural networks are dense in Orlicz spaces, thereby…
We study generative modeling on convex domains using flow matching and mirror maps, and identify two fundamental challenges. First, standard log-barrier mirror maps induce heavy-tailed dual distributions, leading to ill-posed dynamics.…
Smart training set selections procedures enable the reduction of data needs and improves predictive robustness in machine learning problems relevant to chemistry. We introduce Gradient Guided Furthest Point Sampling (GGFPS), a simple…
Bayesian optimization (BO) is an effective technique for black-box optimization. However, its applicability is typically limited to moderate-budget problems due to the cubic complexity of fitting the Gaussian process (GP) surrogate model.…
Most of the recent results in polynomial functional regression have been focused on an in-depth exploration of single-parameter regularization schemes. In contrast, in this study we go beyond that framework by introducing an algorithm for…