Machine Learning
Regression on manifolds, and, more broadly, statistics on manifolds, has gained significant importance in recent years due to the vast number of applications for non-Euclidean data. Circular data is a classic example, but so is data in…
Archetypal Analysis (AA) is an unsupervised learning method that represents data as convex combinations of extreme patterns called archetypes. While AA provides interpretable and low-dimensional representations, it can inadvertently encode…
The increasing complexity of machine learning (ML) and artificial intelligence (AI) models has created a pressing need for tools that help scientists, engineers, and policymakers interpret and refine model decisions and predictions.…
We propose a framework for building patient-specific treatment recommendation models, building on the large recent literature on learning patient-level causal models and inspired by the target trial paradigm of Hernan and Robins. We focus…
Ensuring model calibration is critical for reliable prediction, yet popular distribution-free methods such as histogram binning and isotonic regression offer only asymptotic guarantees. We introduce a unified framework for Venn and…
Despite the remarkable empirical successes of Generative Adversarial Networks (GANs), the theoretical guarantees for their statistical accuracy remain rather pessimistic. In particular, the data distributions on which GANs are applied, such…
Domain adaptation (DA) is a statistical learning problem that arises when the distribution of the source data used to train a model differs from that of the target data used to evaluate the model. While many DA algorithms have demonstrated…
A methodology is developed, based on nonparametric Bayesian dictionary learning, for joint space-time wind field data extrapolation and estimation of related statistics by relying on limited/incomplete measurements. Specifically, utilizing…
Tensor Network (TN) Kernel Machines speed up model learning by representing parameters as low-rank TNs, reducing computation and memory use. However, most TN-based Kernel methods are deterministic and ignore parameter uncertainty. Further,…
It is important to identify the discriminative features for high-dimensional clustering. However, due to the lack of cluster labels, the regularization methods developed for supervised feature selection cannot be directly applied. To learn…
This article introduces a novel, geometric approach for multi-manifold clustering (MMC), i.e. for clustering a collection of potentially intersecting, d-dimensional manifolds into the individual manifold components. We first compute a…
The universal learning framework has been developed to obtain guarantees on the learning rates that hold for any fixed distribution, which can be much faster than the ones that hold uniformly over all distributions. Given that the Empirical…
Traditional offline reinforcement learning (RL) methods predominantly operate in a batch-constrained setting. This confines the algorithms to a specific state-action distribution present in the dataset, reducing the effects of…
Through skewed decisions, predictive models often reinforce biases that were originally embedded in their training data. In such cases, mitigation methods are critical to ensure that, regardless of the prevailing disparities, model…
In the study of complex dynamical systems, understanding and accurately modeling the underlying physical processes is crucial for predicting system behavior and designing effective interventions. Yet real-world systems exhibit pronounced…
Real-world data is often represented through the relationships between data samples, forming a graph structure. In many applications, it is necessary to learn this graph structure from the observed data. Current graph learning research has…
We introduce an algorithm for identifying interpretable subgroups with elevated treatment effects, given an estimate of individual or conditional average treatment effects (CATE). Subgroups are characterized by "rule sets" --…
There is little doubt that biotic interactions shape community assembly and ultimately the spatial co-variations between species. There is a hope that the signal of these biotic interactions can be observed and retrieved by investigating…
Current state-of-the-art generative approaches frequently rely on a two-stage training procedure, where an autoencoder (often a VAE) first performs dimensionality reduction, followed by training a generative model on the learned latent…
Piecewise constant functions describe a variety of real-world phenomena in domains ranging from chemistry to manufacturing. In practice, it is often required to confidently identify the locations of the abrupt changes in these functions as…