机器学习
This paper introduces Discrete Markov Probabilistic Models (DMPMs), a novel discrete diffusion algorithm for discrete data generation. The algorithm operates in discrete bit space, where the noising process is a continuous-time Markov chain…
We propose ratio divergence (RD) learning for discrete energy-based models, a method that utilizes both training data and a tractable target energy function. We apply RD learning to restricted Boltzmann machines (RBMs), which are a minimal…
The neural tangent kernel is a kernel function defined over the parameter distribution of an infinite width neural network. Despite the impracticality of this limit, the neural tangent kernel has allowed for a more direct study of neural…
Temporal difference (TD) learning is a cornerstone of reinforcement learning. In the average-reward setting, standard TD($\lambda$) is highly sensitive to the choice of step-size and thus requires careful tuning to maintain numerical…
Methods for solving scientific computing and inference problems, such as kernel- and neural network-based approaches for partial differential equations (PDEs), inverse problems, and supervised learning tasks, depend crucially on the choice…
Large language models have achieved impressive performance across diverse tasks. However, their tendency to produce overconfident and factually incorrect outputs, known as hallucinations, poses risks in real world applications. Conformal…
Low rank inference on matrices is widely conducted by optimizing a cost function augmented with a penalty proportional to the nuclear norm $\Vert \cdot \Vert_*$. However, despite the assortment of computational methods for such problems,…
In graphical models, factor graphs, and more generally energy-based models, the interactions between variables are encoded by a graph, a hypergraph, or, in the most general case, a partially ordered set (poset). Inference on such…
We address the problem of planning under uncertainty, where an agent must choose actions that not only achieve desired outcomes but also reduce uncertainty. Traditional methods often treat exploration and exploitation as separate…
Membership inference attacks (MIA) can reveal whether a particular data point was part of the training dataset, potentially exposing sensitive information about individuals. This article provides theoretical guarantees by exploring the…
Conformal prediction (CP) was developed to provide finite-sample probabilistic prediction guarantees. While CP algorithms are a relatively general-purpose approach to uncertainty quantification, with finite-sample guarantees, they lack…
We present a categorical framework for relating causal models that represent the same system at different levels of abstraction. We define a causal abstraction as natural transformations between appropriate Markov functors, which concisely…
Set-valued classification is used in multiclass settings where confusion between classes can occur and lead to misleading predictions. However, its application may amplify discriminatory bias motivating the development of set-valued…
Understanding signal behavior across scales is vital in areas such as natural phenomena analysis and financial modeling. A key property is self-similarity, quantified by the Hurst exponent (H), which reveals long-term dependencies.…
In this work, we investigate high-dimensional kernel ridge regression (KRR) on i.i.d. Gaussian data with anisotropic power-law covariance. This setting differs fundamentally from the classical source & capacity conditions for KRR, where…
A generic D-dimensional Gaussian can be conditioned or projected onto the D-1 unit sphere, thereby leading to the well-known Fisher-Bingham (FB) or Angular Gaussian (AG) distribution families, respectively. These are some of the most…
We introduce the Divergence Phase Index (DPI), a novel framework for quantifying phase differences in one and multidimensional signals, grounded in harmonic analysis via the Riesz transform. Based on classical Hilbert Transform phase…
Robust Principal Component Analysis (RPCA) aims to recover a low-rank structure from noisy, partially observed data that is also corrupted by sparse, potentially large-magnitude outliers. Traditional RPCA models rely on convex relaxations,…
Machine Learning is becoming increasingly more important in today's world. It is therefore very important to provide understanding of the decision-making process of machine-learning models. A popular way to do this is by looking at the…
Bayesian Optimization (BO) is widely used for optimizing expensive black-box functions, particularly in hyperparameter tuning. However, standard BO assumes access to precise objective values, which may be unavailable, noisy, or unreliable…