Related papers: Gradient Computation In Linear-Chain Conditional R…
We propose an efficient dual algorithm for ELP based on Fast Gradient Method. The basic idea - to solve properly regularized dual problem.
Maximum a posteriori (MAP) inference in discrete-valued Markov random fields is a fundamental problem in machine learning that involves identifying the most likely configuration of random variables given a distribution. Due to the…
This paper introduces an algorithm to select demonstration examples for in-context learning of a query set. Given a set of $n$ examples, how can we quickly select $k$ out of $n$ to best serve as the conditioning for downstream inference?…
We present an algorithm for learning decision trees using stochastic gradient information as the source of supervision. In contrast to previous approaches to gradient-based tree learning, our method operates in the incremental learning…
Estimating the test performance of a model, possibly under distribution shift, without having access to the ground-truth labels is a challenging, yet very important problem for the safe deployment of machine learning algorithms in the wild.…
In a variety of problems originating in supervised, unsupervised, and reinforcement learning, the loss function is defined by an expectation over a collection of random variables, which might be part of a probabilistic model or the external…
Specialized function gradient computing hardware could greatly improve the performance of state-of-the-art optimization algorithms, e.g., based on gradient descent or conjugate gradient methods that are at the core of control, machine…
Due to the substantial computational cost, training state-of-the-art deep neural networks for large-scale datasets often requires distributed training using multiple computation workers. However, by nature, workers need to frequently…
Deploying LLMs raises two coupled challenges: (1) monitoring--estimating where a model underperforms as traffic and domains drift--and (2) improvement--prioritizing data acquisition to close the largest performance gaps. We test whether an…
This paper proposes a prediction-based gradient compression method for distributed learning with event-triggered communication. Our goal is to reduce the amount of information transmitted from the distributed agents to the parameter server…
In many sequential tasks, a model needs to remember relevant events from the distant past to make correct predictions. Unfortunately, a straightforward application of gradient based training requires intermediate computations to be stored…
Herding is a deterministic algorithm used to generate data points that can be regarded as random samples satisfying input moment conditions. The algorithm is based on the complex behavior of a high-dimensional dynamical system and is…
The paper proposes a new message passing algorithm for cycle-free factor graphs. The proposed "entropy message passing" (EMP) algorithm may be viewed as sum-product message passing over the entropy semiring, which has previously appeared in…
We devise a coreset selection method based on the idea of gradient matching: The gradients induced by the coreset should match, as closely as possible, those induced by the original training dataset. We evaluate the method in the context of…
Feedback alignment algorithms are an alternative to backpropagation to train neural networks, whereby some of the partial derivatives that are required to compute the gradient are replaced by random terms. This essentially transforms the…
In this paper, we present a cross-entropy optimization method for hyperparameter optimization in stochastic gradient-based approaches to train deep neural networks. The value of a hyperparameter of a learning algorithm often has great…
Computing the gradient of a function provides fundamental information about its behavior. This information is essential for several applications and algorithms across various fields. One common application that require gradients are…
We address the challenging problem of deep representation learning--the efficient adaption of a pre-trained deep network to different tasks. Specifically, we propose to explore gradient-based features. These features are gradients of the…
Gradient algorithms are classical in adaptive control and parameter estimation. For instantaneous quadratic cost functions they lead to a linear time-varying dynamic system that converges exponentially under persistence of excitation…
Multilevel optimization has gained renewed interest in machine learning due to its promise in applications such as hyperparameter tuning and continual learning. However, existing methods struggle with the inherent difficulty of efficiently…