Related papers: Gradient Computation In Linear-Chain Conditional R…

Entropy linear programming

We propose an efficient dual algorithm for ELP based on the Fast Gradient Method. The basic idea is to solve a properly regularized dual problem.

Optimization and Control · Mathematics 2016-02-05 Alexander Gasnikov, Evgenia Gasnikova, Yurii Nesterov, Alexey Chernov

Accelerated Message Passing for Entropy-Regularized MAP Inference

Maximum a posteriori (MAP) inference in discrete-valued Markov random fields is a fundamental problem in machine learning that involves identifying the most likely configuration of random variables given a distribution. Due to the…

Machine Learning · Computer Science 2020-07-03 Jonathan N. Lee, Aldo Pacchiano, Peter Bartlett, Michael I. Jordan

Linear-Time Demonstration Selection for In-Context Learning via Gradient Estimation

This paper introduces an algorithm to select demonstration examples for in-context learning of a query set. Given a set of $n$ examples, how can we quickly select $k$ out of $n$ to best serve as the conditioning for downstream inference?…

Machine Learning · Computer Science 2025-11-05 Ziniu Zhang, Zhenshuo Zhang, Dongyue Li, Lu Wang, Jennifer Dy, Hongyang R. Zhang

Stochastic Gradient Trees

We present an algorithm for learning decision trees using stochastic gradient information as the source of supervision. In contrast to previous approaches to gradient-based tree learning, our method operates in the incremental learning…

Machine Learning · Statistics 2019-09-25 Henry Gouk, Bernhard Pfahringer, Eibe Frank

Leveraging Gradients for Unsupervised Accuracy Estimation under Distribution Shift

Estimating the test performance of a model, possibly under distribution shift, without having access to the ground-truth labels is a challenging, yet very important problem for the safe deployment of machine learning algorithms in the wild.…

Machine Learning · Computer Science 2025-05-13 Renchunzi Xie, Ambroise Odonnat, Vasilii Feofanov, Ievgen Redko, Jianfeng Zhang, Bo An

Gradient Estimation Using Stochastic Computation Graphs

In a variety of problems originating in supervised, unsupervised, and reinforcement learning, the loss function is defined by an expectation over a collection of random variables, which might be part of a probabilistic model or the external…

Machine Learning · Computer Science 2016-01-06 John Schulman, Nicolas Heess, Theophane Weber, Pieter Abbeel

Computing High-Degree Polynomial Gradients in Memory

Specialized function gradient computing hardware could greatly improve the performance of state-of-the-art optimization algorithms, e.g., based on gradient descent or conjugate gradient methods that are at the core of control, machine…

Emerging Technologies · Computer Science 2024-01-30 T. Bhattacharya, G. H. Hutchinson, G. Pedretti, X. Sheng, J. Ignowski, T. Van Vaerenbergh, R. Beausoleil, J. P. Strachan, D. B. Strukov

Variance-based Gradient Compression for Efficient Distributed Deep Learning

Due to the substantial computational cost, training state-of-the-art deep neural networks for large-scale datasets often requires distributed training using multiple computation workers. However, by nature, workers need to frequently…

Machine Learning · Computer Science 2018-02-21 Yusuke Tsuzuku, Hiroto Imachi, Takuya Akiba

Entropy Sentinel: Continuous LLM Accuracy Monitoring from Decoding Entropy Traces in STEM

Deploying LLMs raises two coupled challenges: (1) monitoring--estimating where a model underperforms as traffic and domains drift--and (2) improvement--prioritizing data acquisition to close the largest performance gaps. We test whether an…

Computation and Language · Computer Science 2026-05-27 Pedro Memoli Buffa, Luciano Del Corro

Temporal Predictive Coding for Gradient Compression in Distributed Learning

This paper proposes a prediction-based gradient compression method for distributed learning with event-triggered communication. Our goal is to reduce the amount of information transmitted from the distributed agents to the parameter server…

Information Theory · Computer Science 2024-10-04 Adrian Edin, Zheng Chen, Michel Kieffer, Mikael Johansson

Explain My Surprise: Learning Efficient Long-Term Memory by Predicting Uncertain Outcomes

In many sequential tasks, a model needs to remember relevant events from the distant past to make correct predictions. Unfortunately, a straightforward application of gradient based training requires intermediate computations to be stored…

Machine Learning · Computer Science 2023-08-14 Artyom Sorokin, Nazar Buzun, Leonid Pugachev, Mikhail Burtsev

Entropic Herding

Herding is a deterministic algorithm used to generate data points that can be regarded as random samples satisfying input moment conditions. The algorithm is based on the complex behavior of a high-dimensional dynamical system and is…

Machine Learning · Statistics 2023-05-10 Hiroshi Yamashita, Hideyuki Suzuki, Kazuyuki Aihara

Entropy Message Passing

The paper proposes a new message passing algorithm for cycle-free factor graphs. The proposed "entropy message passing" (EMP) algorithm may be viewed as sum-product message passing over the entropy semiring, which has previously appeared in…

Machine Learning · Computer Science 2016-11-18 Velimir M. Ilic, Miomir S. Stankovic, Branimir T. Todorovic

Gradient-matching coresets for continual learning

We devise a coreset selection method based on the idea of gradient matching: The gradients induced by the coreset should match, as closely as possible, those induced by the original training dataset. We evaluate the method in the context of…

Machine Learning · Computer Science 2021-12-10 Lukas Balles, Giovanni Zappella, Cédric Archambeau

Random Feedback Alignment Algorithms to train Neural Networks: Why do they Align?

Feedback alignment algorithms are an alternative to backpropagation to train neural networks, whereby some of the partial derivatives that are required to compute the gradient are replaced by random terms. This essentially transforms the…

Machine Learning · Computer Science 2023-06-06 Dominique Chu, Florian Bacho

Cross-Entropy Optimization for Hyperparameter Optimization in Stochastic Gradient-based Approaches to Train Deep Neural Networks

In this paper, we present a cross-entropy optimization method for hyperparameter optimization in stochastic gradient-based approaches to train deep neural networks. The value of a hyperparameter of a learning algorithm often has great…

Machine Learning · Computer Science 2024-09-17 Kevin Li, Fulu Li

Smart Gradient -- An Adaptive Technique for Improving Gradient Estimation

Computing the gradient of a function provides fundamental information about its behavior. This information is essential for several applications and algorithms across various fields. One common application that requires gradients is…

Numerical Analysis · Mathematics 2022-06-09 Esmail Abdul Fattah, Janet Van Niekerk, Haavard Rue

Gradients as Features for Deep Representation Learning

We address the challenging problem of deep representation learning--the efficient adaption of a pre-trained deep network to different tasks. Specifically, we propose to explore gradient-based features. These features are gradients of the…

Machine Learning · Computer Science 2020-04-14 Fangzhou Mu, Yingyu Liang, Yin Li

Non-linear Gradient Algorithm for Parameter Estimation: Extended version

Gradient algorithms are classical in adaptive control and parameter estimation. For instantaneous quadratic cost functions they lead to a linear time-varying dynamic system that converges exponentially under persistence of excitation…

Optimization and Control · Mathematics 2020-10-06 Juan G. Rueda-Escobedo, Jaime A. Moreno

Towards Differentiable Multilevel Optimization: A Gradient-Based Approach

Multilevel optimization has gained renewed interest in machine learning due to its promise in applications such as hyperparameter tuning and continual learning. However, existing methods struggle with the inherent difficulty of efficiently…

Machine Learning · Computer Science 2024-10-16 Yuntian Gu, Xuzheng Chen