Related papers: Gradient Computation In Linear-Chain Conditional R…

200 papers

We propose an efficient dual algorithm for ELP based on the Fast Gradient Method. The basic idea is to solve a properly regularized dual problem.

Optimization and Control · Mathematics 2016-02-05 Alexander Gasnikov, Evgenia Gasnikova, Yurii Nesterov, Alexey Chernov

Maximum a posteriori (MAP) inference in discrete-valued Markov random fields is a fundamental problem in machine learning that involves identifying the most likely configuration of random variables given a distribution. Due to the…

Machine Learning · Computer Science 2020-07-03 Jonathan N. Lee, Aldo Pacchiano, Peter Bartlett, Michael I. Jordan

This paper introduces an algorithm to select demonstration examples for in-context learning of a query set. Given a set of $n$ examples, how can we quickly select $k$ out of $n$ to best serve as the conditioning for downstream inference?…

Machine Learning · Computer Science 2025-11-05 Ziniu Zhang, Zhenshuo Zhang, Dongyue Li, Lu Wang, Jennifer Dy, Hongyang R. Zhang

We present an algorithm for learning decision trees using stochastic gradient information as the source of supervision. In contrast to previous approaches to gradient-based tree learning, our method operates in the incremental learning…

Machine Learning · Statistics 2019-09-25 Henry Gouk, Bernhard Pfahringer, Eibe Frank

Estimating the test performance of a model, possibly under distribution shift, without having access to the ground-truth labels is a challenging, yet very important problem for the safe deployment of machine learning algorithms in the wild.…

Machine Learning · Computer Science 2025-05-13 Renchunzi Xie, Ambroise Odonnat, Vasilii Feofanov, Ievgen Redko, Jianfeng Zhang, Bo An

In a variety of problems originating in supervised, unsupervised, and reinforcement learning, the loss function is defined by an expectation over a collection of random variables, which might be part of a probabilistic model or the external…

Machine Learning · Computer Science 2016-01-06 John Schulman, Nicolas Heess, Theophane Weber, Pieter Abbeel

Specialized function gradient computing hardware could greatly improve the performance of state-of-the-art optimization algorithms, e.g., based on gradient descent or conjugate gradient methods that are at the core of control, machine…

Due to the substantial computational cost, training state-of-the-art deep neural networks for large-scale datasets often requires distributed training using multiple computation workers. However, by nature, workers need to frequently…

Machine Learning · Computer Science 2018-02-21 Yusuke Tsuzuku, Hiroto Imachi, Takuya Akiba

Deploying LLMs raises two coupled challenges: (1) monitoring--estimating where a model underperforms as traffic and domains drift--and (2) improvement--prioritizing data acquisition to close the largest performance gaps. We test whether an…

Computation and Language · Computer Science 2026-05-27 Pedro Memoli Buffa, Luciano Del Corro

This paper proposes a prediction-based gradient compression method for distributed learning with event-triggered communication. Our goal is to reduce the amount of information transmitted from the distributed agents to the parameter server…

Information Theory · Computer Science 2024-10-04 Adrian Edin, Zheng Chen, Michel Kieffer, Mikael Johansson

In many sequential tasks, a model needs to remember relevant events from the distant past to make correct predictions. Unfortunately, a straightforward application of gradient based training requires intermediate computations to be stored…

Machine Learning · Computer Science 2023-08-14 Artyom Sorokin, Nazar Buzun, Leonid Pugachev, Mikhail Burtsev

Herding is a deterministic algorithm used to generate data points that can be regarded as random samples satisfying input moment conditions. The algorithm is based on the complex behavior of a high-dimensional dynamical system and is…

Machine Learning · Statistics 2023-05-10 Hiroshi Yamashita, Hideyuki Suzuki, Kazuyuki Aihara

The paper proposes a new message passing algorithm for cycle-free factor graphs. The proposed "entropy message passing" (EMP) algorithm may be viewed as sum-product message passing over the entropy semiring, which has previously appeared in…

Machine Learning · Computer Science 2016-11-18 Velimir M. Ilic, Miomir S. Stankovic, Branimir T. Todorovic

We devise a coreset selection method based on the idea of gradient matching: The gradients induced by the coreset should match, as closely as possible, those induced by the original training dataset. We evaluate the method in the context of…

Machine Learning · Computer Science 2021-12-10 Lukas Balles, Giovanni Zappella, Cédric Archambeau

Feedback alignment algorithms are an alternative to backpropagation to train neural networks, whereby some of the partial derivatives that are required to compute the gradient are replaced by random terms. This essentially transforms the…

Machine Learning · Computer Science 2023-06-06 Dominique Chu, Florian Bacho

In this paper, we present a cross-entropy optimization method for hyperparameter optimization in stochastic gradient-based approaches to train deep neural networks. The value of a hyperparameter of a learning algorithm often has great…

Machine Learning · Computer Science 2024-09-17 Kevin Li, Fulu Li

Computing the gradient of a function provides fundamental information about its behavior. This information is essential for several applications and algorithms across various fields. One common application that requires gradients is…

Numerical Analysis · Mathematics 2022-06-09 Esmail Abdul Fattah, Janet Van Niekerk, Haavard Rue

We address the challenging problem of deep representation learning--the efficient adaption of a pre-trained deep network to different tasks. Specifically, we propose to explore gradient-based features. These features are gradients of the…

Machine Learning · Computer Science 2020-04-14 Fangzhou Mu, Yingyu Liang, Yin Li

Gradient algorithms are classical in adaptive control and parameter estimation. For instantaneous quadratic cost functions they lead to a linear time-varying dynamic system that converges exponentially under persistence of excitation…

Optimization and Control · Mathematics 2020-10-06 Juan G. Rueda-Escobedo, Jaime A. Moreno

Multilevel optimization has gained renewed interest in machine learning due to its promise in applications such as hyperparameter tuning and continual learning. However, existing methods struggle with the inherent difficulty of efficiently…

Machine Learning · Computer Science 2024-10-16 Yuntian Gu, Xuzheng Chen