Related papers: Universal Sequence Preconditioning

The Power of Second Order Methods for Sequence Preconditioning

Sequence prediction methods for dynamical systems with long memory, i.e. marginally stable systems, typically achieve regret that grows polynomially with the hidden dimension of the underlying generative model. Universal Sequence…

Machine Learning · Computer Science 2026-05-12 Annie Marsden, Elad Hazan

Universal Codes from Switching Strategies

We discuss algorithms for combining sequential prediction strategies, a task which can be viewed as a natural generalisation of the concept of universal coding. We describe a graphical language based on Hidden Markov Models for defining…

Information Theory · Computer Science 2013-11-27 Wouter M. Koolen, Steven de Rooij

Polynomial Preconditioning for Gradient Methods

We study first-order methods with preconditioning for solving structured nonlinear convex optimization problems. We propose a new family of preconditioners generated by symmetric polynomials. They provide first-order optimization methods…

Optimization and Control · Mathematics 2023-01-31 Nikita Doikov, Anton Rodomanov

Optimal Diagonal Preconditioning

Preconditioning has long been a staple technique in optimization, often applied to reduce the condition number of a matrix and speed up the convergence of algorithms. Although there are many popular preconditioning techniques in practice,…

Optimization and Control · Mathematics 2022-11-08 Zhaonan Qu, Wenzhi Gao, Oliver Hinder, Yinyu Ye, Zhengyuan Zhou

Polynomial Preconditioners for Regularized Linear Inverse Problems

This work aims to accelerate the convergence of proximal gradient methods used to solve regularized linear inverse problems. This is achieved by designing a polynomial-based preconditioner that targets the eigenvalue spectrum of the normal…

Medical Physics · Physics 2022-09-27 Siddharth Srinivasan Iyer, Frank Ong, Xiaozhi Cao, Congyu Liao, Luca Daniel, Jonathan I. Tamir, Kawin Setsompop

Matrix-Free Preconditioning in Online Learning

We provide an online convex optimization algorithm with regret that interpolates between the regret of an algorithm using an optimal preconditioning matrix and one using a diagonal preconditioning matrix. Our regret bound is never worse…

Machine Learning · Computer Science 2019-05-31 Ashok Cutkosky, Tamas Sarlos

Parallel Newton-Chebyshev Polynomial Preconditioners for the Conjugate Gradient method

In this note we exploit polynomial preconditioners for the Conjugate Gradient method to solve large symmetric positive definite linear systems in a parallel environment. We put in connection a specialized Newton method to solve the matrix…

Numerical Analysis · Mathematics 2020-11-30 Luca Bergamaschi, Angeles Martinez

Universal Online Optimization in Dynamic Environments via Uniclass Prediction

Recently, several universal methods have been proposed for online convex optimization which can handle convex, strongly convex and exponentially concave cost functions simultaneously. However, most of these algorithms have been designed…

Machine Learning · Computer Science 2023-02-14 Arnold Salas

High-Dimensional Prediction for Sequential Decision Making

We study the problem of making predictions of an adversarially chosen high-dimensional state that are unbiased subject to an arbitrary collection of conditioning events, with the goal of tailoring these events to downstream decision makers.…

Machine Learning · Computer Science 2023-10-30 Georgy Noarov, Ramya Ramalingam, Aaron Roth, Stephan Xie

Polynomial Preconditioning for the Action of the Matrix Square Root and Inverse Square Root

While preconditioning is a long-standing concept to accelerate iterative methods for linear systems, generalizations to matrix functions are still in their infancy. We go a further step in this direction, introducing polynomial…

Numerical Analysis · Mathematics 2024-01-15 Andreas Frommer, Gustavo Ramirez-Hidalgo, Marcel Schweitzer, Manuel Tsolakis

Polynomial Preconditioning for Indefinite Matrices

Polynomial preconditioning is an important tool in solving large linear systems and eigenvalue problems. A polynomial from GMRES can be used to precondition restarted GMRES and restarted Arnoldi. Here we give methods for indefinite matrices…

Numerical Analysis · Mathematics 2025-10-17 Hayden Henson, Ronald B. Morgan

Universal time-series forecasting with mixture predictors

This book is devoted to the problem of sequential probability forecasting, that is, predicting the probabilities of the next outcome of a growing sequence of observations given the past. This problem is considered in a very general setting…

Machine Learning · Computer Science 2025-04-22 Daniil Ryabko

On subspace-constrained preconditioning for randomized iterative methods

In this paper, we further investigate and refine the subspace-constrained preconditioning technique to enhance the theoretical and numerical convergence properties of randomized iterative methods for solving linear systems. In particular,…

Numerical Analysis · Mathematics 2026-05-29 Yonghan Sun, Hou-Duo Qi, Deren Han, Jiaxin Xie

Less Regret via Online Conditioning

We analyze and evaluate an online gradient descent algorithm with adaptive per-coordinate adjustment of learning rates. Our algorithm can be thought of as an online version of batch gradient descent with a diagonal preconditioner. This…

Machine Learning · Computer Science 2010-02-26 Matthew Streeter, H. Brendan McMahan

Uncovering mesa-optimization algorithms in Transformers

Some autoregressive models exhibit in-context learning capabilities: being able to learn as an input sequence is processed, without undergoing any parameter changes, and without being explicitly trained to do so. The origins of this…

Machine Learning · Computer Science 2024-10-16 Johannes von Oswald, Maximilian Schlegel, Alexander Meulemans, Seijin Kobayashi, Eyvind Niklasson, Nicolas Zucchet, Nino Scherrer, Nolan Miller, Mark Sandler, Blaise Agüera y Arcas, Max Vladymyrov, Razvan Pascanu, João Sacramento

Universal Prediction of Selected Bits

Many learning tasks can be viewed as sequence prediction problems. For example, online classification can be converted to sequence prediction with the sequence being pairs of input/target data and where the goal is to correctly predict the…

Machine Learning · Computer Science 2012-02-10 Tor Lattimore, Marcus Hutter, Vaibhav Gavane

Unlocking Compositional Generalization in Pre-trained Models Using Intermediate Representations

Sequence-to-sequence (seq2seq) models are prevalent in semantic parsing, but have been found to struggle at out-of-distribution compositional generalization. While specialized model architectures and pre-training of seq2seq models have been…

Computation and Language · Computer Science 2021-04-16 Jonathan Herzig, Peter Shaw, Ming-Wei Chang, Kelvin Guu, Panupong Pasupat, Yuan Zhang

(How) Can Transformers Predict Pseudo-Random Numbers?

Transformers excel at discovering patterns in sequential data, yet their fundamental limitations and learning mechanisms remain crucial topics of investigation. In this paper, we study the ability of Transformers to learn pseudo-random…

Machine Learning · Computer Science 2025-07-10 Tao Tao, Darshil Doshi, Dayal Singh Kalra, Tianyu He, Maissam Barkeshli

Universal Approximation Theorem for a Single-Layer Transformer

Deep learning employs multi-layer neural networks trained via the backpropagation algorithm. This approach has achieved success across many domains and relies on adaptive gradient methods such as the Adam optimizer. Sequence modeling…

Machine Learning · Computer Science 2025-07-16 Esmail Gumaan

On The Concurrence of Layer-wise Preconditioning Methods and Provable Feature Learning

Layer-wise preconditioning methods are a family of memory-efficient optimization algorithms that introduce preconditioners per axis of each layer's weight tensors. These methods have seen a recent resurgence, demonstrating impressive…

Machine Learning · Computer Science 2025-02-05 Thomas T. Zhang, Behrad Moniri, Ansh Nagwekar, Faraz Rahman, Anton Xue, Hamed Hassani, Nikolai Matni