Related papers: Supervised Learning: No Loss No Cry

A Kernel Loss for Solving the Bellman Equation

Value function learning plays a central role in many state-of-the-art reinforcement-learning algorithms. Many popular algorithms like Q-learning do not optimize any objective function, but are fixed-point iterations of some variant of…

Machine Learning · Computer Science 2020-01-10 Yihao Feng, Lihong Li, Qiang Liu

Designing a Robust, Bounded, and Smooth Loss Function for Improved Supervised Learning

The loss function is crucial to machine learning, especially in supervised learning frameworks. It is a fundamental component that controls the behavior and general efficacy of learning algorithms. However, despite their widespread use,…

Machine Learning · Computer Science 2026-02-09 Soumi Mahato, Lineesh M. C

Deep Bregman Divergence for Contrastive Learning of Visual Representations

Deep Bregman divergence measures divergence of data points using neural networks which is beyond Euclidean distance and capable of capturing divergence over distributions. In this paper, we propose deep Bregman divergences for contrastive…

Computer Vision and Pattern Recognition · Computer Science 2021-11-24 Mina Rezaei, Farzin Soleymani, Bernd Bischl, Shekoofeh Azizi

Fast Single-Class Classification and the Principle of Logit Separation

We consider neural network training, in applications in which there are many possible classes, but at test-time, the task is a binary classification task of determining whether the given example belongs to a specific class, where the class…

Machine Learning · Statistics 2018-09-18 Gil Keren, Sivan Sabato, Björn Schuller

Learning-to-Learn Stochastic Gradient Descent with Biased Regularization

We study the problem of learning-to-learn: inferring a learning algorithm that works well on tasks sampled from an unknown distribution. As the class of algorithms, we consider Stochastic Gradient Descent on the true risk regularized by the…

Machine Learning · Computer Science 2019-03-26 Giulia Denevi, Carlo Ciliberto, Riccardo Grazzi, Massimiliano Pontil

Loss-Driven Bayesian Active Learning

The central goal of active learning is to gather data that maximises downstream predictive performance, but popular approaches have limited flexibility in customising this data acquisition to different downstream problems and losses. We…

Machine Learning · Computer Science 2026-05-11 Zhuoyue Huang, Freddie Bickford Smith, Tom Rainforth

Panprediction: Optimal Predictions for Any Downstream Task and Loss

Supervised learning is classically formulated as training a model to minimize a fixed loss function over a fixed distribution, or task. However, an emerging paradigm instead views model training as extracting enough information from data so…

Machine Learning · Computer Science 2025-11-03 Sivaraman Balakrishnan, Nika Haghtalab, Daniel Hsu, Brian Lee, Eric Zhao

Learning Gradient Descent: Better Generalization and Longer Horizons

Training deep neural networks is a highly nontrivial task, involving carefully selecting appropriate training algorithms, scheduling step sizes and tuning other hyperparameters. Trying different combinations can be quite labor-intensive and…

Machine Learning · Computer Science 2017-06-13 Kaifeng Lv, Shunhua Jiang, Jian Li

Sharp Analysis of Smoothed Bellman Error Embedding

The Smoothed Bellman Error Embedding algorithm (Dai et al., 2018), known as SBEED, was proposed as a provably convergent reinforcement learning algorithm with general nonlinear function approximation. It has been successfully…

Machine Learning · Computer Science 2020-07-09 Ahmed Touati, Pascal Vincent

Transfer Learning via Test-Time Neural Networks Aggregation

It has been demonstrated that deep neural networks outperform traditional machine learning. However, deep networks lack generalisability, that is, they will not perform as well on a new (testing) set drawn from a different distribution…

Machine Learning · Computer Science 2022-06-28 Bruno Casella, Alessio Barbaro Chisari, Sebastiano Battiato, Mario Valerio Giuffrida

Sparse Training of Neural Networks based on Multilevel Mirror Descent

We introduce a dynamic sparse training algorithm based on linearized Bregman iterations / mirror descent that exploits the naturally incurred sparsity by alternating between periods of static and dynamic sparsity pattern updates. The key…

Machine Learning · Computer Science 2026-05-19 Yannick Lunk, Sebastian J. Scott, Leon Bungert

Applying statistical learning theory to deep learning

Although statistical learning theory provides a robust framework to understand supervised learning, many theoretical aspects of deep learning remain unclear, in particular how different architectures may lead to inductive bias when trained…

Machine Learning · Computer Science 2024-03-27 Cédric Gerbelot, Avetik Karagulyan, Stefani Karp, Kavya Ravichandran, Menachem Stern, Nathan Srebro

Understanding Self-supervised Contrastive Learning through Supervised Objectives

Self-supervised representation learning has achieved impressive empirical success, yet its theoretical understanding remains limited. In this work, we provide a theoretical perspective by formulating self-supervised representation learning…

Machine Learning · Computer Science 2025-10-14 Byeongchan Lee

Analysis of Generalized Bregman Surrogate Algorithms for Nonsmooth Nonconvex Statistical Learning

Modern statistical applications often involve minimizing an objective function that may be nonsmooth and/or nonconvex. This paper focuses on a broad Bregman-surrogate algorithm framework including the local linear approximation, mirror…

Optimization and Control · Mathematics 2021-12-20 Yiyuan She, Zhifeng Wang, Jiuwu Jin

A Bregman Learning Framework for Sparse Neural Networks

We propose a learning framework based on stochastic Bregman iterations, also known as mirror descent, to train sparse neural networks with an inverse scale space approach. We derive a baseline algorithm called LinBreg, an accelerated…

Machine Learning · Computer Science 2022-08-16 Leon Bungert, Tim Roith, Daniel Tenbrinck, Martin Burger

LegendreTron: Uprising Proper Multiclass Loss Learning

Loss functions serve as the foundation of supervised learning and are often chosen prior to model development. To avoid potentially ad hoc choices of losses, statistical decision theory describes a desirable property for losses known as…

Machine Learning · Statistics 2023-11-30 Kevin Lam, Christian Walder, Spiridon Penev, Richard Nock

The Common Intuition to Transfer Learning Can Win or Lose: Case Studies for Linear Regression

We study a fundamental transfer learning process from source to target linear regression tasks, including overparameterized settings where there are more learned parameters than data samples. The target task learning is addressed by using…

Machine Learning · Computer Science 2024-06-03 Yehuda Dar, Daniel LeJeune, Richard G. Baraniuk

Learning Without Training

Machine learning is at the heart of managing the real-world problems associated with massive data. With the success of neural networks on such large-scale problems, more research in machine learning is being conducted now than ever before.…

Machine Learning · Computer Science 2026-02-23 Ryan O'Dowd

Neural Bregman Divergences for Distance Learning

Many metric learning tasks, such as triplet learning, nearest neighbor retrieval, and visualization, are treated primarily as embedding tasks where the ultimate metric is some variant of the Euclidean distance (e.g., cosine or Mahalanobis),…

Machine Learning · Computer Science 2023-11-22 Fred Lu, Edward Raff, Francis Ferraro

Learning Empirical Bregman Divergence for Uncertain Distance Representation

Deep metric learning techniques have been used for visual representation in various supervised and unsupervised learning tasks through learning embeddings of samples with deep networks. However, classic approaches, which employ a fixed…

Computer Vision and Pattern Recognition · Computer Science 2023-08-30 Zhiyuan Li, Ziru Liu, Anna Zou, Anca L. Ralescu