Related papers: The Second Order Linear Model

Second-Order Optimization for Non-Convex Machine Learning: An Empirical Study

While first-order optimization methods such as stochastic gradient descent (SGD) are popular in machine learning (ML), they come with well-known deficiencies, including relatively slow convergence, sensitivity to the settings of…

Optimization and Control · Mathematics 2018-02-19 Peng Xu, Farbod Roosta-Khorasani, Michael W. Mahoney

On the Convergence Theory of Gradient-Based Model-Agnostic Meta-Learning Algorithms

We study the convergence of a class of gradient-based Model-Agnostic Meta-Learning (MAML) methods and characterize their overall complexity as well as their best achievable accuracy in terms of gradient norm for nonconvex loss functions. We…

Machine Learning · Computer Science 2020-05-19 Alireza Fallah, Aryan Mokhtari, Asuman Ozdaglar

Sharpness-aware Second-order Latent Factor Model for High-dimensional and Incomplete Data

The Second-order Latent Factor (SLF) model, a class of low-rank representation learning methods, has proven effective at extracting node-to-node interaction patterns from High-dimensional and Incomplete (HDI) data. However, its optimization is…

Machine Learning · Computer Science 2025-12-19 Jialiang Wang, Xueyan Bao, Hao Wu

Second-order Symmetric Non-negative Latent Factor Analysis

Precise representation of a large-scale undirected network is the basis for understanding relations within a massive entity set. The undirected network representation task can be efficiently addressed by a symmetry non-negative latent factor…

Machine Learning · Computer Science 2022-03-09 Weiling Li, Xin Luo

Second-Order Stochastic Optimization for Machine Learning in Linear Time

First-order stochastic methods are the state-of-the-art in large-scale machine learning optimization owing to efficient per-iteration complexity. Second-order methods, while able to provide faster convergence, have been much less explored…

Machine Learning · Statistics 2017-12-01 Naman Agarwal, Brian Bullins, Elad Hazan

Understanding the Principles of Recursive Neural networks: A Generative Approach to Tackle Model Complexity

Recursive Neural Networks are non-linear adaptive models that are able to learn deep structured information. However, these models have not yet been broadly accepted. This fact is mainly due to their inherent complexity. In particular, not…

Neural and Evolutionary Computing · Computer Science 2009-11-18 Alejandro Chinea

Distributed Learning in Non-Convex Environments -- Part I: Agreement at a Linear Rate

Driven by the need to solve increasingly complex optimization problems in signal processing and machine learning, there has been increasing interest in understanding the behavior of gradient-descent algorithms in non-convex environments.…

Optimization and Control · Mathematics 2019-07-04 Stefan Vlaski, Ali H. Sayed

Beyond First-Order: Training LLMs with Stochastic Conjugate Subgradients and AdamW

Stochastic gradient-based descent (SGD) methods have long been central to training large language models (LLMs). However, their effectiveness is increasingly being questioned, particularly in large-scale applications where empirical evidence…

Machine Learning · Computer Science 2025-07-03 Di Zhang, Yihang Zhang

PETScML: Second-order solvers for training regression problems in Scientific Machine Learning

In recent years, we have witnessed the emergence of scientific machine learning as a data-driven tool for the analysis, by means of deep-learning techniques, of data produced by computational science and engineering applications. At the…

Machine Learning · Computer Science 2024-03-20 Stefano Zampini, Umberto Zerbinati, George Turkiyyah, David Keyes

Learning a Class of Mixed Linear Regressions: Global Convergence under General Data Conditions

Mixed linear regression (MLR) has attracted increasing attention because of its great theoretical and practical importance in capturing nonlinear relationships by utilizing a mixture of linear regression sub-models. Although considerable…

Machine Learning · Statistics 2025-03-25 Yujing Liu, Zhixin Liu, Lei Guo

Supervised Descent Method for Solving Nonlinear Least Squares Problems in Computer Vision

Many computer vision problems (e.g., camera calibration, image alignment, structure from motion) are solved with nonlinear optimization methods. It is generally accepted that second order descent methods are the most robust, fast, and…

Computer Vision and Pattern Recognition · Computer Science 2014-05-06 Xuehan Xiong, Fernando De la Torre

Online Learning Under A Separable Stochastic Approximation Framework

We propose an online learning algorithm for a class of machine learning models under a separable stochastic approximation framework. The essence of our idea lies in the observation that certain parameters in the models are easier to…

Machine Learning · Computer Science 2023-05-23 Min Gan, Xiang-xiang Su, Guang-yong Chen, Jing Chen

Low-Order Explicit Hessian Imitation Method for Large-Scale Supervised Machine Learning

An algorithm is proposed for solving optimization problems arising in neural network training for supervised learning. The unique feature of the algorithm is the use of an auxiliary loss, in addition to the original loss employed for model…

Optimization and Control · Mathematics 2026-05-11 Yunlang Zhu, Lingjun Guo, Zahra Khatti, Xiaoyi Qu, Chia-Yuan Wu, Lara Zebiane, Frank E. Curtis

LSOS: Line-search Second-Order Stochastic optimization methods for nonconvex finite sums

We develop a line-search second-order algorithmic framework for minimizing finite sums. We do not make any convexity assumptions, but require the terms of the sum to be continuously differentiable and have Lipschitz-continuous gradients.…

Optimization and Control · Mathematics 2022-06-28 Daniela di Serafino, Nataša Krejić, Nataša Krklec Jerinkić, Marco Viola

Escaping Saddle Points for Zeroth-order Nonconvex Optimization using Estimated Gradient Descent

Gradient descent and its variants are widely used in machine learning. However, oracle access to the gradient may not be available in many applications, limiting the direct use of gradient descent. This paper proposes a method of estimating…

Optimization and Control · Mathematics 2019-10-07 Qinbo Bai, Mridul Agarwal, Vaneet Aggarwal

Random Scaling and Momentum for Non-smooth Non-convex Optimization

Training neural networks requires optimizing a loss function that may be highly irregular, and in particular neither convex nor smooth. Popular training algorithms are based on stochastic gradient descent with momentum (SGDM), for which…

Machine Learning · Computer Science 2026-03-17 Qinzi Zhang, Ashok Cutkosky

Second-Order Convergence in Private Stochastic Non-Convex Optimization

We investigate the problem of finding second-order stationary points (SOSP) in differentially private (DP) stochastic non-convex optimization. Existing methods suffer from two key limitations: (i) inaccurate convergence error rate due to…

Machine Learning · Computer Science 2026-01-21 Youming Tao, Zuyuan Zhang, Dongxiao Yu, Xiuzhen Cheng, Falko Dressler, Di Wang

Memory-Efficient Learning of Stable Linear Dynamical Systems for Prediction and Control

Learning a stable Linear Dynamical System (LDS) from data involves creating models that both minimize reconstruction error and enforce stability of the learned representation. We propose a novel algorithm for learning stable LDSs. Using a…

Machine Learning · Computer Science 2020-11-19 Giorgos Mamakoukas, Orest Xherija, T. D. Murphey

Improving SGD convergence by online linear regression of gradients in multiple statistically relevant directions

Deep neural networks are usually trained with stochastic gradient descent (SGD), which minimizes the objective function using very rough approximations of the gradient, only averaging to the real gradient. Standard approaches like momentum or ADAM…

Machine Learning · Computer Science 2023-03-14 Jarek Duda

Better LMO-based Momentum Methods with Second-Order Information

The use of momentum in stochastic optimization algorithms has shown empirical success across a range of machine learning tasks. Recently, a new class of stochastic momentum algorithms has emerged within the Linear Minimization Oracle (LMO)…

Optimization and Control · Mathematics 2025-12-16 Sarit Khirirat, Abdurakhmon Sadiev, Yury Demidovich, Peter Richtárik