Related papers: Memory-Efficient Optimization with Factorized Hami…

200 papers

Modern adaptive optimization methods, such as Adam and its variants, have emerged as the most widely used tools in deep learning over recent years. These algorithms offer automatic mechanisms for dynamically adjusting the update step based…

Machine Learning · Computer Science 2025-02-12 Son Nguyen, Bo Liu, Lizhang Chen, Qiang Liu

Adaptive gradient-based optimizers such as Adagrad and Adam are crucial for achieving state-of-the-art performance in machine translation and language modeling. However, these methods maintain second-order statistics for each parameter,…

Machine Learning · Computer Science 2019-09-13 Rohan Anil, Vineet Gupta, Tomer Koren, Yoram Singer

Factorization machines (FMs) are a supervised learning approach that can use second-order feature combinations even when the data is very high-dimensional. Unfortunately, despite increasing interest in FMs, there exists to date no efficient…

Machine Learning · Statistics 2016-10-17 Mathieu Blondel, Akinori Fujino, Naonori Ueda, Masakazu Ishihata

We propose SMMF (Square-Matricized Momentum Factorization), a memory-efficient optimizer that reduces the memory requirement of the widely used adaptive learning rate optimizers, such as Adam, by up to 96%. SMMF enables flexible and…

Machine Learning · Computer Science 2025-05-01 Kwangryeol Park, Seulki Lee

As deep learning models exponentially increase in size, optimizers such as Adam encounter significant memory consumption challenges due to the storage of first and second moment data. Current memory-efficient methods like Adafactor and CAME…

Machine Learning · Computer Science 2024-03-25 Pengxiang Zhao, Ping Li, Yingjie Gu, Yi Zheng, Stephan Ludger Kölker, Zhefeng Wang, Xiaoming Yuan

Matrix factorization has now become a dominant solution for personalized recommendation on the Social Web. To alleviate the cold start problem, previous approaches have incorporated various additional sources of information into traditional…

Information Retrieval · Computer Science 2017-08-15 Zhenghua Xu, Cheng Chen, Thomas Lukasiewicz, Yishu Miao

Factorization Machines (FM) are a powerful class of models that incorporate higher-order interactions among features to add more expressive power to linear models. They have been used successfully in several real-world tasks such as…

Machine Learning · Computer Science 2020-04-30 Parameswaran Raman, S. V. N. Vishwanathan

Constrained low-rank matrix approximations have been known for decades as powerful linear dimensionality reduction techniques to be able to extract the information contained in large data sets in a relevant way. However, such low-rank…

Machine Learning · Computer Science 2021-12-20 Pierre De Handschutter, Nicolas Gillis, Xavier Siebert

We introduce Adam, an algorithm for first-order gradient-based optimization of stochastic objective functions, based on adaptive estimates of lower-order moments. The method is straightforward to implement, is computationally efficient, has…

Machine Learning · Computer Science 2017-01-31 Diederik P. Kingma, Jimmy Ba

Machine learning assumes a pivotal role in our data-driven world. The increasing scale of models and datasets necessitates quick and reliable algorithms for model training. This dissertation investigates adaptivity in machine learning…

Machine Learning · Computer Science 2023-11-20 Slavomír Hanzely

Federated learning on edge devices must cope with non-IID client data and tight memory budgets. Adaptive optimizers like Adam stabilize training under data heterogeneity but require storing full-precision momentum and variance states, often…

Machine Learning · Computer Science 2026-05-19 Vedant Waykole, Haroon R. Lone

Humans exhibit complex motions that vary depending on the task that they are performing, the interactions they engage in, as well as subject-specific preferences. Therefore, forecasting future poses based on the history of the previous…

Computer Vision and Pattern Recognition · Computer Science 2023-05-22 Tharindu Fernando, Harshala Gammulle, Sridha Sridharan, Simon Denman, Clinton Fookes

Efficient characterization of quantum devices is a significant challenge critical for the development of large scale quantum computers. We consider an experimentally motivated situation, in which we have a decent estimate of the…

Quantum Physics · Physics 2021-04-12 Przemyslaw Bienias, Alireza Seif, Mohammad Hafezi

In modern deep learning, the models are learned by applying gradient updates using an optimizer, which transforms the updates based on various statistics. Optimizers are often hand-designed and tuning their hyperparameters is a big part of…

Machine Learning · Computer Science 2024-10-08 Gus Kristiansen, Mark Sandler, Andrey Zhmoginov, Nolan Miller, Anirudh Goyal, Jihwan Lee, Max Vladymyrov

Adam-type optimizers, as a class of adaptive moment estimation methods with the exponential moving average scheme, have been successfully used in many applications of deep learning. Such methods are appealing due to the capability on…

Machine Learning · Computer Science 2020-12-17 Bingxin Zhou, Xuebin Zheng, Junbin Gao

Deep neural networks are traditionally trained using human-designed stochastic optimization algorithms, such as SGD and Adam. Recently, the approach of learning to optimize network parameters has emerged as a promising research topic.…

Machine Learning · Computer Science 2018-11-26 Shipeng Wang, Jian Sun, Zongben Xu

This paper advances the computational efficiency of Deep Hedging frameworks through the novel integration of Kronecker-Factored Approximate Curvature (K-FAC) optimization. While recent literature has established Deep Hedging as a…

Statistical Finance · Quantitative Finance 2024-11-25 Tsogt-Ochir Enkhbayar

Balancing convergence speed, generalization capability, and computational efficiency remains a core challenge in deep learning optimization. First-order gradient descent methods, epitomized by stochastic gradient descent (SGD) and Adam,…

In this paper, we investigate the popular deep learning optimization routine, Adam, from the perspective of statistical moments. While Adam is an adaptive lower-order moment based (of the stochastic gradient) method, we propose an extension…

Machine Learning · Computer Science 2019-10-16 Zhanhong Jiang, Aditya Balu, Sin Yong Tan, Young M Lee, Chinmay Hegde, Soumik Sarkar

The vast majority of modern deep learning models are trained with momentum-based first-order optimizers. The momentum term governs the optimizer's memory by determining how much each past gradient contributes to the current convergence…

Machine Learning · Computer Science 2026-05-12 Kristi Topollai, Anna Choromanska