Related papers: Large-Scale Learning with Less RAM via Randomizati…

Direct Quantized Training of Language Models with Stochastic Rounding

Although recent quantized Large Language Models (LLMs), such as BitNet, have paved the way for a significant reduction in memory usage during deployment with binary or ternary weights, training these models still demands substantial memory…

Machine Learning · Computer Science · 2025-10-13 · Kaiyan Zhao, Tsuguchika Tabaru, Kenichi Kobayashi, Takumi Honda, Masafumi Yamazaki, Yoshimasa Tsuruoka

Coding for Random Projections

The method of random projections has become very popular for large-scale applications in statistical learning, information retrieval, bio-informatics and other applications. Using a well-designed coding scheme for the projected data, which…

Machine Learning · Computer Science · 2013-08-12 · Ping Li, Michael Mitzenmacher, Anshumali Shrivastava

One-Bit Quantization for Random Features Models

Recent advances in neural networks have led to significant computational and memory demands, spurring interest in one-bit weight compression to enable efficient inference on resource-constrained devices. However, the theoretical…

Machine Learning · Computer Science · 2025-10-21 · Danil Akhtiamov, Reza Ghane, Babak Hassibi

Online Learning in the Random Order Model

In the random-order model for online learning, the sequence of losses is chosen upfront by an adversary and presented to the learner after a random permutation. Any random-order input is asymptotically equivalent to a stochastic…

Machine Learning · Computer Science · 2025-10-06 · Martino Bernasconi, Andrea Celli, Riccardo Colini-Baldeschi, Federico Fusco, Stefano Leonardi, Matteo Russo

And the Bit Goes Down: Revisiting the Quantization of Neural Networks

In this paper, we address the problem of reducing the memory footprint of convolutional network architectures. We introduce a vector quantization method that aims at preserving the quality of the reconstruction of the network outputs rather…

Computer Vision and Pattern Recognition · Computer Science · 2020-11-10 · Pierre Stock, Armand Joulin, Rémi Gribonval, Benjamin Graham, Hervé Jégou

Getting Free Bits Back from Rotational Symmetries in LLMs

Current methods for compressing neural network weights, such as decomposition, pruning, quantization, and channel simulation, often overlook the inherent symmetries within these networks and thus waste bits on encoding redundant…

Information Theory · Computer Science · 2024-10-03 · Jiajun He, Gergely Flamich, José Miguel Hernández-Lobato

Rounding Methods for Neural Networks with Low Resolution Synaptic Weights

Neural network algorithms simulated on standard computing platforms typically make use of high resolution weights, with floating-point notation. However, for dedicated hardware implementations of such algorithms, fixed-point synaptic…

Neural and Evolutionary Computing · Computer Science · 2015-04-23 · Lorenz K. Muller, Giacomo Indiveri

Deep Learning with Limited Numerical Precision

Training of large-scale deep neural networks is often constrained by the available computational resources. We study the effect of limited precision data representation and computation on neural network training. Within the context of…

Machine Learning · Computer Science · 2015-02-11 · Suyog Gupta, Ankur Agrawal, Kailash Gopalakrishnan, Pritish Narayanan

Learning Large Scale Sparse Models

In this work, we consider learning sparse models in large-scale settings, where the number of samples and the feature dimension can grow as large as millions or billions. Two immediate issues occur under such a challenging scenario: (i)…

Machine Learning · Statistics · 2023-01-31 · Atul Dhingra, Jie Shen, Nicholas Kleene

Leveraging Lightweight Generators for Memory Efficient Continual Learning

Catastrophic forgetting can be trivially alleviated by keeping all data from previous tasks in memory. Therefore, minimizing the memory footprint while maximizing the amount of relevant information is crucial to the challenge of continual…

Machine Learning · Computer Science · 2025-06-25 · Christiaan Lamers, Ahmed Nabil Belbachir, Thomas Bäck, Niki van Stein

Reuse, Don't Recompute: Efficient Large Reasoning Model Inference via Memory Orchestration

Large reasoning models (LRMs) achieve strong accuracy through test-time scaling, generating longer chains of thought or sampling multiple solutions, but at steep costs in tokens and latency. We argue that memory is a core ingredient for…

Multiagent Systems · Computer Science · 2026-03-04 · Daivik Patel, Shrenik Patel

Training Large Reasoning Models Efficiently via Progressive Thought Encoding

Large reasoning models (LRMs) excel on complex problems but face a critical barrier to efficiency: reinforcement learning (RL) training requires long rollouts for outcome-based rewards, where autoregressive decoding dominates time and…

Machine Learning · Computer Science · 2026-02-20 · Zeliang Zhang, Xiaodong Liu, Hao Cheng, Hao Sun, Chenliang Xu, Jianfeng Gao

Random Weight Factorization Improves the Training of Continuous Neural Representations

Continuous neural representations have recently emerged as a powerful and flexible alternative to classical discretized representations of signals. However, training them to capture fine details in multi-scale signals is difficult and…

Machine Learning · Computer Science · 2022-10-06 · Sifan Wang, Hanwen Wang, Jacob H. Seidman, Paris Perdikaris

Understanding the Difficulty of Low-Precision Post-Training Quantization for LLMs

Large language models of high parameter counts are computationally expensive, yet can be made much more efficient by compressing their weights to very low numerical precision. This can be achieved either through post-training quantization…

Machine Learning · Computer Science · 2025-04-21 · Zifei Xu, Sayeh Sharify, Wanzin Yazar, Tristan Webb, Xin Wang

Low-Resolution Neural Networks

The expanding scale of large neural network models introduces significant challenges, driving efforts to reduce memory usage and enhance computational efficiency. Such measures are crucial to ensure the practical implementation and…

Machine Learning · Computer Science · 2025-02-14 · Eduardo Lobo Lustosa Cabral, Larissa Driemeier

Rethinking 1-bit Optimization Leveraging Pre-trained Large Language Models

1-bit LLM quantization offers significant advantages in reducing storage and computational costs. However, existing methods typically train 1-bit LLMs from scratch, failing to fully leverage pre-trained models. This results in high training…

Computation and Language · Computer Science · 2026-05-19 · Zhijun Tu, Jian Li, Yuanyuan Xi, Siqi Liu, Chuanjian Liu, Hanting Chen, Jie Hu, Yunhe Wang

A Generalized Weighted Optimization Method for Computational Learning and Inversion

The generalization capacity of various machine learning models exhibits different phenomena in the under- and over-parameterized regimes. In this paper, we focus on regression models such as feature regression and kernel regression and…

Machine Learning · Computer Science · 2022-03-14 · Björn Engquist, Kui Ren, Yunan Yang

Optimize Weight Rounding via Signed Gradient Descent for the Quantization of LLMs

Large Language Models (LLMs) have demonstrated exceptional proficiency in language-related tasks, but their deployment poses significant challenges due to substantial memory and storage requirements. Weight-only quantization has emerged as…

Computation and Language · Computer Science · 2024-10-10 · Wenhua Cheng, Weiwei Zhang, Haihao Shen, Yiyang Cai, Xin He, Kaokao Lv, Yi Liu

Evaluating the Impact of Post-Training Quantization on Large Language Models for Code Generation

Large Language Models (LLMs) have shown an impressive capability in code generation. LLM effectiveness generally increases with size: the higher the number of an LLM's trainable parameters, the better its ability to implement code.…

Software Engineering · Computer Science · 2026-01-28 · Alessandro Giagnorio, Antonio Mastropaolo, Saima Afrin, Massimiliano Di Penta, Gabriele Bavota

Compact representations of convolutional neural networks via weight pruning and quantization

The state-of-the-art performance for several real-world problems is currently reached by convolutional neural networks (CNN). Such learning models exploit recent results in the field of deep learning, typically leading to highly performing,…

Machine Learning · Computer Science · 2021-08-31 · Giosuè Cataldo Marinò, Alessandro Petrini, Dario Malchiodi, Marco Frasca