Related papers: Quantization-Aware Imitation-Learning for Resource…

Saliency-Aware Quantized Imitation Learning for Efficient Robotic Control

Deep neural network (DNN)-based policy models, such as vision-language-action (VLA) models, excel at automating complex decision-making from multi-modal inputs. However, scaling these models greatly increases computational overhead,…

Robotics · Computer Science 2025-06-02 Seongmin Park , Hyungmin Kim , Sangwoo Kim , Wonseok Jeon , Juyoung Yang , Byeongwook Jeon , Yoonseon Oh , Jungwook Choi

Optimizing Deep Neural Networks using Safety-Guided Self Compression

The deployment of deep neural networks on resource-constrained devices necessitates effective model com- pression strategies that judiciously balance the reduction of model size with the preservation of performance. This study introduces a…

Machine Learning · Computer Science 2025-05-02 Mohammad Zbeeb , Mariam Salman , Mohammad Bazzi , Ammar Mohanna

Hardware-Centric AutoML for Mixed-Precision Quantization

Model quantization is a widely used technique to compress and accelerate deep neural network (DNN) inference. Emergent DNN hardware accelerators begin to support mixed precision (1-8 bits) to further improve the computation efficiency,…

Computer Vision and Pattern Recognition · Computer Science 2020-08-14 Kuan Wang , Zhijian Liu , Yujun Lin , Ji Lin , Song Han

Adaptive Quantization for Deep Neural Network

In recent years Deep Neural Networks (DNNs) have been rapidly developed in various applications, together with increasingly complex architectures. The performance gain of these DNNs generally comes with high computational costs and large…

Machine Learning · Computer Science 2017-12-05 Yiren Zhou , Seyed-Mohsen Moosavi-Dezfooli , Ngai-Man Cheung , Pascal Frossard

QVLA: Not All Channels Are Equal in Vision-Language-Action Model's Quantization

The advent of Vision-Language-Action (VLA) models represents a significant leap for embodied intelligence, yet their immense computational demands critically hinder deployment on resource-constrained robotic platforms. Intuitively, low-bit…

Computer Vision and Pattern Recognition · Computer Science 2026-02-04 Yuhao Xu , Yantai Yang , Zhenyang Fan , Yufan Liu , Yuming Li , Bing Li , Zhipeng Zhang

DyQ-VLA: Temporal-Dynamic-Aware Quantization for Embodied Vision-Language-Action Models

Vision-Language-Action (VLA) models are dominant in embodied intelligence but are constrained by inference overheads. While model quantization alleviates these bottlenecks for edge deployment, static quantization approaches remain…

Machine Learning · Computer Science 2026-03-17 Zihao Zheng , Hangyu Cao , Sicheng Tian , Jiayu Chen , Maoliang Li , Xinhao Sun , Hailong Zou , Zhaobo Zhang , Xuanzhe Liu , Donggang Cao , Hong Mei , Xiang Chen

HAQ: Hardware-Aware Automated Quantization with Mixed Precision

Model quantization is a widely used technique to compress and accelerate deep neural network (DNN) inference. Emergent DNN hardware accelerators begin to support mixed precision (1-8 bits) to further improve the computation efficiency,…

Computer Vision and Pattern Recognition · Computer Science 2019-04-09 Kuan Wang , Zhijian Liu , Yujun Lin , Ji Lin , Song Han

Automatic low-bit hybrid quantization of neural networks through meta learning

Model quantization is a widely used technique to compress and accelerate deep neural network (DNN) inference, especially when deploying to edge or IoT devices with limited computation capacity and power consumption budget. The uniform bit…

Machine Learning · Computer Science 2020-04-27 Tao Wang , Junsong Wang , Chang Xu , Chao Xue

Bit Efficient Quantization for Deep Neural Networks

Quantization for deep neural networks have afforded models for edge devices that use less on-board memory and enable efficient low-power inference. In this paper, we present a comparison of model-parameter driven quantization approaches…

Computer Vision and Pattern Recognition · Computer Science 2019-10-14 Prateeth Nayak , David Zhang , Sek Chai

DA-PTQ: Drift-Aware Post-Training Quantization for Efficient Vision-Language-Action Models

Vision-Language-Action models (VLAs) have demonstrated strong potential for embodied AI, yet their deployment on resource-limited robots remains challenging due to high memory and computational demands. While Post-Training Quantization…

Robotics · Computer Science 2026-04-14 Siyuan Xu , Tianshi Wang , Fengling Li , Lei Zhu , Heng Tao Shen

Quantization-Aware Regularizers for Deep Neural Networks Compression

Deep Neural Networks reached state-of-the-art performance across numerous domains, but this progress has come at the cost of increasingly large and over-parameterized models, posing serious challenges for deployment on resource-constrained…

Machine Learning · Computer Science 2026-02-04 Dario Malchiodi , Mattia Ferraretto , Marco Frasca

Dataflow-based Joint Quantization of Weights and Activations for Deep Neural Networks

This paper addresses a challenging problem - how to reduce energy consumption without incurring performance drop when deploying deep neural networks (DNNs) at the inference stage. In order to alleviate the computation and storage burdens,…

Machine Learning · Computer Science 2019-01-09 Xue Geng , Jie Fu , Bin Zhao , Jie Lin , Mohamed M. Sabry Aly , Christopher Pal , Vijay Chandrasekhar

Just Round: Quantized Observation Spaces Enable Memory Efficient Learning of Dynamic Locomotion

Deep reinforcement learning (DRL) is one of the most powerful tools for synthesizing complex robotic behaviors. But training DRL models is incredibly compute and memory intensive, requiring large training datasets and replay buffers to…

Robotics · Computer Science 2023-04-25 Lev Grossman , Brian Plancher

Quantization of Deep Neural Networks for Accurate Edge Computing

Deep neural networks (DNNs) have demonstrated their great potential in recent years, exceeding the per-formance of human experts in a wide range of applications. Due to their large sizes, however, compressiontechniques such as weight…

Computer Vision and Pattern Recognition · Computer Science 2021-10-15 Wentao Chen , Hailong Qiu , Jian Zhuang , Chutong Zhang , Yu Hu , Qing Lu , Tianchen Wang , Yiyu Shi , Meiping Huang , Xiaowe Xu

A Comprehensive Study on Quantization Techniques for Large Language Models

Large Language Models (LLMs) have been extensively researched and used in both academia and industry since the rise in popularity of the Transformer model, which demonstrates excellent performance in AI. However, the computational demands…

Machine Learning · Computer Science 2024-11-06 Jiedong Lang , Zhehao Guo , Shuyu Huang

Quantized Distillation: Optimizing Driver Activity Recognition Models for Resource-Constrained Environments

Deep learning-based models are at the forefront of most driver observation benchmarks due to their remarkable accuracies but are also associated with high computational costs. This is challenging, as resources are often limited in…

Computer Vision and Pattern Recognition · Computer Science 2023-11-13 Calvin Tanama , Kunyu Peng , Zdravko Marinov , Rainer Stiefelhagen , Alina Roitberg

Hardware-Aware DNN Compression via Diverse Pruning and Mixed-Precision Quantization

Deep Neural Networks (DNNs) have shown significant advantages in a wide variety of domains. However, DNNs are becoming computationally intensive and energy hungry at an exponential pace, while at the same time, there is a vast demand for…

Machine Learning · Computer Science 2023-12-27 Konstantinos Balaskas , Andreas Karatzas , Christos Sad , Kostas Siozios , Iraklis Anagnostopoulos , Georgios Zervakis , Jörg Henkel

BitVLA: 1-bit Vision-Language-Action Models for Robotics Manipulation

Deploying powerful Vision-Language-Action (VLA) models on edge devices is limited by their massive size. In this paper, we take a deployment-oriented view of VLA training: we target efficiency through model design and optimization, rather…

Robotics · Computer Science 2026-03-03 Hongyu Wang , Chuyan Xiong , Ruiping Wang , Xilin Chen

Retraining-Based Iterative Weight Quantization for Deep Neural Networks

Model compression has gained a lot of attention due to its ability to reduce hardware resource requirements significantly while maintaining accuracy of DNNs. Model compression is especially useful for memory-intensive recurrent neural…

Machine Learning · Computer Science 2018-05-30 Dongsoo Lee , Byeongwook Kim

Device-Conditioned Neural Architecture Search for Efficient Robotic Manipulation

The growing complexity of visuomotor policies poses significant challenges for deployment with heterogeneous robotic hardware constraints. However, most existing model-efficient approaches for robotic manipulation are device- and…

Robotics · Computer Science 2026-04-14 Yiming Wu , Huan Wang , Zhenghao Chen , Ge Yuan , Dong Xu