Related papers: Efficient Integer-Arithmetic-Only Convolutional Ne…

Training Integer-Only Deep Recurrent Neural Networks

Recurrent neural networks (RNN) are the backbone of many text and speech applications. These architectures are typically made up of several computationally complex components such as non-linear activation functions, normalization,…

Machine Learning · Computer Science 2022-12-23 Vahid Partovi Nia , Eyyüb Sari , Vanessa Courville , Masoud Asgharian

iRNN: Integer-only Recurrent Neural Network

Recurrent neural networks (RNN) are used in many real-world text and speech applications. They include complex modules such as recurrence, exponential-based activation, gate interaction, unfoldable normalization, bi-directional dependence,…

Machine Learning · Computer Science 2022-02-16 Eyyüb Sari , Vanessa Courville , Vahid Partovi Nia

Integer-Only Neural Network Quantization Scheme Based on Shift-Batch-Normalization

Neural networks are very popular in many areas, but great computing complexity makes it hard to run neural networks on devices with limited resources. To address this problem, quantization methods are used to reduce model size and…

Machine Learning · Computer Science 2021-06-02 Qingyu Guo , Yuan Wang , Xiaoxin Cui

Faster Inference of Integer SWIN Transformer by Removing the GELU Activation

SWIN transformer is a prominent vision transformer model that has state-of-the-art accuracy in image classification tasks. Despite this success, its unique architecture causes slower inference compared with similar deep neural networks.…

Computer Vision and Pattern Recognition · Computer Science 2024-02-05 Mohammadreza Tayaranian , Seyyed Hasan Mozafari , James J. Clark , Brett Meyer , Warren Gross

Binarized Convolutional Neural Networks for Efficient Inference on GPUs

Convolutional neural networks have recently achieved significant breakthroughs in various image classification tasks. However, they are computationally expensive, which can make their feasible implementation on embedded and low-power devices…

Machine Learning · Computer Science 2018-08-02 Mir Khan , Heikki Huttunen , Jani Boutellier

F8Net: Fixed-Point 8-bit Only Multiplication for Network Quantization

Neural network quantization is a promising compression technique to reduce memory footprint and save energy consumption, potentially leading to real-time inference. However, there is a performance gap between quantized and full-precision…

Computer Vision and Pattern Recognition · Computer Science 2022-02-11 Qing Jin , Jian Ren , Richard Zhuang , Sumant Hanumante , Zhengang Li , Zhiyu Chen , Yanzhi Wang , Kaiyuan Yang , Sergey Tulyakov

Quantization and Training of Neural Networks for Efficient Integer-Arithmetic-Only Inference

The rising popularity of intelligent mobile devices and the daunting computational cost of deep learning-based models call for efficient and accurate on-device inference schemes. We propose a quantization scheme that allows inference to be…

Machine Learning · Computer Science 2017-12-19 Benoit Jacob , Skirmantas Kligys , Bo Chen , Menglong Zhu , Matthew Tang , Andrew Howard , Hartwig Adam , Dmitry Kalenichenko

Making EfficientNet More Efficient: Exploring Batch-Independent Normalization, Group Convolutions and Reduced Resolution Training

Much recent research has been dedicated to improving the efficiency of training and inference for image classification. This effort has commonly focused on explicitly improving theoretical efficiency, often measured as ImageNet validation…

Machine Learning · Computer Science 2021-08-27 Dominic Masters , Antoine Labatie , Zach Eaton-Rosen , Carlo Luschi

A&B BNN: Add&Bit-Operation-Only Hardware-Friendly Binary Neural Network

Binary neural networks utilize 1-bit quantized weights and activations to reduce both the model's storage demands and computational burden. However, advanced binary architectures still incorporate millions of inefficient and…

Machine Learning · Computer Science 2024-03-07 Ruichen Ma , Guanchao Qiao , Yian Liu , Liwei Meng , Ning Ning , Yang Liu , Shaogang Hu

Accelerating Binarized Neural Networks via Bit-Tensor-Cores in Turing GPUs

Despite foreseeing tremendous speedups over conventional deep neural networks, the performance advantage of binarized neural networks (BNNs) has merely been showcased on general-purpose processors such as CPUs and GPUs. In fact, due to…

Distributed, Parallel, and Cluster Computing · Computer Science 2020-12-16 Ang Li , Simon Su

Alternating Multi-bit Quantization for Recurrent Neural Networks

Recurrent neural networks have achieved excellent performance in many applications. However, on portable devices with limited resources, the models are often too large to deploy. For applications on the server with large scale concurrent…

Machine Learning · Computer Science 2018-02-02 Chen Xu , Jianqiang Yao , Zhouchen Lin , Wenwu Ou , Yuanbin Cao , Zhirong Wang , Hongbin Zha

Batch Normalization-Free Fully Integer Quantized Neural Networks via Progressive Tandem Learning

Quantised neural networks (QNNs) shrink models and reduce inference energy through low-bit arithmetic, yet most still depend on a running statistics batch normalisation (BN) layer, preventing true integer-only deployment. Prior attempts…

Machine Learning · Computer Science 2025-12-19 Pengfei Sun , Wenyu Jiang , Piew Yoong Chee , Paul Devos , Dick Botteldooren

Neural Network Pruning Through Constrained Reinforcement Learning

Network pruning reduces the size of neural networks by removing (pruning) neurons such that the performance drop is minimal. Traditional pruning approaches focus on designing metrics to quantify the usefulness of a neuron which is often…

Computer Vision and Pattern Recognition · Computer Science 2021-11-01 Shehryar Malik , Muhammad Umair Haider , Omer Iqbal , Murtaza Taj

BitPruning: Learning Bitlengths for Aggressive and Accurate Quantization

Neural networks have demonstrably achieved state-of-the-art accuracy using low-bitlength integer quantization, yielding both execution time and energy benefits on existing hardware designs that support short bitlengths. However, the…

Machine Learning · Computer Science 2020-08-13 Miloš Nikolić , Ghouthi Boukli Hacene , Ciaran Bannon , Alberto Delmas Lascorz , Matthieu Courbariaux , Yoshua Bengio , Vincent Gripon , Andreas Moshovos

ReBNet: Residual Binarized Neural Network

This paper proposes ReBNet, an end-to-end framework for training reconfigurable binary neural networks on software and developing efficient accelerators for execution on FPGA. Binary neural networks offer an intriguing opportunity for…

Machine Learning · Computer Science 2018-03-29 Mohammad Ghasemzadeh , Mohammad Samragh , Farinaz Koushanfar

Evaluating Robustness of Neural Networks with Mixed Integer Programming

Neural networks have demonstrated considerable success on a wide variety of real-world problems. However, networks trained only to optimize for training accuracy can often be fooled by adversarial examples - slightly perturbed inputs that…

Machine Learning · Computer Science 2019-02-19 Vincent Tjeng , Kai Xiao , Russ Tedrake

Accelerating Neural Network Inference by Overflow Aware Quantization

The inherent heavy computation of deep neural networks prevents their widespread applications. A widely used method for accelerating model inference is quantization, by replacing the input operands of a network using fixed-point values.…

Computer Vision and Pattern Recognition · Computer Science 2020-05-28 Hongwei Xie , Shuo Zhang , Huanghao Ding , Yafei Song , Baitao Shao , Conggang Hu , Ling Cai , Mingyang Li

LeanResNet: A Low-cost Yet Effective Convolutional Residual Networks

Convolutional Neural Networks (CNNs) filter the input data using spatial convolution operators with compact stencils. Commonly, the convolution operators couple features from all channels, which leads to immense computational cost in the…

Machine Learning · Computer Science 2019-05-17 Jonathan Ephrath , Lars Ruthotto , Eldad Haber , Eran Treister

NITI: Training Integer Neural Networks Using Integer-only Arithmetic

While integer arithmetic has been widely adopted for improved performance in deep quantized neural network inference, training remains a task primarily executed using floating point arithmetic. This is because both high dynamic range and…

Computer Vision and Pattern Recognition · Computer Science 2022-02-14 Maolin Wang , Seyedramin Rasoulinezhad , Philip H. W. Leong , Hayden K. H. So

Convolutional Neural Networks for Non-iterative Reconstruction of Compressively Sensed Images

Traditional algorithms for compressive sensing recovery are computationally expensive and are ineffective at low measurement rates. In this work, we propose a data driven non-iterative algorithm to overcome the shortcomings of earlier…

Computer Vision and Pattern Recognition · Computer Science 2017-08-18 Suhas Lohit , Kuldeep Kulkarni , Ronan Kerviche , Pavan Turaga , Amit Ashok