Related papers: Pyramid Vector Quantization for Deep Learning

Vector Quantization for Machine Vision

This paper shows how to reduce the computational cost for a variety of common machine vision tasks by operating directly in the compressed domain, particularly in the context of hardware acceleration. Pyramid Vector Quantization (PVQ) is…

Computer Vision and Pattern Recognition · Computer Science 2016-03-31 Vincenzo Liguori

Pyramid Vector Quantization and Bit Level Sparsity in Weights for Efficient Neural Networks Inference

This paper discusses three basic blocks for the inference of convolutional neural networks (CNNs). Pyramid Vector Quantization (PVQ) is discussed as an effective quantizer for CNN weights, resulting in highly sparse and compressible…

Computer Vision and Pattern Recognition · Computer Science 2019-11-26 Vincenzo Liguori

Individualized non-uniform quantization for vector search

Embedding vectors are widely used for representing unstructured data and searching through it for semantically similar items. However, the large size of these vectors, due to their high-dimensionality, creates problems for modern vector…

Machine Learning · Computer Science 2025-09-24 Mariano Tepper, Ted Willke

Pyramid Vector Quantization for LLMs

Recent works on compression of large language models (LLM) using quantization considered reparameterizing the architecture such that weights are distributed on the sphere. This demonstratively improves the ability to quantize by increasing…

Machine Learning · Computer Science 2024-12-05 Tycho F. A. van der Ouderaa, Maximilian L. Croci, Agrin Hilmkil, James Hensman

Improving Pyramid Vector Quantizer with power projection

Pyramid Vector Quantizer (PVQ) is a promising technique, especially for multimedia data compression, already used in the Opus audio codec and considered for the AV1 video codec. It quantizes vectors from the Euclidean unit sphere by first projecting…

Optimization and Control · Mathematics 2017-05-16 Jarek Duda

MVQ: Towards Efficient DNN Compression and Acceleration with Masked Vector Quantization

Vector quantization (VQ) is a hardware-friendly DNN compression method that can reduce the storage cost and weight-loading datawidth of hardware accelerators. However, conventional VQ techniques lead to significant accuracy loss because the…

Computer Vision and Pattern Recognition · Computer Science 2024-12-17 Shuaiting Li, Chengxuan Wang, Juncan Deng, Zeyu Wang, Zewen Ye, Zongsheng Wang, Haibin Shen, Kejie Huang

VecQ: Minimal Loss DNN Model Compression With Vectorized Weight Quantization

Quantization has been proven to be an effective method for reducing the computing and/or storage cost of DNNs. However, the trade-off between the quantization bitwidth and final accuracy is complex and non-convex, which makes it difficult…

Computer Vision and Pattern Recognition · Computer Science 2020-06-11 Cheng Gong, Yao Chen, Ye Lu, Tao Li, Cong Hao, Deming Chen

Iteratively Training Look-Up Tables for Network Quantization

Operating deep neural networks (DNNs) on devices with limited resources requires the reduction of their memory as well as computational footprint. Popular reduction methods are network quantization or pruning, which either reduce the word…

Machine Learning · Computer Science 2023-07-19 Fabien Cardinaux, Stefan Uhlich, Kazuki Yoshiyama, Javier Alonso Garcia, Lukas Mauch, Stephen Tiedemann, Thomas Kemp, Akira Nakamura

Dimension reduction with structure-aware quantum circuits for hybrid machine learning

Schmidt decomposition of a vector can be understood as writing the singular value decomposition (SVD) in vector form. A vector can be written as a linear combination of tensor product of two dimensional vectors by recursively applying…

Quantum Physics · Physics 2025-08-04 Ammar Daskin

Norm-Explicit Quantization: Improving Vector Quantization for Maximum Inner Product Search

Vector quantization (VQ) techniques are widely used in similarity search for data compression, fast metric computation, etc. Originally designed for Euclidean distance, existing VQ techniques (e.g., PQ, AQ) explicitly or implicitly…

Information Retrieval · Computer Science 2019-11-21 Xinyan Dai, Xiao Yan, Kelvin K. W. Ng, Jie Liu, James Cheng

Distill-VQ: Learning Retrieval Oriented Vector Quantization By Distilling Knowledge from Dense Embeddings

Vector quantization (VQ) based ANN indexes, such as Inverted File System (IVF) and Product Quantization (PQ), have been widely applied to embedding based document retrieval thanks to the competitive time and memory efficiency. Originally,…

Information Retrieval · Computer Science 2022-04-29 Shitao Xiao, Zheng Liu, Weihao Han, Jianjin Zhang, Defu Lian, Yeyun Gong, Qi Chen, Fan Yang, Hao Sun, Yingxia Shao, Denvy Deng, Qi Zhang, Xing Xie

Efficient VQ-QAT and Mixed Vector/Linear quantized Neural Networks

In this work, we developed and tested 3 techniques for vector quantization (VQ) based model weight compression. To mitigate codebook collapse and enable end-to-end training, we adopted cosine similarity-based assignment. Building on ideas…

Machine Learning · Computer Science 2026-04-28 Terry Gou, Puneet Gupta

Beyond Product Quantization: Deep Progressive Quantization for Image Retrieval

Product Quantization (PQ) has long been a mainstream for generating an exponentially large codebook at very low memory/time cost. Despite its success, PQ is still tricky for the decomposition of high-dimensional vector space, and the…

Computer Vision and Pattern Recognition · Computer Science 2020-12-08 Lianli Gao, Xiaosu Zhu, Jingkuan Song, Zhou Zhao, Heng Tao Shen

Weight Matrix Dimensionality Reduction in Deep Learning via Kronecker Multi-layer Architectures

Deep learning using neural networks is an effective technique for generating models of complex data. However, training such models can be expensive when networks have large model capacity resulting from a large number of layers and nodes.…

Machine Learning · Computer Science 2023-01-19 Jarom D. Hogue, Robert M. Kirby, Akil Narayan

DBQ: A Differentiable Branch Quantizer for Lightweight Deep Neural Networks

Deep neural networks have achieved state-of-the-art performance on various computer vision tasks. However, their deployment on resource-constrained devices has been hindered due to their high computational and storage complexity. While…

Computer Vision and Pattern Recognition · Computer Science 2020-07-21 Hassan Dbouk, Hetul Sanghvi, Mahesh Mehendale, Naresh Shanbhag

DiVeQ: Differentiable Vector Quantization Using the Reparameterization Trick

Vector quantization is common in deep models, yet its hard assignments block gradients and hinder end-to-end training. We propose DiVeQ, which treats quantization as adding an error vector that mimics the quantization distortion, keeping…

Machine Learning · Computer Science 2026-05-27 Mohammad Hassan Vali, Tom Bäckström, Arno Solin

Weight Normalization based Quantization for Deep Neural Network Compression

With the development of deep neural networks, the size of network models becomes larger and larger. Model compression has become an urgent need for deploying these network models to mobile or embedded devices. Model quantization is a…

Machine Learning · Computer Science 2019-07-02 Wen-Pu Cai, Wu-Jun Li

Learning Low-Rank Representations for Model Compression

Vector Quantization (VQ) is an appealing model compression method to obtain a tiny model with less accuracy loss. While methods to obtain better codebooks and codes under fixed clustering dimensionality have been extensively studied,…

Computer Vision and Pattern Recognition · Computer Science 2022-11-22 Zezhou Zhu, Yucong Zhou, Zhao Zhong

PQS (Prune, Quantize, and Sort): Low-Bitwidth Accumulation of Dot Products in Neural Network Computations

We present PQS, which uses three techniques together - Prune, Quantize, and Sort - to achieve low-bitwidth accumulation of dot products in neural network computations. In conventional quantized (e.g., 8-bit) dot products, partial results…

Machine Learning · Computer Science 2025-04-15 Vikas Natesh, H. T. Kung

Activation Map-based Vector Quantization for 360-degree Image Semantic Communication

In virtual reality (VR) applications, 360-degree images play a pivotal role in crafting immersive experiences and offering panoramic views, thus improving user Quality of Experience (QoE). However, the voluminous data generated by…

Image and Video Processing · Electrical Eng. & Systems 2024-06-10 Yang Ma, Wenchi Cheng, Jingqing Wang, Wei Zhang