Related papers: Accelerating Large-Scale Inference with Anisotropi…

Quantization based Fast Inner Product Search

We propose a quantization based approach for fast approximate Maximum Inner Product Search (MIPS). Each database vector is quantized in multiple subspaces via a set of codebooks, learned directly by minimizing the inner product quantization…

Artificial Intelligence · Computer Science 2015-09-07 Ruiqi Guo, Sanjiv Kumar, Krzysztof Choromanski, David Simcha

Quantization for Vector Search under Streaming Updates

Large-scale vector databases for approximate nearest neighbor (ANN) search typically store a quantized dataset in main memory for fast access, and full precision data on remote disk. State-of-the-art ANN quantization methods are highly…

Data Structures and Algorithms · Computer Science 2025-12-23 Ishaq Aden-Ali, Hakan Ferhatosmanoglu, Alexander Greaves-Tunnell, Nina Mishra, Tal Wagner

Quantizing Convolutional Neural Networks for Low-Power High-Throughput Inference Engines

Deep learning as a means to inferencing has proliferated thanks to its versatility and ability to approach or exceed human-level accuracy. These computational models have seemingly insatiable appetites for computational resources not only…

Machine Learning · Computer Science 2018-05-22 Sean O. Settle, Manasa Bollavaram, Paolo D'Alberto, Elliott Delaye, Oscar Fernandez, Nicholas Fraser, Aaron Ng, Ashish Sirasao, Michael Wu

Ultra-Quantisation: Efficient Embedding Search via 1.58-bit Encodings

Many modern search domains comprise high-dimensional vectors of floating point numbers derived from neural networks, in the form of embeddings. Typical embeddings range in size from hundreds to thousands of dimensions, making the size of…

Machine Learning · Computer Science 2025-06-03 Richard Connor, Alan Dearle, Ben Claydon

Integer Quantization for Deep Learning Inference: Principles and Empirical Evaluation

Quantization techniques can reduce the size of Deep Neural Networks and improve inference latency and throughput by taking advantage of high throughput integer instructions. In this paper we review the mathematical aspects of quantization…

Machine Learning · Computer Science 2020-04-22 Hao Wu, Patrick Judd, Xiaojie Zhang, Mikhail Isaev, Paulius Micikevicius

Mixed-Precision Inference Quantization: Radically Towards Faster inference speed, Lower Storage requirement, and Lower Loss

Based on the model's resilience to computational noise, model quantization is important for compressing models and improving computing speed. Existing quantization techniques rely heavily on experience and "fine-tuning" skills. In the…

Machine Learning · Computer Science 2022-07-22 Daning Cheng, Wenguang Chen

Multi-Scale Vector Quantization with Reconstruction Trees

We propose and study a multi-scale approach to vector quantization. We develop an algorithm, dubbed reconstruction trees, inspired by decision trees. Here the objective is parsimonious reconstruction of unsupervised data, rather than…

Machine Learning · Computer Science 2019-09-05 Enrico Cecini, Ernesto De Vito, Lorenzo Rosasco

QuantAttack: Exploiting Dynamic Quantization to Attack Vision Transformers

In recent years, there has been a significant trend in deep neural networks (DNNs), particularly transformer-based models, of developing ever-larger and more capable models. While they demonstrate state-of-the-art performance, their growing…

Computer Vision and Pattern Recognition · Computer Science 2024-12-02 Amit Baras, Alon Zolfi, Yuval Elovici, Asaf Shabtai

Approximation by Quantization

Inference in graphical models consists of repeatedly multiplying and summing out potentials. It is generally intractable because the derived potentials obtained in this way can be exponentially large. Approximate inference techniques such…

Artificial Intelligence · Computer Science 2012-02-20 Vibhav Gogate, Pedro Domingos

Supervised Quantization for Similarity Search

In this paper, we address the problem of searching for semantically similar images from a large database. We present a compact coding approach, supervised quantization. Our approach simultaneously learns feature selection that linearly…

Computer Vision and Pattern Recognition · Computer Science 2019-02-05 Xiaojuan Wang, Ting Zhang, Guo-Jun Q, Jinhui Tang, Jingdong Wang

Approximate search with quantized sparse representations

This paper tackles the task of storing a large collection of vectors, such as visual descriptors, and of searching in it. To this end, we propose to approximate database vectors by constrained sparse coding, where possible atom weights are…

Computer Vision and Pattern Recognition · Computer Science 2016-08-12 Himalaya Jain, Patrick Pérez, Rémi Gribonval, Joaquin Zepeda, Hervé Jégou

Practical and Asymptotically Optimal Quantization of High-Dimensional Vectors in Euclidean Space for Approximate Nearest Neighbor Search

Approximate nearest neighbor (ANN) query in high-dimensional Euclidean space is a key operator in database systems. For this query, quantization is a popular family of methods developed for compressing vectors and reducing memory…

Databases · Computer Science 2024-09-17 Jianyang Gao, Yutong Gou, Yuexuan Xu, Yongyi Yang, Cheng Long, Raymond Chi-Wing Wong

Regularized Classification-Aware Quantization

Traditionally, quantization is designed to minimize the reconstruction error of a data source. When considering downstream classification tasks, other measures of distortion can be of interest; such as the 0-1 classification loss.…

Machine Learning · Computer Science 2021-07-22 Daniel Severo, Elad Domanovitz, Ashish Khisti

A Comprehensive Study on Quantization Techniques for Large Language Models

Large Language Models (LLMs) have been extensively researched and used in both academia and industry since the rise in popularity of the Transformer model, which demonstrates excellent performance in AI. However, the computational demands…

Machine Learning · Computer Science 2024-11-06 Jiedong Lang, Zhehao Guo, Shuyu Huang

Accelerating Neural Network Inference by Overflow Aware Quantization

The inherent heavy computation of deep neural networks prevents their widespread applications. A widely used method for accelerating model inference is quantization, by replacing the input operands of a network using fixed-point values.…

Computer Vision and Pattern Recognition · Computer Science 2020-05-28 Hongwei Xie, Shuo Zhang, Huanghao Ding, Yafei Song, Baitao Shao, Conggang Hu, Ling Cai, Mingyang Li

Quantixar: High-performance Vector Data Management System

Traditional database management systems need help efficiently representing and querying the complex, high-dimensional data prevalent in modern applications. Vector databases offer a solution by storing data as numerical vectors within a…

Databases · Computer Science 2024-03-20 Gulshan Yadav, RahulKumar Yadav, Mansi Viramgama, Mayank Viramgama, Apeksha Mohite

Norm-Explicit Quantization: Improving Vector Quantization for Maximum Inner Product Search

Vector quantization (VQ) techniques are widely used in similarity search for data compression, fast metric computation, etc. Originally designed for Euclidean distance, existing VQ techniques (e.g., PQ, AQ) explicitly or implicitly…

Information Retrieval · Computer Science 2019-11-21 Xinyan Dai, Xiao Yan, Kelvin K. W. Ng, Jie Liu, James Cheng

Optimal and Near-Optimal Adaptive Vector Quantization

Quantization is a fundamental optimization for many machine-learning use cases, including compressing gradients, model weights and activations, and datasets. The most accurate form of quantization is adaptive, where the error is…

Machine Learning · Computer Science 2025-08-01 Ran Ben-Basat, Yaniv Ben-Itzhak, Michael Mitzenmacher, Shay Vargaftik

Towards Efficient Verification of Quantized Neural Networks

Quantization replaces floating point arithmetic with integer arithmetic in deep neural network models, providing more efficient on-device inference with less power and memory. In this work, we propose a framework for formally verifying…

Machine Learning · Computer Science 2023-12-29 Pei Huang, Haoze Wu, Yuting Yang, Ieva Daukantas, Min Wu, Yedi Zhang, Clark Barrett

Enhancing Vector Quantization with Distributional Matching: A Theoretical and Empirical Study

The success of autoregressive models largely depends on the effectiveness of vector quantization, a technique that discretizes continuous features by mapping them to the nearest code vectors within a learnable codebook. Two critical issues…

Computer Vision and Pattern Recognition · Computer Science 2025-06-19 Xianghong Fang, Litao Guo, Hengchao Chen, Yuxuan Zhang, Xiaofan Xia, Dingjie Song, Yexin Liu, Hao Wang, Harry Yang, Yuan Yuan, Qiang Sun