Related papers: Embedding Compression with Isotropic Iterative Qua…

Learning to Remove: Towards Isotropic Pre-trained BERT Embedding

Pre-trained language models such as BERT have become a more common choice of natural language processing (NLP) tasks. Research in word representation shows that isotropic embeddings can significantly improve performance on downstream tasks.…

Computation and Language · Computer Science 2021-08-30 Yuxin Liang , Rui Cao , Jie Zheng , Jie Ren , Ling Gao

Retraining-Based Iterative Weight Quantization for Deep Neural Networks

Model compression has gained a lot of attention due to its ability to reduce hardware resource requirements significantly while maintaining accuracy of DNNs. Model compression is especially useful for memory-intensive recurrent neural…

Machine Learning · Computer Science 2018-05-30 Dongsoo Lee , Byeongwook Kim

Online Embedding Compression for Text Classification using Low Rank Matrix Factorization

Deep learning models have become state of the art for natural language processing (NLP) tasks, however deploying these models in production system poses significant memory constraints. Existing compression methods are either lossy or…

Machine Learning · Computer Science 2018-11-05 Anish Acharya , Rahul Goel , Angeliki Metallinou , Inderjit Dhillon

Differentiable Product Quantization for End-to-End Embedding Compression

Embedding layers are commonly used to map discrete symbols into continuous embedding vectors that reflect their semantic meanings. Despite their effectiveness, the number of parameters in an embedding layer increases linearly with the…

Machine Learning · Computer Science 2020-06-29 Ting Chen , Lala Li , Yizhou Sun

Compressing Word Embeddings via Deep Compositional Code Learning

Natural language processing (NLP) models often require a massive number of parameters for word embeddings, resulting in a large storage or memory footprint. Deploying neural NLP models to mobile devices requires compressing the word…

Computation and Language · Computer Science 2017-11-20 Raphael Shu , Hideki Nakayama

Improving Word Embedding Factorization for Compression Using Distilled Nonlinear Neural Decomposition

Word-embeddings are vital components of Natural Language Processing (NLP) models and have been extensively explored. However, they consume a lot of memory which poses a challenge for edge deployment. Embedding matrices, typically, contain…

Computation and Language · Computer Science 2020-11-12 Vasileios Lioutas , Ahmad Rashid , Krtin Kumar , Md Akmal Haidar , Mehdi Rezagholizadeh

Learning Compressed Sentence Representations for On-Device Text Processing

Vector representations of sentences, trained on massive text corpora, are widely used as generic sentence embeddings across a variety of NLP problems. The learned representations are generally assumed to be continuous and real-valued,…

Computation and Language · Computer Science 2019-06-21 Dinghan Shen , Pengyu Cheng , Dhanasekar Sundararaman , Xinyuan Zhang , Qian Yang , Meng Tang , Asli Celikyilmaz , Lawrence Carin

Partitioning-Guided K-Means: Extreme Empty Cluster Resolution for Extreme Model Compression

Compactness in deep learning can be critical to a model's viability in low-resource applications, and a common approach to extreme model compression is quantization. We consider Iterative Product Quantization (iPQ) with Quant-Noise to be…

Machine Learning · Computer Science 2023-06-27 Tianhong Huang , Victor Agostinelli , Lizhong Chen

word2ket: Space-efficient Word Embeddings inspired by Quantum Entanglement

Deep learning natural language processing models often use vector word embeddings, such as word2vec or GloVe, to represent words. A discrete sequence of words can be much more easily integrated with downstream neural layers if it is…

Machine Learning · Computer Science 2020-03-04 Aliakbar Panahi , Seyran Saeedi , Tom Arodz

Simultaneous Compression and Quantization: A Joint Approach for Efficient Unsupervised Hashing

For unsupervised data-dependent hashing, the two most important requirements are to preserve similarity in the low-dimensional feature space and to minimize the binary quantization loss. A well-established hashing approach is Iterative…

Computer Vision and Pattern Recognition · Computer Science 2019-11-14 Tuan Hoang , Thanh-Toan Do , Huu Le , Dang-Khoa Le-Tan , Ngai-Man Cheung

Near-lossless Binarization of Word Embeddings

Word embeddings are commonly used as a starting point in many NLP models to achieve state-of-the-art performances. However, with a large vocabulary and many dimensions, these floating-point representations are expensive both in terms of…

Computation and Language · Computer Science 2020-01-23 Julien Tissier , Christophe Gravier , Amaury Habrard

Model compression as constrained optimization, with application to neural nets. Part II: quantization

We consider the problem of deep neural net compression by quantization: given a large, reference net, we want to quantize its real-valued weights using a codebook with $K$ entries so that the training loss of the quantized net is minimal.…

Machine Learning · Computer Science 2017-07-17 Miguel Á. Carreira-Perpiñán , Yerlan Idelbayev

Scalable Image Tokenization with Index Backpropagation Quantization

Existing vector quantization (VQ) methods struggle with scalability, largely attributed to the instability of the codebook that undergoes partial updates during training. The codebook is prone to collapse as utilization decreases, due to…

Computer Vision and Pattern Recognition · Computer Science 2025-03-11 Fengyuan Shi , Zhuoyan Luo , Yixiao Ge , Yujiu Yang , Ying Shan , Limin Wang

Binary Quadratic Quantization: Beyond First-Order Quantization for Real-Valued Matrix Compression

This paper proposes a novel matrix quantization method, Binary Quadratic Quantization (BQQ). In contrast to conventional first-order quantization approaches, such as uniform quantization and binary coding quantization, that approximate…

Computer Vision and Pattern Recognition · Computer Science 2025-10-22 Kyo Kuroki , Yasuyuki Okoshi , Thiem Van Chu , Kazushi Kawamura , Masato Motomura

Using Holographically Compressed Embeddings in Question Answering

Word vector representations are central to deep learning natural language processing models. Many forms of these vectors, known as embeddings, exist, including word2vec and GloVe. Embeddings are trained on large corpora and learn the word's…

Computation and Language · Computer Science 2020-07-16 Salvador E. Barbosa

Tensorized Embedding Layers for Efficient Model Compression

The embedding layers transforming input words into real vectors are the key components of deep neural networks used in natural language processing. However, when the vocabulary is large, the corresponding weight matrices can be enormous,…

Computation and Language · Computer Science 2020-02-20 Oleksii Hrinchuk , Valentin Khrulkov , Leyla Mirvakhabova , Elena Orlova , Ivan Oseledets

Learning Complex Word Embeddings in Classical and Quantum Spaces

We present a variety of methods for training complex-valued word embeddings, based on the classical Skip-gram model, with a straightforward adaptation simply replacing the real-valued vectors with arbitrary vectors of complex numbers. In a…

Computation and Language · Computer Science 2024-12-19 Carys Harvey , Stephen Clark , Douglas Brown , Konstantinos Meichanetzidis

Variational Bayesian Quantization

We propose a novel algorithm for quantizing continuous latent representations in trained models. Our approach applies to deep probabilistic models, such as variational autoencoders (VAEs), and enables both data and model compression. Unlike…

Image and Video Processing · Electrical Eng. & Systems 2020-09-09 Yibo Yang , Robert Bamler , Stephan Mandt

Incrementally-Computable Neural Networks: Efficient Inference for Dynamic Inputs

Deep learning often faces the challenge of efficiently processing dynamic inputs, such as sensor data or user inputs. For example, an AI writing assistant is required to update its suggestions in real time as a document is edited.…

Machine Learning · Computer Science 2023-07-28 Or Sharir , Anima Anandkumar

IB-DRR: Incremental Learning with Information-Back Discrete Representation Replay

Incremental learning aims to enable machine learning models to continuously acquire new knowledge given new classes, while maintaining the knowledge already learned for old classes. Saving a subset of training samples of previously seen…

Computer Vision and Pattern Recognition · Computer Science 2021-04-22 Jian Jiang , Edoardo Cetin , Oya Celiktutan