Related papers: Tensorized Embedding Layers for Efficient Model Co…

Partial Tensorized Transformers for Natural Language Processing

The transformer architecture has revolutionized Natural Language Processing (NLP) and other machine-learning tasks, due to its unprecedented accuracy. However, their extensive memory and parameter requirements often hinder their practical…

Computation and Language · Computer Science 2023-11-01 Subhadra Vadlamannati , Ryan Solgi

TensorGPT: Efficient Compression of Large Language Models based on Tensor-Train Decomposition

High-dimensional token embeddings underpin Large Language Models (LLMs), as they can capture subtle semantic information and significantly enhance the modelling of complex language patterns. However, this high dimensionality also introduces…

Computation and Language · Computer Science 2024-10-07 Mingxue Xu , Yao Lei Xu , Danilo P. Mandic

Tensorizing Neural Networks

Deep neural networks currently demonstrate state-of-the-art performance in several domains. At the same time, models of this class are very demanding in terms of computational resources. In particular, a large amount of memory is required…

Machine Learning · Computer Science 2015-12-22 Alexander Novikov , Dmitry Podoprikhin , Anton Osokin , Dmitry Vetrov

Online Embedding Compression for Text Classification using Low Rank Matrix Factorization

Deep learning models have become state of the art for natural language processing (NLP) tasks, however deploying these models in production system poses significant memory constraints. Existing compression methods are either lossy or…

Machine Learning · Computer Science 2018-11-05 Anish Acharya , Rahul Goel , Angeliki Metallinou , Inderjit Dhillon

Parameter-Efficient Transformer Embeddings

Embedding layers in transformer-based NLP models typically account for the largest share of model parameters, scaling with vocabulary size but not yielding performance gains proportional to scale. We propose an alternative approach in which…

Computation and Language · Computer Science 2025-05-06 Henry Ndubuaku , Mouad Talhi

TensorSLM: Energy-efficient Embedding Compression of Sub-billion Parameter Language Models on Low-end Devices

Small Language Models (SLMs, or on-device LMs) have significantly fewer parameters than Large Language Models (LLMs). They are typically deployed on low-end devices, like mobile phones and single-board computers. Unlike LLMs, which rely on…

Computation and Language · Computer Science 2025-06-17 Mingxue Xu , Yao Lei Xu , Danilo P. Mandic

From Fully Trained to Fully Random Embeddings: Improving Neural Machine Translation with Compact Word Embedding Tables

Embedding matrices are key components in neural natural language processing (NLP) models that are responsible to provide numerical representations of input tokens.\footnote{In this paper words and subwords are referred to as \textit{tokens}…

Computation and Language · Computer Science 2022-04-19 Krtin Kumar , Peyman Passban , Mehdi Rezagholizadeh , Yiu Sing Lau , Qun Liu

Tensor Networks for Big Data Analytics and Large-Scale Optimization Problems

In this paper we review basic and emerging models and associated algorithms for large-scale tensor networks, especially Tensor Train (TT) decompositions using novel mathematical and graphical representations. We discus the concept of…

Numerical Analysis · Computer Science 2014-08-25 Andrzej Cichocki

Tensor-Train Recurrent Neural Networks for Video Classification

The Recurrent Neural Networks and their variants have shown promising performances in sequence modeling tasks such as Natural Language Processing. These models, however, turn out to be impractical and difficult to train when exposed to very…

Computer Vision and Pattern Recognition · Computer Science 2017-07-07 Yinchong Yang , Denis Krompass , Volker Tresp

Using the Output Embedding to Improve Language Models

We study the topmost weight matrix of neural network language models. We show that this matrix constitutes a valid word embedding. When training language models, we recommend tying the input embedding and this output embedding. We analyze…

Computation and Language · Computer Science 2017-02-22 Ofir Press , Lior Wolf

Improving Word Embedding Factorization for Compression Using Distilled Nonlinear Neural Decomposition

Word-embeddings are vital components of Natural Language Processing (NLP) models and have been extensively explored. However, they consume a lot of memory which poses a challenge for edge deployment. Embedding matrices, typically, contain…

Computation and Language · Computer Science 2020-11-12 Vasileios Lioutas , Ahmad Rashid , Krtin Kumar , Md Akmal Haidar , Mehdi Rezagholizadeh

Direction is what you need: Improving Word Embedding Compression in Large Language Models

The adoption of Transformer-based models in natural language processing (NLP) has led to great success using a massive number of parameters. However, due to deployment constraints in edge devices, there has been a rising interest in the…

Computation and Language · Computer Science 2021-08-04 Klaudia Bałazy , Mohammadreza Banaei , Rémi Lebret , Jacek Tabor , Karl Aberer

Compressing Neural Language Models by Sparse Word Representations

Neural networks are among the state-of-the-art techniques for language modeling. Existing neural language models typically map discrete words to distributed, dense vector representations. After information processing of the preceding…

Computation and Language · Computer Science 2016-10-14 Yunchuan Chen , Lili Mou , Yan Xu , Ge Li , Zhi Jin

TT-Rec: Tensor Train Compression for Deep Learning Recommendation Models

The memory capacity of embedding tables in deep learning recommendation models (DLRMs) is increasing dramatically from tens of GBs to TBs across the industry. Given the fast growth in DLRMs, novel solutions are urgently needed, in order to…

Machine Learning · Computer Science 2021-01-29 Chunxing Yin , Bilge Acun , Xing Liu , Carole-Jean Wu

Compressing Word Embeddings via Deep Compositional Code Learning

Natural language processing (NLP) models often require a massive number of parameters for word embeddings, resulting in a large storage or memory footprint. Deploying neural NLP models to mobile devices requires compressing the word…

Computation and Language · Computer Science 2017-11-20 Raphael Shu , Hideki Nakayama

Neural Networks Compression for Language Modeling

In this paper, we consider several compression techniques for the language modeling problem based on recurrent neural networks (RNNs). It is known that conventional RNNs, e.g, LSTM-based networks in language modeling, are characterized with…

Machine Learning · Statistics 2019-04-09 Artem M. Grachev , Dmitry I. Ignatov , Andrey V. Savchenko

Semantic Vector Machines

We first present our work in machine translation, during which we used aligned sentences to train a neural network to embed n-grams of different languages into an $d$-dimensional space, such that n-grams that are the translation of each…

Machine Learning · Computer Science 2011-05-17 Etter Vincent

Tensor Decomposition for Model Reduction in Neural Networks: A Review

Modern neural networks have revolutionized the fields of computer vision (CV) and Natural Language Processing (NLP). They are widely used for solving complex CV tasks and NLP tasks such as image classification, image generation, and machine…

Machine Learning · Computer Science 2023-07-25 Xingyi Liu , Keshab K. Parhi

Compression of Recurrent Neural Networks for Efficient Language Modeling

Recurrent neural networks have proved to be an effective method for statistical language modeling. However, in practice their memory and run-time complexity are usually too large to be implemented in real-time offline mobile applications.…

Computation and Language · Computer Science 2019-04-09 Artem M. Grachev , Dmitry I. Ignatov , Andrey V. Savchenko

Tensorization is a powerful but underexplored tool for compression and interpretability of neural networks

Tensorizing a neural network involves reshaping some or all of its dense weight matrices into higher-order tensors and approximating them using low-rank tensor network decompositions. This technique has shown promise as a model compression…

Machine Learning · Computer Science 2025-05-27 Safa Hamreras , Sukhbinder Singh , Román Orús