Related papers: Structured Multidimensional Representation Learnin…

200 papers

Embedding layers in transformer-based NLP models typically account for the largest share of model parameters, scaling with vocabulary size but not yielding performance gains proportional to scale. We propose an alternative approach in which…

Computation and Language · Computer Science 2025-05-06 Henry Ndubuaku, Mouad Talhi

Transformer-based end-to-end speech recognition has achieved great success. However, the large footprint and computational overhead make it difficult to deploy these models in some real-world applications. Model compression techniques can…

Computation and Language · Computer Science 2023-03-15 Yifan Peng, Jaesong Lee, Shinji Watanabe

The embedding layers transforming input words into real vectors are the key components of deep neural networks used in natural language processing. However, when the vocabulary is large, the corresponding weight matrices can be enormous,…

Computation and Language · Computer Science 2020-02-20 Oleksii Hrinchuk, Valentin Khrulkov, Leyla Mirvakhabova, Elena Orlova, Ivan Oseledets

High-dimensional token embeddings underpin Large Language Models (LLMs), as they can capture subtle semantic information and significantly enhance the modelling of complex language patterns. However, this high dimensionality also introduces…

Computation and Language · Computer Science 2024-10-07 Mingxue Xu, Yao Lei Xu, Danilo P. Mandic

The question of what kinds of linguistic information are encoded in different layers of Transformer-based language models is of considerable interest for the NLP community. Existing work, however, has overwhelmingly focused on word-level…

Computation and Language · Computer Science 2023-10-19 Dmitry Nikolaev, Sebastian Padó

Tensor decomposition of high-dimensional data often struggles to capture semantically or physically meaningful structures, particularly when relying on reconstruction objectives and fixed-rank constraints. We introduce a no-rank tensor…

Machine Learning · Computer Science 2026-03-03 Maryam Bagherian

Transformer-based self-supervised models are trained as feature extractors and have empowered many downstream speech tasks to achieve state-of-the-art performance. However, both the training and inference process of these models may…

Computation and Language · Computer Science 2021-05-04 Jinchuan Tian, Rongzhi Gu, Helin Wang, Yuexian Zou

Learning systems often expand their ambient features or latent representations over time, embedding earlier representations into larger spaces with limited new latent structure. We study transfer learning for structured matrix estimation…

Machine Learning · Computer Science 2026-01-30 Jinhang Chai, Xuyuan Liu, Elynn Chen, Yujun Yan

Learned Sparse Retrieval (LSR) has traditionally focused on small-scale encoder-only transformer architectures. With the advent of large-scale pre-trained language models, their capability to generate sparse representations for retrieval…

Information Retrieval · Computer Science 2025-04-28 Jingfen Qiao, Thong Nguyen, Evangelos Kanoulas, Andrew Yates

We consider the task of building compact deep learning pipelines suitable for deployment on storage and power constrained mobile devices. We propose a unified framework to learn a broad family of structured parameter matrices that are…

Machine Learning · Statistics 2015-10-07 Vikas Sindhwani, Tara N. Sainath, Sanjiv Kumar

How related are the representations learned by neural language models, translation models, and language tagging tasks? We answer this question by adapting an encoder-decoder transfer learning method from computer vision to investigate the…

Computation and Language · Computer Science 2025-12-11 Richard Antonello, Javier Turek, Vy Vo, Alexander Huth

Traditional sequential recommendation (SR) models learn low-dimensional item ID embeddings from user-item interactions, often overlooking textual information such as item titles or descriptions. Recent advances in Large Language Models…

Information Retrieval · Computer Science 2026-04-27 Yu Cui, Feng Liu, Zhaoxiang Wang, Changwang Zhang, Jun Wang, Can Wang, Jiawei Chen

Multivariate time series forecasting requires models to simultaneously capture variable-wise structural dependencies and generalize across diverse tasks. While structural encoders are effective in modeling feature interactions, they lack…

Computation and Language · Computer Science 2025-06-26 Fengze Li, Yue Wang, Yangle Liu, Ming Huang, Dou Hong, Jieming Ma

Transformer has achieved competitive performance against state-of-the-art end-to-end models in automatic speech recognition (ASR), and requires significantly less training time than RNN-based models. The original Transformer, with…

Audio and Speech Processing · Electrical Eng. & Systems 2020-08-14 Wenyong Huang, Wenchao Hu, Yu Ting Yeung, Xiao Chen

While significant improvements have been made in recent years in terms of end-to-end automatic speech recognition (ASR) performance, such improvements were obtained through the use of very large neural networks, unfit for embedded use on…

Computation and Language · Computer Science 2020-03-25 Alex Bie, Bharat Venkitesh, Joao Monteiro, Md. Akmal Haidar, Mehdi Rezagholizadeh

Small Language Models (SLMs, or on-device LMs) have significantly fewer parameters than Large Language Models (LLMs). They are typically deployed on low-end devices, like mobile phones and single-board computers. Unlike LLMs, which rely on…

Computation and Language · Computer Science 2025-06-17 Mingxue Xu, Yao Lei Xu, Danilo P. Mandic

Pre-trained Transformer language models (LM) have become go-to text representation encoders. Prior research fine-tunes deep LMs to encode text sequences such as sentences and passages into single dense vector representations for efficient…

Computation and Language · Computer Science 2021-09-22 Luyu Gao, Jamie Callan

Tensor train (TT) decomposition is a powerful representation for high-order tensors, which has been successfully applied to various machine learning tasks in recent years. However, since the tensor product is not commutative, permutation of…

Numerical Analysis · Computer Science 2017-05-31 Qibin Zhao, Masashi Sugiyama, Andrzej Cichocki

Automatic speech recognition (ASR) systems developed in recent years have shown promising results with self-attention models (e.g., Transformer and Conformer), which are replacing conventional recurrent neural networks. Meanwhile, a…

Sound · Computer Science 2022-11-01 Koichi Miyazaki, Masato Murata, Tomoki Koriyama

This paper proposes Transducers with Pronunciation-aware Embeddings (PET). Unlike conventional Transducers where the decoder embeddings for different tokens are trained independently, the PET model's decoder embedding incorporates shared…

Computation and Language · Computer Science 2024-04-09 Hainan Xu, Zhehuai Chen, Fei Jia, Boris Ginsburg