Related papers: Efficient Contextual Representation Learning Witho…

Efficient softmax approximation for GPUs

We propose an approximate strategy to efficiently train neural network based language models over very large vocabularies. Our approach, called adaptive softmax, circumvents the linear dependency on the vocabulary size by exploiting the…

Computation and Language · Computer Science 2017-06-20 Edouard Grave, Armand Joulin, Moustapha Cissé, David Grangier, Hervé Jégou

Robust Document Representations using Latent Topics and Metadata

Task specific fine-tuning of a pre-trained neural language model using a custom softmax output layer is the de facto approach of late when dealing with document classification problems. This technique is not adequate when labeled examples…

Computation and Language · Computer Science 2020-10-27 Natraj Raman, Armineh Nourbakhsh, Sameena Shah, Manuela Veloso

Retrofitting Contextualized Word Embeddings with Paraphrases

Contextualized word embedding models, such as ELMo, generate meaningful representations of words and their context. These models have been shown to have a great impact on downstream applications. However, in many cases, the contextualized…

Computation and Language · Computer Science 2019-09-27 Weijia Shi, Muhao Chen, Pei Zhou, Kai-Wei Chang

Subword ELMo

Embedding from Language Models (ELMo) has been shown to be effective for improving many natural language processing (NLP) tasks, and ELMo takes character information to compose word representation to train language models. However, the character…

Computation and Language · Computer Science 2019-09-19 Jiangtong Li, Hai Zhao, Zuchao Li, Wei Bi, Xiaojiang Liu

Linguistic Knowledge and Transferability of Contextual Representations

Contextual word representations derived from large-scale neural language models are successful across a diverse set of NLP tasks, suggesting that they encode useful and transferable features of language. To shed light on the linguistic…

Computation and Language · Computer Science 2019-04-29 Nelson F. Liu, Matt Gardner, Yonatan Belinkov, Matthew E. Peters, Noah A. Smith

End-to-End Speech Recognition Contextualization with Large Language Models

In recent years, Large Language Models (LLMs) have garnered significant attention from the research community due to their exceptional performance and generalization capabilities. In this paper, we introduce a novel method for…

Audio and Speech Processing · Electrical Eng. & Systems 2023-09-21 Egor Lakomkin, Chunyang Wu, Yassir Fathullah, Ozlem Kalinli, Michael L. Seltzer, Christian Fuegen

Enhancing and Accelerating Large Language Models via Instruction-Aware Contextual Compression

Large Language Models (LLMs) have garnered widespread attention due to their remarkable performance across various tasks. However, to mitigate the issue of hallucinations, LLMs often incorporate a retrieval-augmented pipeline to provide them…

Computation and Language · Computer Science 2024-08-29 Haowen Hou, Fei Ma, Binwen Bai, Xinxin Zhu, Fei Yu

Escaping the Context Bottleneck: Active Context Curation for LLM Agents via Reinforcement Learning

Large Language Models (LLMs) struggle with long-horizon tasks due to the "context bottleneck" and the "lost-in-the-middle" phenomenon, where accumulated noise from verbose environments degrades reasoning over multi-turn interactions. To…

Artificial Intelligence · Computer Science 2026-04-14 Xiaozhe Li, Tianyi Lyu, Yizhao Yang, Liang Shan, Siyi Yang, Ligao Zhang, Zhuoyi Huang, Qingwen Liu, Yang Li

Deep Learning using Linear Support Vector Machines

Recently, fully-connected and convolutional neural networks have been trained to achieve state-of-the-art performance on a wide variety of tasks such as speech recognition, image classification, natural language processing, and…

Machine Learning · Computer Science 2015-02-24 Yichuan Tang

Parameter Efficient Multimodal Transformers for Video Representation Learning

The recent success of Transformers in the language domain has motivated adapting it to a multimodal setting, where a new visual model is trained in tandem with an already pretrained language model. However, due to the excessive memory…

Computer Vision and Pattern Recognition · Computer Science 2021-09-23 Sangho Lee, Youngjae Yu, Gunhee Kim, Thomas Breuel, Jan Kautz, Yale Song

Utilizing Multilingual Encoders to Improve Large Language Models for Low-Resource Languages

Large Language Models (LLMs) excel in English, but their performance degrades significantly on low-resource languages (LRLs) due to English-centric training. While methods like LangBridge align LLMs with multilingual encoders such as the…

Computation and Language · Computer Science 2025-11-11 Imalsha Puranegedara, Themira Chathumina, Nisal Ranathunga, Nisansa de Silva, Surangika Ranathunga, Mokanarangan Thayaparan

A Survey on Contextual Embeddings

Contextual embeddings, such as ELMo and BERT, move beyond global word representations like Word2Vec and achieve ground-breaking performance on a wide range of natural language processing tasks. Contextual embeddings assign each word a…

Computation and Language · Computer Science 2020-04-14 Qi Liu, Matt J. Kusner, Phil Blunsom

Layer by Layer: Uncovering Hidden Representations in Language Models

From extracting features to generating text, the outputs of large language models (LLMs) typically rely on the final layers, following the conventional wisdom that earlier layers capture only low-level cues. However, our analysis shows that…

Machine Learning · Computer Science 2025-06-17 Oscar Skean, Md Rifat Arefin, Dan Zhao, Niket Patel, Jalal Naghiyev, Yann LeCun, Ravid Shwartz-Ziv

E^2-LLM: Efficient and Extreme Length Extension of Large Language Models

Typically, training LLMs with long context sizes is computationally expensive, requiring extensive training hours and GPU resources. Existing long-context extension methods usually need additional training procedures to support…

Computation and Language · Computer Science 2024-02-23 Jiaheng Liu, Zhiqi Bai, Yuanxing Zhang, Chenchen Zhang, Yu Zhang, Ge Zhang, Jiakai Wang, Haoran Que, Yukang Chen, Wenbo Su, Tiezheng Ge, Jie Fu, Wenhu Chen, Bo Zheng

An Effective Training Framework for Light-Weight Automatic Speech Recognition Models

Recent advances in deep learning have encouraged the development of large automatic speech recognition (ASR) models that achieve promising results while ignoring computational and memory constraints. However, deploying such models on low resource…

Computer Vision and Pattern Recognition · Computer Science 2025-05-29 Abdul Hannan, Alessio Brutti, Shah Nawaz, Mubashir Noman

Contextual Representation Learning beyond Masked Language Modeling

How do masked language models (MLMs) such as BERT learn contextual representations? In this work, we analyze the learning dynamics of MLMs. We find that MLMs adopt sampled embeddings as anchors to estimate and inject contextual semantics to…

Computation and Language · Computer Science 2022-04-11 Zhiyi Fu, Wangchunshu Zhou, Jingjing Xu, Hao Zhou, Lei Li

Why do small language models underperform? Studying Language Model Saturation via the Softmax Bottleneck

Recent advances in language modeling consist in pretraining highly parameterized neural networks on extremely large web-mined text corpora. Training and inference with such models can be costly in practice, which incentivizes the use of…

Computation and Language · Computer Science 2024-04-12 Nathan Godey, Éric de la Clergerie, Benoît Sagot

Improving Matching Models with Hierarchical Contextualized Representations for Multi-turn Response Selection

In this paper, we study context-response matching with pre-trained contextualized representations for multi-turn response selection in retrieval-based chatbots. Existing models, such as CoVe and ELMo, are trained with limited context (often…

Computation and Language · Computer Science 2019-06-05 Chongyang Tao, Wei Wu, Can Xu, Yansong Feng, Dongyan Zhao, Rui Yan

SCELMo: Source Code Embeddings from Language Models

Continuous embeddings of tokens in computer programs have been used to support a variety of software development tools, including readability, code search, and program repair. Contextual embeddings are common in natural language processing…

Software Engineering · Computer Science 2020-04-29 Rafael-Michael Karampatsis, Charles Sutton

High Quality ELMo Embeddings for Seven Less-Resourced Languages

Recent results show that deep neural networks using contextual embeddings significantly outperform non-contextual embeddings on a majority of text classification tasks. We offer precomputed embeddings from the popular contextual ELMo model for…

Computation and Language · Computer Science 2022-06-01 Matej Ulčar, Marko Robnik-Šikonja