Related papers: Context-level Language Modeling by Learning Predic…

200 papers

Large language models (LLMs) have been widely employed across various application domains, yet their black-box nature poses significant challenges to understanding how these models process input data internally to make predictions. In this…

Machine Learning · Computer Science 2025-09-03 Hangfeng He, Weijie J. Su

Large language models (LMs) are currently trained to predict tokens given document prefixes, enabling them to directly perform long-form generation and prompting-style tasks which can be reduced to document completion. Existing pretraining…

Understanding context is key to understanding human language, an ability which Large Language Models (LLMs) have been increasingly seen to demonstrate to an impressive extent. However, though the evaluation of LLMs encompasses various…

Computation and Language · Computer Science 2024-02-02 Yilun Zhu, Joel Ruben Antony Moniz, Shruti Bhargava, Jiarui Lu, Dhivya Piraviperumal, Site Li, Yuan Zhang, Hong Yu, Bo-Hsiang Tseng

Large pre-training language models (PLMs) have shown promising in-context learning abilities. However, due to the backbone transformer architecture, existing PLMs are bottlenecked by the memory and computational cost when scaling up to a…

Computation and Language · Computer Science 2023-02-13 Mukai Li, Shansan Gong, Jiangtao Feng, Yiheng Xu, Jun Zhang, Zhiyong Wu, Lingpeng Kong

Scaling large language models (LLMs) leads to an emergent capacity to learn in-context from example demonstrations. Despite progress, theoretical understanding of this phenomenon remains limited. We argue that in-context learning relies on…

Computation and Language · Computer Science 2023-03-15 Michael Hahn, Navin Goyal

Large language models (LLMs) exhibit a strong capacity for in-context learning: Given labeled examples, they can generate good predictions without parameter updates. However, many interactive settings go beyond static prediction to online…

Machine Learning · Computer Science 2026-05-12 Emile Anand, Abdullah Ateyeh, Xinyuan Cao, Max Dabagia

While modern Transformer-based language models (LMs) have achieved major success in multi-task generalization, they often struggle to capture long-range dependencies within their context window. This work introduces a novel approach using…

Computation and Language · Computer Science 2025-09-23 Alok N. Shah, Khush Gupta, Keshav Ramji, Pratik Chaudhari

Text representation plays a critical role in tasks like clustering, retrieval, and other downstream applications. With the emergence of large language models (LLMs), there is increasing interest in harnessing their capabilities for this…

Computation and Language · Computer Science 2025-12-25 Yeqin Zhang, Yizheng Zhao, Chen Hu, Binxing Jiao, Daxin Jiang, Ruihang Miao, Cam-Tu Nguyen

Many applications of large language models (LLMs) require long-context understanding, but models continue to struggle with such tasks. We hypothesize that conventional next-token prediction training could contribute to this, because each…

Computation and Language · Computer Science 2025-03-13 Falko Helm, Nico Daheim, Iryna Gurevych

In-context learning, where pre-trained language models learn to perform tasks from task examples and instructions in their contexts, has attracted much attention in the NLP community. However, the ability of in-context learning is not fully…

Computation and Language · Computer Science 2023-05-17 Yuxian Gu, Li Dong, Furu Wei, Minlie Huang

Autoregressive language models (LMs) generate one token at a time, yet human reasoning operates over higher-level abstractions: sentences, propositions, and concepts. This contrast raises a central question: Can LMs likewise learn to…

Computation and Language · Computer Science 2025-10-14 Hyeonbin Hwang, Byeongguk Jeon, Seungone Kim, Jiyeon Kim, Hoyeon Chang, Sohee Yang, Seungpil Won, Dohaeng Lee, Youbin Ahn, Minjoon Seo

Common language models typically predict the next word given the context. In this work, we propose a method that improves language modeling by learning to align the given context and the following phrase. The model does not require any…

Computation and Language · Computer Science 2019-06-06 Hongyin Luo, Lan Jiang, Yonatan Belinkov, James Glass

In recent years, Large Language Models (LLMs) have garnered significant attention from the research community due to their exceptional performance and generalization capabilities. In this paper, we introduce a novel method for…

Audio and Speech Processing · Electrical Eng. & Systems 2023-09-21 Egor Lakomkin, Chunyang Wu, Yassir Fathullah, Ozlem Kalinli, Michael L. Seltzer, Christian Fuegen

Recently, pretrained language models (PLMs) have had exceptional success in language generation. To leverage the rich knowledge encoded by PLMs, a simple yet powerful paradigm is to use prompts in the form of either discrete tokens or…

Computation and Language · Computer Science 2022-10-04 Tianyi Tang, Junyi Li, Wayne Xin Zhao, Ji-Rong Wen

Large language models (LMs) such as GPT-3 have the surprising ability to do in-context learning, where the model learns to do a downstream task simply by conditioning on a prompt consisting of input-output examples. The LM learns from these…

Computation and Language · Computer Science 2022-07-22 Sang Michael Xie, Aditi Raghunathan, Percy Liang, Tengyu Ma

Document-level machine translation manages to outperform sentence-level models by a small margin, but has failed to be widely adopted. We argue that previous research did not make clear use of the global context, and propose a new…

Computation and Language · Computer Science 2020-09-10 Zaixiang Zheng, Xiang Yue, Shujian Huang, Jiajun Chen, Alexandra Birch

This thesis investigates two key phenomena in large language models (LLMs): in-context learning (ICL) and model collapse. We study ICL in a linear transformer with tied weights trained on linear regression tasks, and show that minimising…

Artificial Intelligence · Computer Science 2026-01-06 Josef Ott

We propose a new neurally-inspired model that can learn to encode the global relationship context of visual events across time and space and to use the contextual information to modulate the analysis-by-synthesis process in a predictive…

Machine Learning · Computer Science 2015-04-17 Mingmin Zhao, Chengxu Zhuang, Yizhou Wang, Tai Sing Lee

This paper explores how large language models can leverage multi-level contextual information to predict group coordination patterns in collaborative mixed reality environments. We demonstrate that encoding individual behavioral profiles,…

Human-Computer Interaction · Computer Science 2025-11-19 Diana Romero, Xin Gao, Daniel Khalkhali, Salma Elmalaki

Prompting Large Language Models (LLMs), or providing context on the expected model of operation, is an effective way to steer the outputs of such models to satisfy human desiderata after they have been trained. But in rapidly evolving…

Machine Learning · Computer Science 2025-08-08 Younwoo Choi, Muhammad Adil Asif, Ziwen Han, John Willes, Rahul G. Krishnan