Related papers: Language Models Implement Simple Word2Vec-style Ve…

Limitations of Language Models in Arithmetic and Symbolic Induction

Recent work has shown that large pretrained Language Models (LMs) can not only perform remarkably well on a range of Natural Language Processing (NLP) tasks but also start improving on reasoning tasks such as arithmetic induction, symbolic…

Computation and Language · Computer Science 2022-08-11 Jing Qian, Hong Wang, Zekun Li, Shiyang Li, Xifeng Yan

Large Language Models Are Latent Variable Models: Explaining and Finding Good Demonstrations for In-Context Learning

In recent years, pre-trained large language models (LLMs) have demonstrated remarkable efficiency in achieving an inference-time few-shot learning capability known as in-context learning. However, existing literature has highlighted the…

Computation and Language · Computer Science 2024-02-14 Xinyi Wang, Wanrong Zhu, Michael Saxon, Mark Steyvers, William Yang Wang

Language Models Encode the Value of Numbers Linearly

Large language models (LLMs) have exhibited impressive competence in various tasks, but their internal mechanisms on mathematical problems are still under-explored. In this paper, we study a fundamental question: how language models encode…

Computation and Language · Computer Science 2024-11-15 Fangwei Zhu, Damai Dai, Zhifang Sui

Reversing Large Language Models for Efficient Training and Fine-Tuning

Large Language Models (LLMs) are known for their expensive and time-consuming training. Thus, oftentimes, LLMs are fine-tuned to address a specific task, given the pretrained weights of a pre-trained LLM considered a foundation model. In…

Computation and Language · Computer Science 2025-12-05 Eshed Gal, Moshe Eliasof, Javier Turek, Uri Ascher, Eran Treister, Eldad Haber

Functional Subspace, where language models can use vector algebra to solve problems

Large language models (LLMs) were invented for natural language tasks such as translation, but they have proved that they can perform highly complex functions across domains. Additionally, they have been thought to develop new skills…

Computation and Language · Computer Science 2026-05-12 Jung H. Lee, Sujith Vijayan

Less is More: Local Intrinsic Dimensions of Contextual Language Models

Understanding the internal mechanisms of large language models (LLMs) remains a challenging and complex endeavor. Even fundamental questions, such as how fine-tuning affects model behavior, often require extensive empirical evaluation. In…

Computation and Language · Computer Science 2025-10-28 Benjamin Matthias Ruppik, Julius von Rohrscheidt, Carel van Niekerk, Michael Heck, Renato Vukovic, Shutong Feng, Hsien-chin Lin, Nurul Lubis, Bastian Rieck, Marcus Zibrowius, Milica Gašić

The Fair Language Model Paradox

Large Language Models (LLMs) are widely deployed in real-world applications, yet little is known about their training dynamics at the token level. Evaluation typically relies on aggregated training loss, measured at the batch level, which…

Computation and Language · Computer Science 2024-10-17 Andrea Pinto, Tomer Galanti, Randall Balestriero

Augmented Language Models: a Survey

This survey reviews works in which language models (LMs) are augmented with reasoning skills and the ability to use tools. The former is defined as decomposing a potentially complex task into simpler subtasks while the latter consists in…

Computation and Language · Computer Science 2023-02-16 Grégoire Mialon, Roberto Dessì, Maria Lomeli, Christoforos Nalmpantis, Ram Pasunuru, Roberta Raileanu, Baptiste Rozière, Timo Schick, Jane Dwivedi-Yu, Asli Celikyilmaz, Edouard Grave, Yann LeCun, Thomas Scialom

Language Models Struggle to Use Representations Learned In-Context

Though large language models (LLMs) have enabled great success across a wide variety of tasks, they still appear to fall short of one of the loftier goals of artificial intelligence research: creating an artificial system that can adapt its…

Computation and Language · Computer Science 2026-05-04 Michael A. Lepori, Tal Linzen, Ann Yuan, Katja Filippova

Large Language Models: A Mathematical Formulation

Large language models (LLMs) process and predict sequences containing text to answer questions, and address tasks including document summarization, providing recommendations, writing software and solving quantitative problems. We provide a…

Numerical Analysis · Mathematics 2026-02-02 Ricardo Baptista, Andrew Stuart, Son Tran

Teaching Smaller Language Models To Generalise To Unseen Compositional Questions (Full Thesis)

Pretrained large Language Models (LLMs) are able to answer questions that are unlikely to have been encountered during training. However a diversity of potential applications exist in the broad domain of reasoning systems and considerations…

Computation and Language · Computer Science 2024-11-27 Tim Hartill

Interpreting and Improving Large Language Models in Arithmetic Calculation

Large language models (LLMs) have demonstrated remarkable potential across numerous applications and have shown an emergent ability to tackle complex reasoning tasks, such as mathematical computations. However, even for the simplest…

Computation and Language · Computer Science 2024-09-04 Wei Zhang, Chaoqun Wan, Yonggang Zhang, Yiu-ming Cheung, Xinmei Tian, Xu Shen, Jieping Ye

End-to-End Speech Recognition Contextualization with Large Language Models

In recent years, Large Language Models (LLMs) have garnered significant attention from the research community due to their exceptional performance and generalization capabilities. In this paper, we introduce a novel method for…

Audio and Speech Processing · Electrical Eng. & Systems 2023-09-21 Egor Lakomkin, Chunyang Wu, Yassir Fathullah, Ozlem Kalinli, Michael L. Seltzer, Christian Fuegen

Compressing Large Language Models with Automated Sub-Network Search

Large Language Models (LLMs) demonstrate exceptional reasoning abilities, enabling strong generalization across diverse tasks such as commonsense reasoning and instruction following. However, as LLMs scale, inference costs become…

Computation and Language · Computer Science 2025-02-06 Rhea Sanjay Sukthanker, Benedikt Staffler, Frank Hutter, Aaron Klein

Language Models as a Knowledge Source for Cognitive Agents

Language models (LMs) are sentence-completion engines trained on massive corpora. LMs have emerged as a significant breakthrough in natural-language processing, providing capabilities that go far beyond sentence completion including…

Artificial Intelligence · Computer Science 2021-10-26 Robert E. Wray III, James R. Kirk, John E. Laird

Methods for Estimating and Improving Robustness of Language Models

Despite their outstanding performance, large language models (LLMs) suffer notorious flaws related to their preference for simple, surface-level textual relations over full semantic complexity of the problem. This proposal investigates a…

Computation and Language · Computer Science 2022-06-20 Michal Štefánik

Small Models, Big Impact: Efficient Corpus and Graph-Based Adaptation of Small Multilingual Language Models for Low-Resource Languages

Low-resource languages (LRLs) face significant challenges in natural language processing (NLP) due to limited data. While current state-of-the-art large language models (LLMs) still struggle with LRLs, smaller multilingual models (mLMs)…

Computation and Language · Computer Science 2025-02-17 Daniil Gurgurov, Ivan Vykopal, Josef van Genabith, Simon Ostermann

Representation Of Lexical Stylistic Features In Language Models' Embedding Space

The representation space of pretrained Language Models (LMs) encodes rich information about words and their relationships (e.g., similarity, hypernymy, polysemy) as well as abstract semantic notions (e.g., intensity). In this paper, we…

Computation and Language · Computer Science 2023-06-02 Qing Lyu, Marianna Apidianaki, Chris Callison-Burch

Reliable, Adaptable, and Attributable Language Models with Retrieval

Parametric language models (LMs), which are trained on vast amounts of web data, exhibit remarkable flexibility and capability. However, they still face practical challenges such as hallucinations, difficulty in adapting to new data…

Computation and Language · Computer Science 2024-03-06 Akari Asai, Zexuan Zhong, Danqi Chen, Pang Wei Koh, Luke Zettlemoyer, Hannaneh Hajishirzi, Wen-tau Yih

Consistent Alignment of Word Embedding Models

Word embedding models offer continuous vector representations that can capture rich contextual semantics based on their word co-occurrence patterns. While these word vectors can provide very effective features used in many NLP tasks such as…

Computation and Language · Computer Science 2017-02-27 Cem Safak Sahin, Rajmonda S. Caceres, Brandon Oselio, William M. Campbell