Related papers: Neural Composition: Learning to Generate from Mult…

Exploiting compositionality to explore a large space of model structures

The recent proliferation of richly structured probabilistic models raises the question of how to automatically determine an appropriate model for a dataset. We investigate this question for a space of matrix decomposition models which can…

Machine Learning · Computer Science 2012-10-19 Roger Grosse, Ruslan R Salakhutdinov, William T. Freeman, Joshua B. Tenenbaum

A Generative Model of Words and Relationships from Multiple Sources

Neural language models are a powerful tool to embed words into semantic vector spaces. However, learning such models generally relies on the availability of abundant and diverse training examples. In highly specialised domains this…

Computation and Language · Computer Science 2015-12-04 Stephanie L. Hyland, Theofanis Karaletsos, Gunnar Rätsch

Learning to Discover, Ground and Use Words with Segmental Neural Language Models

We propose a segmental neural language model that combines the generalization power of neural networks with the ability to discover word-like units that are latent in unsegmented character sequences. In contrast to previous segmentation…

Computation and Language · Computer Science 2019-06-19 Kazuya Kawakami, Chris Dyer, Phil Blunsom

Compositional Hardness of Code in Large Language Models -- A Probabilistic Perspective

A common practice in large language model (LLM) usage for complex analytical tasks such as code generation is to sample a solution for the entire task within the model's context window. Previous works have shown that subtask decomposition…

Artificial Intelligence · Computer Science 2025-02-03 Yotam Wolf, Binyamin Rothberg, Dorin Shteyman, Amnon Shashua

Modular Networks: Learning to Decompose Neural Computation

Scaling model capacity has been vital in the success of deep learning. For a typical network, necessary compute resources and training time grow dramatically with model size. Conditional computation is a promising way to increase the number…

Machine Learning · Computer Science 2018-11-14 Louis Kirsch, Julius Kunze, David Barber

Learning to Decode Collaboratively with Multiple Language Models

We propose a method to teach multiple large language models (LLM) to collaborate by interleaving their generations at the token level. We model the decision of which LLM generates the next token as a latent variable. By optimizing the…

Computation and Language · Computer Science 2024-08-28 Shannon Zejiang Shen, Hunter Lang, Bailin Wang, Yoon Kim, David Sontag

Investigating the Working of Text Classifiers

Text classification is one of the most widely studied tasks in natural language processing. Motivated by the principle of compositionality, large multilayer neural network models have been employed for this task in an attempt to effectively…

Computation and Language · Computer Science 2018-08-07 Devendra Singh Sachan, Manzil Zaheer, Ruslan Salakhutdinov

Explaining How Transformers Use Context to Build Predictions

Language Generation Models produce words based on the previous context. Although existing methods offer input attributions as explanations for a model's prediction, it is still unclear how prior words affect the model's decision throughout…

Computation and Language · Computer Science 2023-05-23 Javier Ferrando, Gerard I. Gállego, Ioannis Tsiamas, Marta R. Costa-jussà

Compiling Language Models from a Linguistically Motivated Unification Grammar

Systems now exist which are able to compile unification grammars into language models that can be included in a speech recognizer, but it is so far unclear whether non-trivial linguistically principled grammars can be used for this purpose.…

Computation and Language · Computer Science 2007-05-23 Manny Rayner, Beth Ann Hockey, Frankie James, Elizabeth O. Bratt, Sharon Goldwater, Mark Gawron

Unsupervised Word Segmentation with Bi-directional Neural Language Model

We present an unsupervised word segmentation model, in which the learning objective is to maximize the generation probability of a sentence given all of its possible segmentations. Such generation probability can be factorized into the…

Computation and Language · Computer Science 2021-03-03 Lihao Wang, Zongyi Li, Xiaoqing Zheng

CompoundPiece: Evaluating and Improving Decompounding Performance of Language Models

While many languages possess processes of joining two or more words to create compound words, previous studies have been typically limited only to languages with excessively productive compound formation (e.g., German, Dutch) and there is…

Computation and Language · Computer Science 2023-10-24 Benjamin Minixhofer, Jonas Pfeiffer, Ivan Vulić

A Well-Composed Text is Half Done! Composition Sampling for Diverse Conditional Generation

We propose Composition Sampling, a simple but effective method to generate diverse outputs for conditional generation of higher quality compared to previous stochastic decoding strategies. It builds on recently proposed plan-based neural…

Computation and Language · Computer Science 2022-03-30 Shashi Narayan, Gonçalo Simões, Yao Zhao, Joshua Maynez, Dipanjan Das, Michael Collins, Mirella Lapata

Development of Compositionality and Generalization through Interactive Learning of Language and Action of Robots

Humans excel at applying learned behavior to unlearned situations. A crucial component of this generalization behavior is our ability to compose/decompose a whole into reusable parts, an attribute known as compositionality. One of the…

Artificial Intelligence · Computer Science 2024-07-24 Prasanna Vijayaraghavan, Jeffrey Frederic Queisser, Sergio Verduzco Flores, Jun Tani

A Joint Model for Word Embedding and Word Morphology

This paper presents a joint model for performing unsupervised morphological analysis on words, and learning a character-level composition function from morphemes to word embeddings. Our model splits individual words into segments, and…

Computation and Language · Computer Science 2016-06-09 Kris Cao, Marek Rei

Understanding Subword Compositionality of Large Language Models

Large language models (LLMs) take sequences of subwords as input, requiring them to effectively compose subword representations into meaningful word-level representations. In this paper, we present a comprehensive set of experiments to probe…

Computation and Language · Computer Science 2025-08-26 Qiwei Peng, Yekun Chai, Anders Søgaard

Token-level Ensembling of Models with Different Vocabularies

Model ensembling is a technique to combine the predicted distributions of two or more models, often leading to improved robustness and performance. For ensembling in text generation, the next token's probability distribution is derived from…

Computation and Language · Computer Science 2025-03-03 Rachel Wicks, Kartik Ravisankar, Xinchen Yang, Philipp Koehn, Matt Post

Composing and Embedding the Words-as-Classifiers Model of Grounded Semantics

The words-as-classifiers model of grounded lexical semantics learns a semantic fitness score between physical entities and the words that are used to denote those entities. In this paper, we explore how such a model can incrementally…

Computation and Language · Computer Science 2019-11-11 Daniele Moro, Stacy Black, Casey Kennington

Lossless Vocabulary Reduction for Auto-Regressive Language Models

Tokenization -- the process of decomposing a given text into a sequence of subwords called tokens -- is one of the key components in the development of language models. Particularly, auto-regressive language models generate texts token by…

Computation and Language · Computer Science 2026-02-19 Daiki Chijiwa, Taku Hasegawa, Kyosuke Nishida, Shin'ya Yamaguchi, Tomoya Ohba, Tamao Sakao, Susumu Takeuchi

Distributed Representations for Compositional Semantics

The mathematical representation of semantics is a key issue for Natural Language Processing (NLP). A lot of research has been devoted to finding ways of representing the semantics of individual words in vector spaces. Distributional…

Computation and Language · Computer Science 2014-11-13 Karl Moritz Hermann

A Model for Combinatorial Dictionary Learning and Inference

We are often interested in decomposing complex, structured data into simple components that explain the data. The linear version of this problem is well-studied as dictionary learning and factor analysis. In this work, we propose a…

Machine Learning · Computer Science 2024-07-29 Avrim Blum, Kavya Ravichandran