Related papers: Language embeddings that preserve staging and safe…

Language Models are Universal Embedders

In the large language model (LLM) revolution, embedding is a key component of various systems, such as retrieving knowledge or memories for LLMs or building content moderation filters. As such cases span from English to other natural or…

Computation and Language · Computer Science 2025-05-23 Xin Zhang, Zehan Li, Yanzhao Zhang, Dingkun Long, Pengjun Xie, Meishan Zhang, Min Zhang

Building Code with Dynamic Staging

When creating a new domain-specific language (DSL) it is common to embed it as a part of a flexible host language, rather than creating it entirely from scratch. The semantics of an embedded DSL (EDSL) is either given directly as a set of…

Programming Languages · Computer Science 2016-12-06 Piotr Danilewski, Philipp Slusallek

Analyzing the Surprising Variability in Word Embedding Stability Across Languages

Word embeddings are powerful representations that form the foundation of many natural language processing architectures, both in English and in other languages. To gain further insight into word embeddings, we explore their stability (e.g.,…

Computation and Language · Computer Science 2021-09-13 Laura Burdick, Jonathan K. Kummerfeld, Rada Mihalcea

Universal computation is intrinsic to language model decoding

Language models now provide an interface to express and often solve general problems in natural language, yet their ultimate computational capabilities remain a major topic of scientific debate. Unlike a formal computer, a language model is…

Computation and Language · Computer Science 2026-02-11 Alex Lewandowski, Marlos C. Machado, Dale Schuurmans

SCELMo: Source Code Embeddings from Language Models

Continuous embeddings of tokens in computer programs have been used to support a variety of software development tools, including readability, code search, and program repair. Contextual embeddings are common in natural language processing…

Software Engineering · Computer Science 2020-04-29 Rafael-Michael Karampatsis, Charles Sutton

Lexical Manifold Reconfiguration in Large Language Models: A Novel Architectural Approach for Contextual Modulation

Contextual adaptation in token embeddings plays a central role in determining how well language models maintain coherence and retain semantic relationships over extended text sequences. Static embeddings often impose constraints on lexical…

Computation and Language · Computer Science 2025-03-27 Koinis Vassilis, Godfrey Milbourne, Harriet Featherstone, Xanthe Peverell, Yorick Bletchley, Zachary Montford

A Universal Kernel for Learning Regular Languages

We give a universal kernel that renders all the regular languages linearly separable. We are not able to compute this kernel efficiently and conjecture that it is intractable, but we do have an efficient $\epsilon$-approximation.

Machine Learning · Computer Science 2007-12-07 Leonid Kontorovich

Emulation-Completeness of Programming Languages

We study when a programming language can emulate programs written in that same language without delegating the guest program back to the host evaluator or compiler. We call this property emulation-completeness. The central observation is…

Programming Languages · Computer Science 2026-04-20 Gregory Morse, Tamás Kozsik

A Type and Scope Safe Universe of Syntaxes with Binding: Their Semantics and Proofs

Almost every programming language's syntax includes a notion of binder and corresponding bound occurrences, along with the accompanying notions of $\alpha$-equivalence, capture-avoiding substitution, typing contexts, runtime environments,…

Programming Languages · Computer Science 2021-10-13 Guillaume Allais, Robert Atkey, James Chapman, Conor McBride, James McKinna

Total Recall, Language Processing, and Software Engineering

A broad class of software engineering problems can be generalized as the "total recall problem". This short paper claims that identifying and exploring total recall language processing problems in software engineering is an important task…

Software Engineering · Computer Science 2018-11-13 Zhe Yu, Tim Menzies

Test-Time Safety Alignment

Recent work has shown that a model's input word embeddings can serve as effective control variables for steering its behavior toward outputs that satisfy desired properties. However, this has only been demonstrated for pretrained…

Computation and Language · Computer Science 2026-04-30 Baturay Saglam, Dionysis Kalogerias

Language Model Memory and Memory Models for Language

The ability of machine learning models to store input information in hidden layer vector embeddings, analogous to the concept of "memory", is widely employed but not well characterized. We find that language model embeddings typically…

Computation and Language · Computer Science 2026-05-20 Benjamin L. Badger

Learning Meta-Embeddings by Using Ensembles of Embedding Sets

Word embeddings -- distributed representations of words -- in deep learning are beneficial for many tasks in natural language processing (NLP). However, different embedding sets vary greatly in quality and characteristics of the captured…

Computation and Language · Computer Science 2015-12-31 Wenpeng Yin, Hinrich Schütze

Embedding Grammars

Classic grammars and regular expressions can be used for a variety of purposes, including parsing, intent detection, and matching. However, the comparisons are performed at a structural level, with constituent elements (words or characters)…

Computation and Language · Computer Science 2018-08-16 David Wingate, William Myers, Nancy Fulda, Tyler Etchart

Text-to-Code Generation with Modality-relative Pre-training

Large pre-trained language models have recently been expanded and applied to programming language tasks with great success, often through further pre-training of a strictly-natural language model--where training sequences typically contain…

Computation and Language · Computer Science 2024-02-13 Fenia Christopoulou, Guchun Zhang, Gerasimos Lampouras

Shared Global and Local Geometry of Language Model Embeddings

Researchers have recently suggested that models share common representations. In our work, we find numerous geometric similarities across the token embeddings of large language models. First, we find "global" similarities: token…

Computation and Language · Computer Science 2025-07-16 Andrew Lee, Melanie Weber, Fernanda Viégas, Martin Wattenberg

Linearly Controlled Language Generation with Performative Guarantees

The increasing prevalence of Large Language Models (LMs) in critical applications highlights the need for controlled language generation strategies that are not only computationally efficient but that also enjoy performance guarantees. To…

Computation and Language · Computer Science 2026-03-16 Emily Cheng, Carmen Amo Alonso

A Programming Language for Feasible Solutions

Runtime efficiency and termination are crucial properties in the studies of program verification. Instead of dealing with these issues in an ad hoc manner, it would be useful to develop a robust framework in which such properties are…

Programming Languages · Computer Science 2026-04-06 Weijun Chen, Yuxi Fu, Huan Long

Embedded Pattern Matching

Haskell is a popular choice for hosting deeply embedded languages. A recurring challenge for these embeddings is how to seamlessly integrate user defined algebraic data types. In particular, one important, convenient, and expressive feature…

Programming Languages · Computer Science 2022-08-01 Trevor L. McDonell, Joshua D. Meredith, Gabriele Keller

Structural Embedding Projection for Contextual Large Language Model Inference

Structured embedding transformations offer a promising approach for enhancing the efficiency and coherence of language model inference. The introduction of Structural Embedding Projection (SEP) provides a mechanism for refining token…

Computation and Language · Computer Science 2025-08-11 Vincent Enoasmo, Cedric Featherstonehaugh, Xavier Konstantinopoulos, Zacharias Huntington