Related papers: Learning Highly Recursive Input Grammars

Fast Deterministic Black-box Context-free Grammar Inference

Black-box context-free grammar inference is a hard problem as in many practical settings it only has access to a limited number of example programs. The state-of-the-art approach Arvada heuristically generalizes grammar rules starting from…

Software Engineering · Computer Science · 2024-01-18 · Mohammad Rifat Arefin, Suraj Shetiya, Zili Wang, Christoph Csallner

Incremental Context-free Grammar Inference in Black Box Settings

Black-box context-free grammar inference presents a significant challenge in many practical settings due to limited access to example programs. The state-of-the-art methods, Arvada and TreeVada, employ heuristic approaches to generalize…

Programming Languages · Computer Science · 2024-09-23 · Feifei Li, Xiao Chen, Xi Xiao, Xiaoyu Sun, Chuan Chen, Shaohua Wang, Jitao Han

Context-Free Grammar Inference for Complex Programming Languages in Black Box Settings

Grammar inference for complex programming languages remains a significant challenge, as existing approaches fail to scale to real world datasets within practical time constraints. In our experiments, none of the state-of-the-art tools,…

Programming Languages · Computer Science · 2026-01-21 · Feifei Li, Xiao Chen, Xiaoyu Sun, Xi Xiao, Shaohua Wang, Yong Ding, Sheng Wen, Qing Li

Retrieval is Accurate Generation

Standard language models generate text by selecting tokens from a fixed, finite, and standalone vocabulary. We introduce a novel method that selects context-aware phrases from a collection of supporting documents. One of the most…

Computation and Language · Computer Science · 2024-03-19 · Bowen Cao, Deng Cai, Leyang Cui, Xuxin Cheng, Wei Bi, Yuexian Zou, Shuming Shi

Learning to Embed Sentences Using Attentive Recursive Trees

Sentence embedding is an effective feature representation for most deep learning-based NLP tasks. One prevailing line of methods is using recursive latent tree-structured networks to embed sentences with task-specific structures. However,…

Computation and Language · Computer Science · 2018-11-16 · Jiaxin Shi, Lei Hou, Juanzi Li, Zhiyuan Liu, Hanwang Zhang

Parsing Reflective Grammars

Existing technology can parse arbitrary context-free grammars, but only a single, static grammar per input. In order to support more powerful syntax-extension systems, we propose reflective grammars, which can modify their own syntax during…

Programming Languages · Computer Science · 2011-02-14 · Paul Stansifer, Mitchell Wand

Learning grammar with a divide-and-concur neural network

We implement a divide-and-concur iterative projection approach to context-free grammar inference. Unlike most state-of-the-art models of natural language processing, our method requires a relatively small number of discrete parameters,…

Computation and Language · Computer Science · 2022-09-19 · Sean Deyo, Veit Elser

Black-box Context-free Grammar Inference for Readable & Natural Grammars

Black-box context-free grammar inference is crucial for program analysis, reverse engineering, and security, yet existing tools such as Arvada, TreeVada, and Kedavra struggle with scalability, readability, and accuracy on large, complex…

Software Engineering · Computer Science · 2025-11-10 · Mohammad Rifat Arefin, Shanto Rahman, Christoph Csallner

Making Retrieval-Augmented Language Models Robust to Irrelevant Context

Retrieval-augmented language models (RALMs) hold promise to produce language understanding systems that are factual, efficient, and up-to-date. An important desideratum of RALMs is that retrieved information helps model performance…

Computation and Language · Computer Science · 2024-05-07 · Ori Yoran, Tomer Wolfson, Ori Ram, Jonathan Berant

The Neural State Pushdown Automata

In order to learn complex grammars, recurrent neural networks (RNNs) require sufficient computational resources to ensure correct grammar recognition. A widely-used approach to expand model capacity would be to couple an RNN to an external…

Neural and Evolutionary Computing · Computer Science · 2019-09-23 · Ankur Mali, Alexander Ororbia, C. Lee Giles

The Neural Network Pushdown Automaton: Model, Stack and Learning Simulations

In order for neural networks to learn complex languages or grammars, they must have sufficient computational power or resources to recognize or generate such languages. Though many approaches have been discussed, one obvious approach to…

Artificial Intelligence · Computer Science · 2017-11-17 · G. Z. Sun, C. L. Giles, H. H. Chen, Y. C. Lee

Learning to Compose and Reason with Language Tree Structures for Visual Grounding

Grounding natural language in images, such as localizing "the black dog on the left of the tree", is one of the core problems in artificial intelligence, as it needs to comprehend the fine-grained and compositional language space. However,…

Computer Vision and Pattern Recognition · Computer Science · 2019-06-06 · Richang Hong, Daqing Liu, Xiaoyu Mo, Xiangnan He, Hanwang Zhang

Compositional Instruction Following with Language Models and Reinforcement Learning

Combining reinforcement learning with language grounding is challenging as the agent needs to explore the environment while simultaneously learning multiple language-conditioned tasks. To address this, we introduce a novel method: the…

Machine Learning · Computer Science · 2025-01-23 · Vanya Cohen, Geraud Nangue Tasse, Nakul Gopalan, Steven James, Matthew Gombolay, Ray Mooney, Benjamin Rosman

Recursive Tree Grammar Autoencoders

Machine learning on trees has been mostly focused on trees as input to algorithms. Much less research has investigated trees as output, which has many applications, such as molecule optimization for drug discovery, or hint generation for…

Machine Learning · Computer Science · 2022-02-11 · Benjamin Paassen, Irena Koprinska, Kalina Yacef

Active Example Selection for In-Context Learning

With a handful of demonstration examples, large-scale language models show strong capability to perform various tasks by in-context learning from these examples, without any fine-tuning. We demonstrate that in-context learning performance…

Computation and Language · Computer Science · 2022-11-10 · Yiming Zhang, Shi Feng, Chenhao Tan

Grammar Variational Autoencoder

Deep generative models have been wildly successful at learning coherent latent representations for continuous data such as video and audio. However, generative modeling of discrete data such as arithmetic expressions and molecular…

Machine Learning · Statistics · 2017-03-07 · Matt J. Kusner, Brooks Paige, José Miguel Hernández-Lobato

Autolearn: Learn by Surprise, Commit by Proof

We propose Autolearn, a framework that enables language models to learn from documents they read, with no external supervision. Passages that produce anomalously high per-token loss are flagged, verified through a self-generated Q&A chain,…

Machine Learning · Computer Science · 2026-05-08 · Kang-Sin Choi

Recursive Top-Down Production for Sentence Generation with Latent Trees

We model the recursive production property of context-free grammars for natural and synthetic languages. To this end, we present a dynamic programming algorithm that marginalises over latent binary tree structures with $N$ leaves, allowing…

Computation and Language · Computer Science · 2020-10-12 · Shawn Tan, Yikang Shen, Timothy J. O'Donnell, Alessandro Sordoni, Aaron Courville

Neural Language Modeling by Jointly Learning Syntax and Lexicon

We propose a neural language model capable of unsupervised syntactic structure induction. The model leverages the structure information to form better semantic representations and better language modeling. Standard recurrent neural networks…

Computation and Language · Computer Science · 2018-02-20 · Yikang Shen, Zhouhan Lin, Chin-Wei Huang, Aaron Courville

Rule Augmented Unsupervised Constituency Parsing

Recently, unsupervised parsing of syntactic trees has gained considerable attention. A prototypical approach to such unsupervised parsing employs reinforcement learning and auto-encoders. However, no mechanism ensures that the learnt model…

Computation and Language · Computer Science · 2021-05-24 · Atul Sahay, Anshul Nasery, Ayush Maheshwari, Ganesh Ramakrishnan, Rishabh Iyer