Related papers: Learning Highly Recursive Input Grammars
Black-box context-free grammar inference is a hard problem as in many practical settings it only has access to a limited number of example programs. The state-of-the-art approach Arvada heuristically generalizes grammar rules starting from…
Black-box context-free grammar inference presents a significant challenge in many practical settings due to limited access to example programs. The state-of-the-art methods, Arvada and Treevada, employ heuristic approaches to generalize…
Grammar inference for complex programming languages remains a significant challenge, as existing approaches fail to scale to real world datasets within practical time constraints. In our experiments, none of the state-of-the-art tools,…
Standard language models generate text by selecting tokens from a fixed, finite, and standalone vocabulary. We introduce a novel method that selects context-aware phrases from a collection of supporting documents. One of the most…
Sentence embedding is an effective feature representation for most deep learning-based NLP tasks. One prevailing line of methods is using recursive latent tree-structured networks to embed sentences with task-specific structures. However,…
Existing technology can parse arbitrary context-free grammars, but only a single, static grammar per input. In order to support more powerful syntax-extension systems, we propose reflective grammars, which can modify their own syntax during…
We implement a divide-and-concur iterative projection approach to context-free grammar inference. Unlike most state-of-the-art models of natural language processing, our method requires a relatively small number of discrete parameters,…
Black-box context-free grammar inference is crucial for program analysis, reverse engineering, and security, yet existing tools such as Arvada, TreeVada, and Kedavra struggle with scalability, readability, and accuracy on large, complex…
Retrieval-augmented language models (RALMs) hold promise to produce language understanding systems that are are factual, efficient, and up-to-date. An important desideratum of RALMs, is that retrieved information helps model performance…
In order to learn complex grammars, recurrent neural networks (RNNs) require sufficient computational resources to ensure correct grammar recognition. A widely-used approach to expand model capacity would be to couple an RNN to an external…
In order for neural networks to learn complex languages or grammars, they must have sufficient computational power or resources to recognize or generate such languages. Though many approaches have been discussed, one ob- vious approach to…
Grounding natural language in images, such as localizing "the black dog on the left of the tree", is one of the core problems in artificial intelligence, as it needs to comprehend the fine-grained and compositional language space. However,…
Combining reinforcement learning with language grounding is challenging as the agent needs to explore the environment while simultaneously learning multiple language-conditioned tasks. To address this, we introduce a novel method: the…
Machine learning on trees has been mostly focused on trees as input to algorithms. Much less research has investigated trees as output, which has many applications, such as molecule optimization for drug discovery, or hint generation for…
With a handful of demonstration examples, large-scale language models show strong capability to perform various tasks by in-context learning from these examples, without any fine-tuning. We demonstrate that in-context learning performance…
Deep generative models have been wildly successful at learning coherent latent representations for continuous data such as video and audio. However, generative modeling of discrete data such as arithmetic expressions and molecular…
We propose Autolearn, a framework that enables language models to learn from documents they read, with no external supervision. Passages that produce anomalously high per-token loss are flagged, verified through a self-generated Q&A chain,…
We model the recursive production property of context-free grammars for natural and synthetic languages. To this end, we present a dynamic programming algorithm that marginalises over latent binary tree structures with $N$ leaves, allowing…
We propose a neural language model capable of unsupervised syntactic structure induction. The model leverages the structure information to form better semantic representations and better language modeling. Standard recurrent neural networks…
Recently, unsupervised parsing of syntactic trees has gained considerable attention. A prototypical approach to such unsupervised parsing employs reinforcement learning and auto-encoders. However, no mechanism ensures that the learnt model…