Related papers: Inputs from Hell: Generating Uncommon Inputs from …

Active Learning of Input Grammars

Knowing the precise format of a program's input is a necessary prerequisite for systematic testing. Given a program and a small set of sample inputs, we (1) track the data flow of inputs to aggregate input fragments that share the same data…

Programming Languages · Computer Science 2017-08-30 Matthias Höschele, Alexander Kampmann, Andreas Zeller

Sample-Free Learning of Input Grammars for Comprehensive Software Fuzzing

Generating valid test inputs for a program is much easier if one knows the input language. We present first successes for a technique that, given a program P without any input samples or models, learns an input grammar that represents the…

Software Engineering · Computer Science 2018-10-22 Rahul Gopinath, Björn Mathis, Mathias Höschele, Alexander Kampmann, Andreas Zeller

Inferring Input Grammars from Dynamic Control Flow

A program is characterized by its input model, and a formal input model can be of use in diverse areas including vulnerability analysis, reverse engineering, fuzzing and software testing, clone detection and refactoring. Unfortunately,…

Software Engineering · Computer Science 2019-12-13 Rahul Gopinath, Björn Mathis, Andreas Zeller

Inferring Input Grammars from Code with Symbolic Parsing

Generating effective test inputs for a software system requires that these inputs be valid, as they will otherwise be rejected without reaching actual functionality. In the absence of a specification for the input language, common test…

Software Engineering · Computer Science 2025-03-12 Leon Bettscheider, Andreas Zeller

Bayesian Inference of Regular Expressions from Human-Generated Example Strings

In programming by example, users "write" programs by generating a small number of input-output examples and asking the computer to synthesize consistent programs. We consider a challenging problem in this domain: learning regular…

Artificial Intelligence · Computer Science 2018-09-28 Long Ouyang

Random Sentences from a Generalized Phrase-Structure Grammar Interpreter

In numerous domains in cognitive science it is often useful to have a source for randomly generated corpora. These corpora may serve as a foundation for artificial stimuli in a learning experiment (e.g., Ellefson & Christiansen, 2000), or…

Computation and Language · Computer Science 2007-05-23 Rick Dale

Random Grammar-based Testing for Covering All Non-Terminals

In the context of software testing, generating complex data inputs is frequently performed using a grammar-based specification. For combinatorial reasons, an exhaustive generation of the data -- of a given size -- is practically impossible,…

Software Engineering · Computer Science 2013-11-27 Alois Dreyfus, Pierre-Cyrille Heam, Olga Kouchnarenko

Textual Features for Programming by Example

In Programming by Example, a system attempts to infer a program from input and output examples, generally by searching for a composition of certain base functions. Performing a naive brute force search is infeasible for even mildly involved…

Artificial Intelligence · Computer Science 2012-09-19 Aditya Krishna Menon, Omer Tamuz, Sumit Gulwani, Butler Lampson, Adam Tauman Kalai

Input-Gen: Guided Generation of Stateful Inputs for Testing, Tuning, and Training

The size and complexity of software applications is increasing at an accelerating pace. Source code repositories (along with their dependencies) require vast amounts of labor to keep them tested, maintained, and up to date. As the…

Software Engineering · Computer Science 2024-06-14 Ivan R. Ivanov, Joachim Meyer, Aiden Grossman, William S. Moses, Johannes Doerfert

Random generation of closed simply-typed $\lambda$-terms: a synergy between logic programming and Boltzmann samplers

A natural approach to software quality assurance consists in writing unit tests securing programmer-declared code invariants. Throughout the literature a great body of work has been devoted to tools and techniques automating this…

Logic in Computer Science · Computer Science 2017-09-14 Maciej Bendkowski, Katarzyna Grygiel, Paul Tarau

Generating Inputs for Grammar Mining using Dynamic Symbolic Execution

A vast number of software systems include components that parse and process structured input. In addition to programming languages, which are analyzed by compilers or interpreters, there are numerous components that process standardized or…

Programming Languages · Computer Science 2025-08-07 Andreas Pointner, Josef Pichler, Herbert Prähofer

Using Large Language Models for Student-Code Guided Test Case Generation in Computer Science Education

In computer science education, test cases are an integral part of programming assignments since they can be used as assessment items to test students' programming knowledge and provide personalized feedback on student-written code. The goal…

Computation and Language · Computer Science 2024-02-13 Nischal Ashok Kumar, Andrew Lan

From Words to Code: Harnessing Data for Program Synthesis from Natural Language

Creating programs to correctly manipulate data is a difficult task, as the underlying programming languages and APIs can be challenging to learn for many users who are not skilled programmers. Large language models (LLMs) demonstrate…

Databases · Computer Science 2023-05-04 Anirudh Khatry, Joyce Cahoon, Jordan Henkel, Shaleen Deep, Venkatesh Emani, Avrilia Floratou, Sumit Gulwani, Vu Le, Mohammad Raza, Sherry Shi, Mukul Singh, Ashish Tiwari

Inferring Attributed Grammars from Parser Implementations

Software systems that process structured inputs often lack complete and up-to-date specifications, which specify the input syntax and the semantics of input processing. While grammar mining techniques have focused on recovering syntactic…

Software Engineering · Computer Science 2025-07-18 Andreas Pointner, Josef Pichler, Herbert Prähofer

Neural Sketch Learning for Conditional Program Generation

We study the problem of generating source code in a strongly typed, Java-like programming language, given a label (for example a set of API calls or types) carrying a small amount of information about the code that is desired. The generated…

Programming Languages · Computer Science 2018-04-16 Vijayaraghavan Murali, Letao Qi, Swarat Chaudhuri, Chris Jermaine

Wronging a Right: Generating Better Errors to Improve Grammatical Error Detection

Grammatical error correction, like other machine learning tasks, greatly benefits from large quantities of high quality training data, which is typically expensive to produce. While writing a program to automatically generate realistic…

Computation and Language · Computer Science 2018-10-02 Sudhanshu Kasewa, Pontus Stenetorp, Sebastian Riedel

Locally Typical Sampling

Today's probabilistic language generators fall short when it comes to producing coherent and fluent text despite the fact that the underlying models perform well under standard metrics, e.g., perplexity. This discrepancy has puzzled the…

Computation and Language · Computer Science 2025-06-06 Clara Meister, Tiago Pimentel, Gian Wiher, Ryan Cotterell

Model-based generation of natural language specifications

Application of formal models provides many benefits for the software and system development, however, the learning curve of formal languages could be a critical factor for an industrial project. Thus, a natural language specification that…

Software Engineering · Computer Science 2016-12-07 Phan Vo Thu Nhat, Maria Spichkova

Toward Trustworthy Neural Program Synthesis

We develop an approach to estimate the probability that a program sampled from a large language model is correct. Given a natural language description of a programming problem, our method samples both candidate programs as well as candidate…

Software Engineering · Computer Science 2023-10-11 Darren Key, Wen-Ding Li, Kevin Ellis

DiffSampling: Enhancing Diversity and Accuracy in Neural Text Generation

Despite their growing capabilities, language models still frequently reproduce content from their training data, generate repetitive text, and favor common grammatical patterns and vocabulary. A possible cause is the decoding strategy: the…

Computation and Language · Computer Science 2026-01-15 Giorgio Franceschelli, Mirco Musolesi