Related papers: Synthesizing Program Input Grammars

Sample-Free Learning of Input Grammars for Comprehensive Software Fuzzing

Generating valid test inputs for a program is much easier if one knows the input language. We present first successes for a technique that, given a program P without any input samples or models, learns an input grammar that represents the…

Software Engineering · Computer Science · 2018-10-22 · Rahul Gopinath, Björn Mathis, Mathias Höschele, Alexander Kampmann, Andreas Zeller

Inferring Input Grammars from Dynamic Control Flow

A program is characterized by its input model, and a formal input model can be of use in diverse areas including vulnerability analysis, reverse engineering, fuzzing and software testing, clone detection and refactoring. Unfortunately,…

Software Engineering · Computer Science · 2019-12-13 · Rahul Gopinath, Björn Mathis, Andreas Zeller

Low-Cost and Comprehensive Non-textual Input Fuzzing with LLM-Synthesized Input Generators

Modern software often accepts inputs with highly complex grammars. Recent advances in large language models (LLMs) have shown that they can be used to synthesize high-quality natural language text and code that conforms to the grammar of a…

Software Engineering · Computer Science · 2025-02-03 · Kunpeng Zhang, Zongjie Li, Daoyuan Wu, Shuai Wang, Xin Xia

Generating Inputs for Grammar Mining using Dynamic Symbolic Execution

A vast number of software systems include components that parse and process structured input. In addition to programming languages, which are analyzed by compilers or interpreters, there are numerous components that process standardized or…

Programming Languages · Computer Science · 2025-08-07 · Andreas Pointner, Josef Pichler, Herbert Prähofer

Program Synthesis via Test-Time Transduction

We introduce transductive program synthesis, a new formulation of the program synthesis task that explicitly leverages test inputs during synthesis. While prior approaches to program synthesis--whether based on natural language descriptions…

Artificial Intelligence · Computer Science · 2025-10-22 · Kang-il Lee, Jahyun Koo, Seunghyun Yoon, Minbeom Kim, Hyukhun Koh, Dongryeol Lee, Kyomin Jung

Building Fast Fuzzers

Fuzzing is one of the key techniques for evaluating the robustness of programs against attacks. Fuzzing has to be effective in producing inputs that cover functionality and find vulnerabilities. But it also has to be efficient in producing…

Software Engineering · Computer Science · 2019-11-19 · Rahul Gopinath, Andreas Zeller

Learn&Fuzz: Machine Learning for Input Fuzzing

Fuzzing consists of repeatedly testing an application with modified, or fuzzed, inputs with the goal of finding security vulnerabilities in input-parsing code. In this paper, we show how to automate the generation of an input grammar…

Artificial Intelligence · Computer Science · 2017-01-26 · Patrice Godefroid, Hila Peleg, Rishabh Singh

Generative Explanations for Program Synthesizers

Despite great advances in program synthesis techniques, they remain algorithmic black boxes. Although they guarantee that when synthesis is successful, the implementation satisfies the specification, they provide no additional information…

Programming Languages · Computer Science · 2024-03-07 · Amirmohammad Nazari, Souti Chattopadhyay, Swabha Swayamdipta, Mukund Raghothaman

An Automated Testing and Debugging Toolkit for Gate-Level Logic Synthesis Applications

Correctness and robustness are essential for logic synthesis applications, but they are often only tested with a limited set of benchmarks. Moreover, when the application fails on a large benchmark, the debugging process may be tedious and…

Software Engineering · Computer Science · 2022-07-28 · Siang-Yun Lee, Heinz Riener, Giovanni De Micheli

Evolutionary Grammar-Based Fuzzing

A fuzzer provides randomly generated inputs to a targeted software to expose erroneous behavior. To efficiently detect defects, generated inputs should conform to the structure of the input format and thus, grammars can be used to generate…

Software Engineering · Computer Science · 2020-08-05 · Martin Eberlein, Yannic Noller, Thomas Vogel, Lars Grunske

FuzzerGym: A Competitive Framework for Fuzzing and Learning

Fuzzing is a commonly used technique designed to test software by automatically crafting program inputs. Currently, the most successful fuzzing algorithms emphasize simple, low-overhead strategies with the ability to efficiently monitor…

Software Engineering · Computer Science · 2018-07-20 · William Drozd, Michael D. Wagner

Detecting and Explaining (In-)equivalence of Context-Free Grammars

We propose a scalable framework for deciding, proving, and explaining (in-)equivalence of context-free grammars. We present an implementation of the framework and evaluate it on large data sets collected within educational support systems.…

Formal Languages and Automata Theory · Computer Science · 2026-04-09 · Marko Schmellenkamp, Thomas Zeume, Sven Argo, Sandra Kiefer, Cedric Siems, Fynn Stebel

Glass-Box Program Synthesis: A Machine Learning Approach

Recently proposed models which learn to write computer programs from data use either input/output examples or rich execution traces. Instead, we argue that a novel alternative is to use a glass-box loss function, given as a program itself…

Machine Learning · Computer Science · 2017-09-27 · Konstantina Christakopoulou, Adam Tauman Kalai

SynCode: LLM Generation with Grammar Augmentation

LLMs are widely used in complex AI applications. These applications underscore the need for LLM outputs to adhere to a specific format, for their integration with other components in the systems. Typically the format rules e.g., for data…

Machine Learning · Computer Science · 2024-11-07 · Shubham Ugare, Tarun Suresh, Hangoo Kang, Sasa Misailovic, Gagandeep Singh

Active Learning of Input Grammars

Knowing the precise format of a program's input is a necessary prerequisite for systematic testing. Given a program and a small set of sample inputs, we (1) track the data flow of inputs to aggregate input fragments that share the same data…

Programming Languages · Computer Science · 2017-08-30 · Matthias Höschele, Alexander Kampmann, Andreas Zeller

SAND: Decoupling Sanitization from Fuzzing for Low Overhead

Sanitizers provide robust test oracles for various software vulnerabilities. Fuzzing on sanitizer-enabled programs has been the best practice to find software bugs. Since sanitizers need to heavily instrument a target program to insert…

Cryptography and Security · Computer Science · 2025-02-13 · Ziqiao Kong, Shaohua Li, Heqing Huang, Zhendong Su

Understanding Programs by Exploiting (Fuzzing) Test Cases

Semantic understanding of programs has attracted great attention in the community. Inspired by recent successes of large language models (LLMs) in natural language understanding, tremendous progress has been made by treating programming…

Machine Learning · Computer Science · 2023-06-13 · Jianyu Zhao, Yuyang Rong, Yiwen Guo, Yifeng He, Hao Chen

GASE: Generatively Augmented Sentence Encoding

We propose a training-free approach to improve sentence embeddings leveraging test-time compute by applying generative text models for data augmentation at inference time. Unlike conventional data augmentation that utilises synthetic…

Computation and Language · Computer Science · 2025-09-09 · Manuel Frank, Haithem Afli

Embedding Grammars

Classic grammars and regular expressions can be used for a variety of purposes, including parsing, intent detection, and matching. However, the comparisons are performed at a structural level, with constituent elements (words or characters)…

Computation and Language · Computer Science · 2018-08-16 · David Wingate, William Myers, Nancy Fulda, Tyler Etchart

Iterative method of generating artificial context-free grammars

Grammatical inference is a machine learning area, whose fundamentals are built around learning sets. At present, real-life data and examples from manually crafted grammars are used to test their learning performance. This paper aims to…

Formal Languages and Automata Theory · Computer Science · 2019-11-15 · Olgierd Unold, Agnieszka Kaczmarek, Łukasz Culer