Related papers: Inputs from Hell: Generating Uncommon Inputs from …
Knowing the precise format of a program's input is a necessary prerequisite for systematic testing. Given a program and a small set of sample inputs, we (1) track the data flow of inputs to aggregate input fragments that share the same data…
Generating valid test inputs for a program is much easier if one knows the input language. We present first successes for a technique that, given a program P without any input samples or models, learns an input grammar that represents the…
A program is characterized by its input model, and a formal input model can be of use in diverse areas including vulnerability analysis, reverse engineering, fuzzing and software testing, clone detection and refactoring. Unfortunately,…
Generating effective test inputs for a software system requires that these inputs be valid, as they will otherwise be rejected without reaching actual functionality. In the absence of a specification for the input language, common test…
In programming by example, users "write" programs by generating a small number of input-output examples and asking the computer to synthesize consistent programs. We consider a challenging problem in this domain: learning regular…
In numerous domains in cognitive science it is often useful to have a source for randomly generated corpora. These corpora may serve as a foundation for artificial stimuli in a learning experiment (e.g., Ellefson & Christiansen, 2000), or…
In the context of software testing, generating complex data inputs is frequently performed using a grammar-based specification. For combinatorial reasons, an exhaustive generation of the data -- of a given size -- is practically impossible,…
In Programming by Example, a system attempts to infer a program from input and output examples, generally by searching for a composition of certain base functions. Performing a naive brute force search is infeasible for even mildly involved…
The size and complexity of software applications is increasing at an accelerating pace. Source code repositories (along with their dependencies) require vast amounts of labor to keep them tested, maintained, and up to date. As the…
A natural approach to software quality assurance consists in writing unit tests securing programmer-declared code invariants. Throughout the literature a great body of work has been devoted to tools and techniques automating this…
A vast number of software systems include components that parse and process structured input. In addition to programming languages, which are analyzed by compilers or interpreters, there are numerous components that process standardized or…
In computer science education, test cases are an integral part of programming assignments since they can be used as assessment items to test students' programming knowledge and provide personalized feedback on student-written code. The goal…
Creating programs to correctly manipulate data is a difficult task, as the underlying programming languages and APIs can be challenging to learn for many users who are not skilled programmers. Large language models (LLMs) demonstrate…
Software systems that process structured inputs often lack complete and up-to-date specifications, which specify the input syntax and the semantics of input processing. While grammar mining techniques have focused on recovering syntactic…
We study the problem of generating source code in a strongly typed, Java-like programming language, given a label (for example a set of API calls or types) carrying a small amount of information about the code that is desired. The generated…
Grammatical error correction, like other machine learning tasks, greatly benefits from large quantities of high quality training data, which is typically expensive to produce. While writing a program to automatically generate realistic…
Today's probabilistic language generators fall short when it comes to producing coherent and fluent text despite the fact that the underlying models perform well under standard metrics, e.g., perplexity. This discrepancy has puzzled the…
Application of formal models provides many benefits for the software and system development, however, the learning curve of formal languages could be a critical factor for an industrial project. Thus, a natural language specification that…
We develop an approach to estimate the probability that a program sampled from a large language model is correct. Given a natural language description of a programming problem, our method samples both candidate programs as well as candidate…
Despite their growing capabilities, language models still frequently reproduce content from their training data, generate repetitive text, and favor common grammatical patterns and vocabulary. A possible cause is the decoding strategy: the…