Related papers: Transformer-based Program Synthesis for Low-Data E…

Synthetic Data Generation in Low-Resource Settings via Fine-Tuning of Large Language Models

The in-context learning ability of large language models (LLMs) enables them to generalize to novel downstream tasks with relatively few labeled examples. However, they require enormous computational resources to be deployed. Alternatively,…

Computation and Language · Computer Science 2024-01-09 Jean Kaddour, Qi Liu

Generative Modeling of Networked Time-Series via Transformer Architectures

Many security and network applications require having large datasets to train the machine learning models. Limited data access is a well-known problem in the security domain. Recent studies have shown the potential of Transformer models to…

Machine Learning · Computer Science 2025-06-10 Yusuf Elnady

Data Generation for Neural Programming by Example

Programming by example is the problem of synthesizing a program from a small set of input / output pairs. Recent works applying machine learning methods to this task show promise, but are typically reliant on generating synthetic examples…

Machine Learning · Computer Science 2019-11-11 Judith Clymo, Haik Manukian, Nathanaël Fijalkow, Adrià Gascón, Brooks Paige

Modular Techniques for Synthetic Long-Context Data Generation in Language Model Training and Evaluation

The ability of large language models (LLMs) to process and reason over long textual inputs is critical for a wide range of real-world applications. However, progress in this area is significantly constrained by the absence of high-quality,…

Computation and Language · Computer Science 2025-09-05 Seganrasan Subramanian, Abhigya Verma

Beyond the Training Distribution: Mapping Generalization Boundaries in Neural Program Synthesis

Large-scale transformers achieve impressive results on program synthesis benchmarks, yet their true generalization capabilities remain obscured by data contamination and opaque training corpora. To rigorously assess whether models are truly…

Machine Learning · Computer Science 2026-05-01 Henrik Voigt, Michael Habeck, Joachim Giesen

Meta-Sim: Learning to Generate Synthetic Datasets

Training models to high-end performance requires availability of large labeled datasets, which are expensive to get. The goal of our work is to automatically synthesize labeled datasets that are relevant for a downstream task. We propose…

Computer Vision and Pattern Recognition · Computer Science 2019-04-29 Amlan Kar, Aayush Prakash, Ming-Yu Liu, Eric Cameracci, Justin Yuan, Matt Rusiniak, David Acuna, Antonio Torralba, Sanja Fidler

Synthetic Data Generation Using Large Language Models: Advances in Text and Code

This survey reviews how large language models (LLMs) are transforming synthetic training data generation in both natural language and code domains. By producing artificial but task-relevant examples, these models can significantly augment…

Computation and Language · Computer Science 2025-11-21 Mihai Nadas, Laura Diosan, Andreea Tomescu

Towards an Automatic Optimisation Model Generator Assisted with Generative Pre-trained Transformer

This article presents a framework for generating optimisation models using a pre-trained generative transformer. The framework involves specifying the features that the optimisation model should have and using a language model to generate…

Neural and Evolutionary Computing · Computer Science 2023-05-11 Boris Almonacid

Recent Developments in Program Synthesis with Evolutionary Algorithms

The automatic generation of computer programs is one of the main applications with practical relevance in the field of evolutionary computation. With program synthesis techniques not only software developers could be supported in their…

Neural and Evolutionary Computing · Computer Science 2021-08-30 Dominik Sobania, Dirk Schweim, Franz Rothlauf

Enhancing Robot Program Synthesis Through Environmental Context

Program synthesis aims to automatically generate an executable program that conforms to the given specification. Recent advancements have demonstrated that deep neural methodologies and large-scale pretrained language models are highly…

Robotics · Computer Science 2023-12-14 Tianyi Chen, Qidi Wang, Zhen Dong, Liwei Shen, Xin Peng

Hierarchical Neural Program Synthesis

Program synthesis aims to automatically construct human-readable programs that satisfy given task specifications, such as input/output pairs or demonstrations. Recent works have demonstrated encouraging results in a variety of domains, such…

Software Engineering · Computer Science 2023-03-13 Linghan Zhong, Ryan Lindeborg, Jesse Zhang, Joseph J. Lim, Shao-Hua Sun

Large Language Models are Few-Shot Training Example Generators: A Case Study in Fallacy Recognition

Recognizing fallacies is crucial for ensuring the quality and validity of arguments across various domains. However, computational fallacy recognition faces challenges due to the diverse genres, domains, and types of fallacies found in…

Computation and Language · Computer Science 2024-08-16 Tariq Alhindi, Smaranda Muresan, Preslav Nakov

Program Synthesis via Test-Time Transduction

We introduce transductive program synthesis, a new formulation of the program synthesis task that explicitly leverages test inputs during synthesis. While prior approaches to program synthesis--whether based on natural language descriptions…

Artificial Intelligence · Computer Science 2025-10-22 Kang-il Lee, Jahyun Koo, Seunghyun Yoon, Minbeom Kim, Hyukhun Koh, Dongryeol Lee, Kyomin Jung

Taming Transformers for High-Resolution Image Synthesis

Designed to learn long-range interactions on sequential data, transformers continue to show state-of-the-art results on a wide variety of tasks. In contrast to CNNs, they contain no inductive bias that prioritizes local interactions. This…

Computer Vision and Pattern Recognition · Computer Science 2021-06-24 Patrick Esser, Robin Rombach, Björn Ommer

Better as Generators Than Classifiers: Leveraging LLMs and Synthetic Data for Low-Resource Multilingual Classification

Large Language Models (LLMs) have demonstrated remarkable multilingual capabilities, making them promising tools in both high- and low-resource languages. One particularly valuable use case is generating synthetic samples that can be used…

Computation and Language · Computer Science 2026-01-26 Branislav Pecher, Jan Cegin, Robert Belanec, Ivan Srba, Jakub Simko, Maria Bielikova

On the Reliability and Explainability of Language Models for Program Generation

Recent studies have adopted pre-trained language models, such as CodeT5 and CodeGPT, for automated program generation tasks like code generation, repair, and translation. Numerous language model-based approaches have been proposed and…

Software Engineering · Computer Science 2024-01-09 Yue Liu, Chakkrit Tantithamthavorn, Yonghui Liu, Li Li

Learning to Pose Problems: Reasoning-Driven and Solver-Adaptive Data Synthesis

Data synthesis for training large reasoning models offers a scalable alternative to limited, human-curated datasets, enabling the creation of high-quality data. However, existing approaches face several challenges: (i) indiscriminate…

Artificial Intelligence · Computer Science 2026-05-11 Yongxian Wei, Yilin Zhao, Zixuan Hu, Li Shen, Xinrui Chen, Runxi Cheng, Sinan Du, Hao Yu, Chun Yuan, Dian Li

Instruction Data Generation and Unsupervised Adaptation for Speech Language Models

In this paper, we propose three methods for generating synthetic samples to train and evaluate multimodal large language models capable of processing both text and speech inputs. Addressing the scarcity of samples containing both…

Audio and Speech Processing · Electrical Eng. & Systems 2024-06-21 Vahid Noroozi, Zhehuai Chen, Somshubra Majumdar, Steve Huang, Jagadeesh Balam, Boris Ginsburg

LLM4TDD: Best Practices for Test Driven Development Using Large Language Models

In today's society, we are becoming increasingly dependent on software systems. However, we also constantly witness the negative impacts of buggy software. Program synthesis aims to improve software correctness by automatically generating…

Software Engineering · Computer Science 2023-12-11 Sanyogita Piya, Allison Sullivan

SynthForge: Synthesizing High-Quality Face Dataset with Controllable 3D Generative Models

Recent advancements in generative models have unlocked the capabilities to render photo-realistic data in a controllable fashion. Trained on the real data, these generative models are capable of producing realistic samples with minimal to…

Computer Vision and Pattern Recognition · Computer Science 2024-06-13 Abhay Rawat, Shubham Dokania, Astitva Srivastava, Shuaib Ahmed, Haiwen Feng, Rahul Tallamraju