Related papers: Automating Database-Native Function Code Synthesis…

SynthCoder: A Synthetical Strategy to Tune LLMs for Code Completion

Code completion is a prominent application of Large Language Models (LLMs) in software engineering. Due to the near real-time response requirements of this task, base models with small to medium-sized parameters are typically employed,…

Software Engineering · Computer Science 2025-09-18 Dongjun Yu, Xiao Yan, Zhenrui Li, Jipeng Xiao, Haochuan He, Yongda Yu, Hao Zhang, Guoping Rong, Xiaobo Huang

Type- and Content-Driven Synthesis of SQL Queries from Natural Language

This paper presents a new technique for automatically synthesizing SQL queries from natural language. Our technique is fully automated, works for any database without requiring additional customization, and does not require users to know…

Databases · Computer Science 2017-02-07 Navid Yaghmazadeh, Yuepeng Wang, Isil Dillig, Thomas Dillig

DeepCode: Open Agentic Coding

Recent advances in large language models (LLMs) have given rise to powerful coding agents, making it possible for code assistants to evolve into code engineers. However, existing methods still face significant challenges in achieving…

Software Engineering · Computer Science 2025-12-10 Zongwei Li, Zhonghang Li, Zirui Guo, Xubin Ren, Chao Huang

Bespoke OLAP: Synthesizing Workload-Specific One-size-fits-one Database Engines

Modern OLAP engines are designed to support arbitrary analytical workloads, but this generality incurs structural overhead, including runtime schema interpretation, indirection layers, and abstraction boundaries, even in highly optimized…

Databases · Computer Science 2026-03-03 Johannes Wehrstein, Timo Eckmann, Matthias Jasny, Carsten Binnig

MapCoder: Multi-Agent Code Generation for Competitive Problem Solving

Code synthesis, which requires a deep understanding of complex natural language problem descriptions, generation of code instructions for complex algorithms and data structures, and the successful execution of comprehensive unit tests,…

Computation and Language · Computer Science 2024-05-21 Md. Ashraful Islam, Mohammed Eunus Ali, Md Rizwan Parvez

CodeARC: Benchmarking Reasoning Capabilities of LLM Agents for Inductive Program Synthesis

Inductive program synthesis, or programming by example, requires synthesizing functions from input-output examples that generalize to unseen inputs. While large language model agents have shown promise in programming tasks guided by natural…

Programming Languages · Computer Science 2025-08-11 Anjiang Wei, Tarun Suresh, Jiannan Cao, Naveen Kannan, Yuheng Wu, Kai Yan, Thiago S. F. X. Teixeira, Ke Wang, Alex Aiken

Function-constrained Program Synthesis

This work introduces (1) a technique that allows large language models (LLMs) to leverage user-provided code when solving programming tasks and (2) a method to iteratively generate modular sub-functions that can aid future code generation…

Machine Learning · Computer Science 2023-12-05 Patrick Hajali, Ignas Budvytis

An LLM Compiler for Parallel Function Calling

The reasoning capabilities of the recent LLMs enable them to execute external function calls to overcome their inherent limitations, such as knowledge cutoffs, poor arithmetic skills, or lack of access to private data. This development has…

Computation and Language · Computer Science 2024-06-06 Sehoon Kim, Suhong Moon, Ryan Tabrizi, Nicholas Lee, Michael W. Mahoney, Kurt Keutzer, Amir Gholami

LLM-Guided Compositional Program Synthesis

Program synthesis from input-output examples, also called programming by example (PBE), has had tremendous impact on automating end-user tasks. Large language models (LLMs) have the ability to solve PBE tasks by generating code in different…

Programming Languages · Computer Science 2025-03-21 Ruhma Khan, Sumit Gulwani, Vu Le, Arjun Radhakrishna, Ashish Tiwari, Gust Verbruggen

Increasing LLM Coding Capabilities through Diverse Synthetic Coding Tasks

Large language models (LLMs) have shown impressive promise in code generation, yet their progress remains limited by the shortage of large-scale datasets that are both diverse and well-aligned with human reasoning. Most existing resources…

Machine Learning · Computer Science 2025-10-28 Amal Abed, Ivan Lukic, Jörg K. H. Franke, Frank Hutter

Synthesizing Database Programs for Schema Refactoring

Many programs that interact with a database need to undergo schema refactoring several times during their life cycle. Since this process typically requires making significant changes to the program's implementation, schema refactoring is…

Programming Languages · Computer Science 2019-04-12 Yuepeng Wang, James Dong, Rushi Shah, Isil Dillig

Code Evolution for Control: Synthesizing Policies via LLM-Driven Evolutionary Search

Designing effective control policies for autonomous systems remains a fundamental challenge, traditionally addressed through reinforcement learning or manual engineering. While reinforcement learning has achieved remarkable success, it…

Artificial Intelligence · Computer Science 2026-01-13 Ping Guo, Chao Li, Yinglan Feng, Chaoning Zhang

Functional Consistency of LLM Code Embeddings: A Self-Evolving Data Synthesis Framework for Benchmarking

Embedding models have demonstrated strong performance in tasks like clustering, retrieval, and feature extraction while offering computational advantages over generative models and cross-encoders. Benchmarks such as MTEB have shown that…

Software Engineering · Computer Science 2025-08-28 Zhuohao Li, Wenqing Chen, Jianxing Yu, Zhichao Lu

AlchemistCoder: Harmonizing and Eliciting Code Capability by Hindsight Tuning on Multi-source Data

Open-source Large Language Models (LLMs) and their specialized variants, particularly Code LLMs, have recently delivered impressive performance. However, previous Code LLMs are typically fine-tuned on single-source data with limited quality…

Computation and Language · Computer Science 2025-02-04 Zifan Song, Yudong Wang, Wenwei Zhang, Kuikun Liu, Chengqi Lyu, Demin Song, Qipeng Guo, Hang Yan, Dahua Lin, Kai Chen, Cairong Zhao

Code Simulation as a Proxy for High-order Tasks in Large Language Models

Many reasoning, planning, and problem-solving tasks share an intrinsic algorithmic nature: correctly simulating each step is a sufficient condition to solve them correctly. We collect pairs of naturalistic and synthetic reasoning tasks to…

Machine Learning · Computer Science 2025-07-08 Emanuele La Malfa, Christoph Weinhuber, Orazio Torre, Fangru Lin, X. Angelo Huang, Samuele Marro, Anthony Cohn, Nigel Shadbolt, Michael Wooldridge

Enhancing Accuracy and Maintainability in Nuclear Plant Data Retrieval: A Function-Calling LLM Approach Over NL-to-SQL

Retrieving operational data from nuclear power plants requires exceptional accuracy and transparency due to the criticality of the decisions it supports. Traditionally, natural language to SQL (NL-to-SQL) approaches have been explored for…

Computation and Language · Computer Science 2025-06-11 Mishca de Costa, Muhammad Anwar, Dave Mercier, Mark Randall, Issam Hammad

Domain Adaptive Code Completion via Language Models and Decoupled Domain Databases

Large Language Models (LLMs) have demonstrated remarkable performance in code completion. However, due to the lack of domain-specific knowledge, they may not be optimal in completing code that requires intensive domain knowledge for example…

Software Engineering · Computer Science 2023-09-21 Ze Tang, Jidong Ge, Shangqing Liu, Tingwei Zhu, Tongtong Xu, Liguo Huang, Bin Luo

Automatic Data Transformation Using Large Language Model: An Experimental Study on Building Energy Data

Existing approaches to automatic data transformation are insufficient to meet the requirements in many real-world scenarios, such as the building sector. First, there is no convenient interface for domain experts to provide domain knowledge…

Databases · Computer Science 2023-10-31 Ankita Sharma, Xuanmao Li, Hong Guan, Guoxin Sun, Liang Zhang, Lanjun Wang, Kesheng Wu, Lei Cao, Erkang Zhu, Alexander Sim, Teresa Wu, Jia Zou

Write Your Own CodeChecker: An Automated Test-Driven Checker Development Approach with LLMs

With the rising demand for code quality assurance, developers are not only utilizing existing static code checkers but also seeking custom checkers to satisfy their specific needs. Nowadays, various code-checking frameworks provide…

Software Engineering · Computer Science 2025-07-18 Jun Liu, Yuanyuan Xie, Jiwei Yan, Jinhao Huang, Jun Yan, Jian Zhang

GenDB: The Next Generation of Query Processing -- Synthesized, Not Engineered

Traditional query processing relies on engines that are carefully optimized and engineered by many experts. However, new techniques and user requirements evolve rapidly, and existing systems often cannot keep pace. At the same time, these…

Databases · Computer Science 2026-03-03 Jiale Lao, Immanuel Trummer