Related papers: Commit0: Library Generation from Scratch

200 papers

Large language model (LLM)-based coding agents achieve impressive results on controlled benchmarks yet routinely produce pull requests that real maintainers reject. The root cause is not functional incorrectness but a lack of organicity:…

Software Engineering · Computer Science 2026-03-30 Mo Li, L. H. Xu, Qitai Tan, Ting Cao, Yunxin Liu

Large Language Models (LLMs) are driving a shift towards intent-driven development, where agents build complete software from scratch. However, existing benchmarks fail to assess this 0-to-1 generation capability due to two limitations:…

Software Engineering · Computer Science 2026-04-09 Ruida Hu, Xinchen Wang, Chao Peng, Cuiyun Gao, David Lo

Turning ideas into full software projects from scratch has become a popular use case for language models. Agents are being deployed to seed, maintain, and grow codebases over extended periods with minimal human oversight. Such settings…

Recent progress in autonomous code generation has fueled excitement around AI agents capable of accelerating scientific discovery by running experiments. However, there is currently no benchmark that evaluates whether such agents can…

Artificial Intelligence · Computer Science 2025-06-25 Gyeongwon James Kim, Alex Wilf, Louis-Philippe Morency, Daniel Fried

Large Language Model (LLM) Agents, often trained with Reinforcement Learning (RL), are constrained by a dependency on human-curated data, limiting scalability and tethering AI to human knowledge. Existing self-evolution frameworks offer an…

Machine Learning · Computer Science 2025-11-21 Peng Xia, Kaide Zeng, Jiaqi Liu, Can Qin, Fang Wu, Yiyang Zhou, Caiming Xiong, Huaxiu Yao

Writing commit messages is a tedious daily task for many software developers, and often remains neglected. Automating this task has the potential to save time while ensuring that messages are informative. A high-quality dataset and an…

Computation and Language · Computer Science 2024-03-11 Maximilian Schall, Tamara Czinczoll, Gerard de Melo

Large Language Models (LLMs) have recently shown remarkable progress in code generation, yet their ability to construct complete software repositories from scratch remains poorly understood. A fundamental bottleneck is the lack of…

Software Engineering · Computer Science 2026-05-21 Zhaoxi Zhang, Yiming Xu, Jiahui Liang, Weikang Li, Xiaoshuai Chen, Liwei Qian, Xin Pei, Jizhou Huang, Run Sun, Yunfang Wu

We present app.build (https://github.com/neondatabase/appdotbuild-agent), an open-source framework that improves LLM-based application generation through systematic validation and structured environments. Our approach combines multi-layered…

Artificial Intelligence · Computer Science 2026-01-13 Evgenii Kniazev, Arseny Kravchenko, Igor Rekun, James Broadhead, Nikita Shamgunov, Pranav Sah, Pratik Nichite, Ivan Yamshchikov

Large language models are redefining software engineering by implementing AI-powered techniques throughout the whole software development process, including requirement gathering, software architecture, code generation, testing, and…

Software Engineering · Computer Science 2024-06-11 Malik Abdul Sami, Muhammad Waseem, Zeeshan Rasheed, Mika Saari, Kari Systä, Pekka Abrahamsson

Current approaches rely on zero-shot evaluation due to the absence of training data; while proprietary models such as GPT-4 exhibit strong reasoning capabilities, smaller open-source models remain ineffective at complex tool use. To address…

Artificial Intelligence · Computer Science 2026-05-05 Hyunji Min, Sangwon Jung, Junyoung Sung, Dosung Lee, Leekyeung Han, Paul Hongsuck Seo

Agentic AI systems can now generate code with remarkable fluency, but a fundamental question remains: does the generated code actually do what the user intended? The gap between informal natural language requirements and precise…

Software Engineering · Computer Science 2026-03-19 Shuvendu K. Lahiri

The rapid adoption of AI agents across domains has made systematic evaluation crucial for ensuring their usefulness and successful production deployment. Evaluation of AI agents typically involves using a fixed set of benchmarks and…

Static "human data" faces inherent limitations: it is expensive to scale and bounded by the knowledge of its creators. Continuous learning from "experience data" (interactions between agents and their environments) promises to transcend…

The transition from human-centric to agent-centric software development practices is disrupting existing knowledge sharing environments for software developers. Traditional peer-to-peer repositories and developer communities for shared…

Artificial Intelligence · Computer Science 2025-11-12 Valentin Tablan, Scott Taylor, Gabriel Hurtado, Kristoffer Bernhem, Anders Uhrenholt, Gabriele Farei, Karo Moilanen

Software code generation using Large Language Models (LLMs) is one of the most successful applications of modern artificial intelligence. Foundational models are very effective for popular frameworks that benefit from documentation,…

Software Engineering · Computer Science 2025-10-01 Dmitriy Kostunin, Vladimir Sotnikov, Sergo Golovachev, Abhay Mehta, Tim Lukas Holch, Elisa Jones

The rapid evolution of software libraries poses a considerable hurdle for code generation, necessitating continuous adaptation to frequent version updates while preserving backward compatibility. While existing code evolution benchmarks…

While large language models have significantly accelerated scientific code generation, comprehensively evaluating the generated code remains a major challenge. Traditional benchmarks reduce evaluation to test-case matching, an approach…

Artificial Intelligence · Computer Science 2026-03-18 Hong Zhang, Barry Smith, Satish Balay, Le Chen, Murat Keceli, Lois Curfman McInnes, Junchao Zhang

Large language models (LLMs) are increasingly deployed as agents, expected to decompose goals, invoke tools, and verify results in dynamic environments. Realizing these capabilities requires access to agentic data: structured interaction…

Artificial Intelligence · Computer Science 2025-10-22 Abhigya Verma, Seganrasan Subramanian, Nandhakumar Kandasamy, Naman Gupta

Modern AI agents optimize programs by refactoring source code to trigger trusted compiler transformations. This preserves program semantics and reduces source code pollution, making the program easier to maintain and portable across…

Programming Languages · Computer Science 2026-04-16 Akash Deo, Simone Campanoni, Tommy McMichen

Writing compelling fiction is a multifaceted process combining elements such as crafting a plot, developing interesting characters, and using evocative language. While large language models (LLMs) show promise for story writing, they…

Computation and Language · Computer Science 2025-03-17 Fantine Huot, Reinald Kim Amplayo, Jennimaria Palomaki, Alice Shoshana Jakobovits, Elizabeth Clark, Mirella Lapata