Related papers: UniCoder: Scaling Code Large Language Model via Un…

UnitCoder: Scalable Iterative Code Synthesis with Unit Test Guidance

Large Language Models (LLMs) have demonstrated remarkable capabilities in various tasks, yet code generation remains a major challenge. Current approaches for obtaining high-quality code data primarily focus on (i) collecting large-scale…

Computation and Language · Computer Science 2025-02-18 Yichuan Ma, Yunfan Shao, Peiji Li, Demin Song, Qipeng Guo, Linyang Li, Xipeng Qiu, Kai Chen

UniCode: Augmenting Evaluation for Code Reasoning

Current coding benchmarks often inflate Large Language Model (LLM) capabilities due to static paradigms and data contamination, enabling models to exploit statistical shortcuts rather than genuine reasoning. To address this, we introduce…

Software Engineering · Computer Science 2026-02-17 Xinyue Zheng, Haowei Lin, Shaofei Cai, Zilong Zheng, Yaodong Yang, Yitao Liang

TreeCoder: Systematic Exploration and Optimisation of Decoding and Constraints for LLM Code Generation

Large language models (LLMs) have shown remarkable ability to generate code, yet their outputs often violate syntactic or semantic constraints when guided only through natural language prompts. We introduce TreeCoder, the most general and…

Machine Learning · Computer Science 2026-04-27 Henrijs Princis, Arindam Sharma, Cristina David

Code Prompting: a Neural Symbolic Method for Complex Reasoning in Large Language Models

Large language models (LLMs) have scaled up to unlock a wide range of complex reasoning tasks with the aid of various prompting methods. However, current prompting methods generate natural language intermediate steps to help reasoning,…

Computation and Language · Computer Science 2023-10-10 Yi Hu, Haotong Yang, Zhouchen Lin, Muhan Zhang

AceCoder: Utilizing Existing Code to Enhance Code Generation

Large Language Models (LLMs) have shown great success in code generation. LLMs take as the input a prompt and output the code. A key question is how to make prompts (i.e., Prompting Techniques). Existing prompting techniques are designed…

Software Engineering · Computer Science 2023-09-08 Jia Li, Yunfei Zhao, Yongmin Li, Ge Li, Zhi Jin

VisualCoder: Guiding Large Language Models in Code Execution with Fine-grained Multimodal Chain-of-Thought Reasoning

Predicting program behavior and reasoning about code execution remain significant challenges in software engineering, particularly for large language models (LLMs) designed for code analysis. While these models excel at understanding static…

Software Engineering · Computer Science 2025-02-11 Cuong Chi Le, Hoang-Chau Truong-Vinh, Huy Nhat Phan, Dung Duy Le, Tien N. Nguyen, Nghi D. Q. Bui

Unlocking Reasoning Potential in Large Language Models by Scaling Code-form Planning

Despite the remarkable success of large language models (LLMs) on traditional natural language processing tasks, their planning ability remains a critical bottleneck in tackling complex multi-step reasoning tasks. Existing approaches mainly…

Computation and Language · Computer Science 2024-10-07 Jiaxin Wen, Jian Guan, Hongning Wang, Wei Wu, Minlie Huang

ToolCoder: A Systematic Code-Empowered Tool Learning Framework for Large Language Models

Tool learning has emerged as a crucial capability for large language models (LLMs) to solve complex real-world tasks through interaction with external tools. Existing approaches face significant challenges, including reliance on…

Computation and Language · Computer Science 2025-06-02 Hanxing Ding, Shuchang Tao, Liang Pang, Zihao Wei, Jinyang Gao, Bolin Ding, Huawei Shen, Xueqi Cheng

Assessing Code Generation with Intermediate Languages

Intermediate step methodologies like chain of thoughts (COT) have demonstrated effectiveness in enhancing the performance of Large Language Models (LLMs) on code generation. This study explores the utilization of intermediate languages,…

Software Engineering · Computer Science 2024-07-09 Xun Deng, Sicheng Zhong, Honghua Dong, Jingyu Hu, Sidi Mohamed Beillahi, Xujie Si, Fan Long

KnowCoder: Coding Structured Knowledge into LLMs for Universal Information Extraction

In this paper, we propose KnowCoder, a Large Language Model (LLM) to conduct Universal Information Extraction (UIE) via code generation. KnowCoder aims to develop a kind of unified schema representation that LLMs can easily understand and…

Machine Learning · Computer Science 2024-03-15 Zixuan Li, Yutao Zeng, Yuxin Zuo, Weicheng Ren, Wenxuan Liu, Miao Su, Yucan Guo, Yantao Liu, Xiang Li, Zhilei Hu, Long Bai, Wei Li, Yidan Liu, Pan Yang, Xiaolong Jin, Jiafeng Guo, Xueqi Cheng

Preventing Language Models From Hiding Their Reasoning

Large language models (LLMs) often benefit from intermediate steps of reasoning to generate answers to complex problems. When these intermediate steps of reasoning are used to monitor the activity of the model, it is essential that this…

Machine Learning · Computer Science 2023-11-02 Fabien Roger, Ryan Greenblatt

SimulatorCoder: DNN Accelerator Simulator Code Generation and Optimization via Large Language Models

This paper presents SimulatorCoder, an agent powered by large language models (LLMs), designed to generate and optimize deep neural network (DNN) accelerator simulators based on natural language descriptions. By integrating domain-specific…

Hardware Architecture · Computer Science 2026-02-20 Yuhuan Xia, Tun Li, Hongji Zhou, Xianfa Zhou, Chong Chen, Ruiyu Zhang

CoLadder: Supporting Programmers with Hierarchical Code Generation in Multi-Level Abstraction

Programmers increasingly rely on Large Language Models (LLMs) for code generation. However, misalignment between programmers' goals and generated code complicates the code evaluation process and demands frequent switching between prompt…

Software Engineering · Computer Science 2023-12-27 Ryan Yen, Jiawen Zhu, Sangho Suh, Haijun Xia, Jian Zhao

LLM-Based Test-Driven Interactive Code Generation: User Study and Empirical Evaluation

Large language models (LLMs) have shown great potential in automating significant aspects of coding by producing natural code from informal natural language (NL) intent. However, given NL is informal, it does not lend easily to checking…

Software Engineering · Computer Science 2024-10-04 Sarah Fakhoury, Aaditya Naik, Georgios Sakkas, Saikat Chakraborty, Shuvendu K. Lahiri

Collaboration is all you need: LLM Assisted Safe Code Translation

This paper introduces UniTranslator, a visionary framework that re-imagines code translation as a collaborative endeavor among multiple, compact LLMs. By orchestrating the interaction of specialized agents, each focused on different aspects…

Artificial Intelligence · Computer Science 2025-08-01 Rabimba Karanjai, Sam Blackshear, Lei Xu, Weidong Shi

UniCoR: Modality Collaboration for Robust Cross-Language Hybrid Code Retrieval

Effective code retrieval is indispensable and it has become an important paradigm to search code in hybrid mode using both natural language and code snippets. Nevertheless, it remains unclear whether existing approaches can effectively…

Software Engineering · Computer Science 2026-03-09 Yang Yang, Li Kuang, Jiakun Liu, Zhongxin Liu, Yingjie Xia, David Lo

Seed-Coder: Let the Code Model Curate Data for Itself

Code data in large language model (LLM) pretraining is recognized crucial not only for code-related tasks but also for enhancing general intelligence of LLMs. Current open-source LLMs often heavily rely on human effort to produce their code…

Computation and Language · Computer Science 2025-06-06 ByteDance Seed, Yuyu Zhang, Jing Su, Yifan Sun, Chenguang Xi, Xia Xiao, Shen Zheng, Anxiang Zhang, Kaibo Liu, Daoguang Zan, Tao Sun, Jinhua Zhu, Shulin Xin, Dong Huang, Yetao Bai, Lixin Dong, Chao Li, Jianchong Chen, Hanzhi Zhou, Yifan Huang, Guanghan Ning, Xierui Song, Jiaze Chen, Siyao Liu, Kai Shen, Liang Xiang, Yonghui Wu

TransCoder: Towards Unified Transferable Code Representation Learning Inspired by Human Skills

Code pre-trained models (CodePTMs) have recently demonstrated a solid capacity to process various software intelligence tasks, e.g., code clone detection, code translation, and code summarization. The current mainstream method that deploys…

Software Engineering · Computer Science 2024-05-10 Qiushi Sun, Nuo Chen, Jianing Wang, Xiang Li, Ming Gao

UniCode: Learning a Unified Codebook for Multimodal Large Language Models

In this paper, we propose UniCode, a novel approach within the domain of multimodal large language models (MLLMs) that learns a unified codebook to efficiently tokenize visual, text, and potentially other types of signals. This…

Computer Vision and Pattern Recognition · Computer Science 2024-03-15 Sipeng Zheng, Bohan Zhou, Yicheng Feng, Ye Wang, Zongqing Lu

GCoder: Improving Large Language Model for Generalized Graph Problem Solving

Large Language Models (LLMs) have demonstrated strong reasoning abilities, making them suitable for complex tasks such as graph computation. Traditional reasoning steps paradigm for graph problems is hindered by unverifiable steps, limited…

Computation and Language · Computer Science 2024-10-28 Qifan Zhang, Xiaobin Hong, Jianheng Tang, Nuo Chen, Yuhan Li, Wenzhong Li, Jing Tang, Jia Li