Related papers: KnowCoder: Coding Structured Knowledge into LLMs f…

KnowCoder-X: Boosting Multilingual Information Extraction via Code

Empirical evidence indicates that LLMs exhibit spontaneous cross-lingual alignment. However, although LLMs show promising cross-lingual alignment in Information Extraction (IE), a significant imbalance across languages persists,…

Computation and Language · Computer Science 2025-06-03 Yuxin Zuo , Wenxuan Jiang , Wenxuan Liu , Zixuan Li , Long Bai , Hanbin Wang , Yutao Zeng , Xiaolong Jin , Jiafeng Guo , Xueqi Cheng

Retrieval-Augmented Code Generation for Universal Information Extraction

Information Extraction (IE) aims to extract structural knowledge (e.g., entities, relations, events) from natural language texts, which brings challenges to existing methods due to task-specific schemas and complex text expressions. Code,…

Artificial Intelligence · Computer Science 2023-11-07 Yucan Guo , Zixuan Li , Xiaolong Jin , Yantao Liu , Yutao Zeng , Wenxuan Liu , Xiang Li , Pan Yang , Long Bai , Jiafeng Guo , Xueqi Cheng

UniCoder: Scaling Code Large Language Model via Universal Code

Intermediate reasoning or acting steps have successfully improved large language models (LLMs) for handling various downstream natural language processing (NLP) tasks. When applying LLMs for code generation, recent works mainly focus on…

Computation and Language · Computer Science 2024-06-25 Tao Sun , Linzheng Chai , Jian Yang , Yuwei Yin , Hongcheng Guo , Jiaheng Liu , Bing Wang , Liqun Yang , Zhoujun Li

OpenCoder: The Open Cookbook for Top-Tier Code Large Language Models

Large language models (LLMs) for code have become indispensable in various domains, including code generation, reasoning tasks and agent systems. While open-access code LLMs are increasingly approaching the performance levels of proprietary…

Computation and Language · Computer Science 2025-03-21 Siming Huang , Tianhao Cheng , J. K. Liu , Jiaran Hao , Liuyihan Song , Yang Xu , J. Yang , Jiaheng Liu , Chenchen Zhang , Linzheng Chai , Ruifeng Yuan , Zhaoxiang Zhang , Jie Fu , Qian Liu , Ge Zhang , Zili Wang , Yuan Qi , Yinghui Xu , Wei Chu

UnitCoder: Scalable Iterative Code Synthesis with Unit Test Guidance

Large Language Models (LLMs) have demonstrated remarkable capabilities in various tasks, yet code generation remains a major challenge. Current approaches for obtaining high-quality code data primarily focus on (i) collecting large-scale…

Computation and Language · Computer Science 2025-02-18 Yichuan Ma , Yunfan Shao , Peiji Li , Demin Song , Qipeng Guo , Linyang Li , Xipeng Qiu , Kai Chen

Code-MIE: A Code-style Model for Multimodal Information Extraction with Scene Graph and Entity Attribute Knowledge Enhancement

With the rapid development of large language models (LLMs), more and more researchers have paid attention to information extraction based on LLMs. However, there are still some spaces to improve in the existing related methods. First,…

Computation and Language · Computer Science 2026-03-24 Jiang Liu , Ge Qiu , Hao Fei , Dongdong Xie , Jinbo Li , Fei Li , Chong Teng , Donghong Ji

CodeIE: Large Code Generation Models are Better Few-Shot Information Extractors

Large language models (LLMs) pre-trained on massive corpora have demonstrated impressive few-shot learning ability on many NLP tasks. A common practice is to recast the task into a text-to-text format such that generative LLMs of natural…

Computation and Language · Computer Science 2023-05-12 Peng Li , Tianxiang Sun , Qiong Tang , Hang Yan , Yuanbin Wu , Xuanjing Huang , Xipeng Qiu

SchemaCoder: Automatic Log Schema Extraction Coder with Residual Q-Tree Boosting

Log schema extraction is the process of deriving human-readable templates from massive volumes of log data, which is essential yet notoriously labor-intensive. Recent studies have attempted to streamline this task by leveraging Large…

Artificial Intelligence · Computer Science 2025-08-27 Lily Jiaxin Wan , Chia-Tung Ho , Rongjian Liang , Cunxi Yu , Deming Chen , Haoxing Ren

ExploraCoder: Advancing code generation for multiple unseen APIs via planning and chained exploration

Large language models face intrinsic limitations in coding with APIs that are unseen in their training corpora. As libraries continuously evolve, it becomes impractical to exhaustively retrain LLMs with new API knowledge. This limitation…

Software Engineering · Computer Science 2025-06-23 Yunkun Wang , Yue Zhang , Zhen Qin , Chen Zhi , Binhua Li , Fei Huang , Yongbin Li , Shuiguang Deng

TreeCoder: Systematic Exploration and Optimisation of Decoding and Constraints for LLM Code Generation

Large language models (LLMs) have shown remarkable ability to generate code, yet their outputs often violate syntactic or semantic constraints when guided only through natural language prompts. We introduce TreeCoder, the most general and…

Machine Learning · Computer Science 2026-04-27 Henrijs Princis , Arindam Sharma , Cristina David

WaveCoder: Widespread And Versatile Enhancement For Code Large Language Models By Instruction Tuning

Recent work demonstrates that, after instruction tuning, Code Large Language Models (Code LLMs) can obtain impressive capabilities to address a wide range of code-related tasks. However, current instruction tuning methods for Code LLMs…

Computation and Language · Computer Science 2024-06-10 Zhaojian Yu , Xin Zhang , Ning Shang , Yangyu Huang , Can Xu , Yishujie Zhao , Wenxiang Hu , Qiufeng Yin

ToolCoder: A Systematic Code-Empowered Tool Learning Framework for Large Language Models

Tool learning has emerged as a crucial capability for large language models (LLMs) to solve complex real-world tasks through interaction with external tools. Existing approaches face significant challenges, including reliance on…

Computation and Language · Computer Science 2025-06-02 Hanxing Ding , Shuchang Tao , Liang Pang , Zihao Wei , Jinyang Gao , Bolin Ding , Huawei Shen , Xueqi Cheng

Can Code Language Models Learn Clarification-Seeking Behaviors?

Large language models (LLMs) have demonstrated remarkable capabilities in code generation tasks. However, a gap remains between their output and the problem-solving strategies of human developers. Unlike humans, who spend substantial time…

Software Engineering · Computer Science 2025-09-29 Jie JW Wu , Manav Chaudhary , Davit Abrahamyan , Arhaan Khaku , Anjiang Wei , Fatemeh H. Fard

To See is Not to Master: Teaching LLMs to Use Private Libraries for Code Generation

Large Language Models (LLMs) have shown strong potential for code generation, yet they remain limited in private-library-oriented code generation, where the goal is to generate code using APIs from private libraries. Existing approaches…

Software Engineering · Computer Science 2026-03-30 Yitong Zhang , Chengze Li , Ruize Chen , Guowei Yang , Xiaoran Jia , Yijie Ren , Jia Li

ShortCoder: Knowledge-Augmented Syntax Optimization for Token-Efficient Code Generation

Code generation tasks aim to automate the conversion of user requirements into executable code, significantly reducing manual development efforts and enhancing software productivity. The emergence of large language models (LLMs) has…

Software Engineering · Computer Science 2026-01-15 Sicong Liu , Yanxian Huang , Mingwei Liu , Jiachi Chen , Ensheng Shi , Yuchi Ma , Hongyu Zhang , Yin Zhang , Yanlin Wang

Seed-Coder: Let the Code Model Curate Data for Itself

Code data in large language model (LLM) pretraining is recognized crucial not only for code-related tasks but also for enhancing general intelligence of LLMs. Current open-source LLMs often heavily rely on human effort to produce their code…

Computation and Language · Computer Science 2025-06-06 ByteDance Seed , Yuyu Zhang , Jing Su , Yifan Sun , Chenguang Xi , Xia Xiao , Shen Zheng , Anxiang Zhang , Kaibo Liu , Daoguang Zan , Tao Sun , Jinhua Zhu , Shulin Xin , Dong Huang , Yetao Bai , Lixin Dong , Chao Li , Jianchong Chen , Hanzhi Zhou , Yifan Huang , Guanghan Ning , Xierui Song , Jiaze Chen , Siyao Liu , Kai Shen , Liang Xiang , Yonghui Wu

ExeCoder: Empowering Large Language Models with Executability Representation for Code Translation

Code translation is a crucial activity in the software development and maintenance process, and researchers have recently begun to focus on using pre-trained large language models (LLMs) for code translation. However, existing LLMs only…

Software Engineering · Computer Science 2025-09-30 Minghua He , Yue Chen , Fangkai Yang , Pu Zhao , Wenjie Yin , Yu Kang , Qingwei Lin , Saravan Rajmohan , Dongmei Zhang

ConceptCoder: Improve Code Reasoning via Concept Learning

Large language models (LLMs) have shown promising results for software engineering applications, but still struggle with code reasoning tasks such as vulnerability detection (VD). We introduce ConceptCoder, a fine-tuning method that…

Software Engineering · Computer Science 2026-03-25 Md Mahbubur Rahman , Hengbo Tong , Wei Le

StoryCoder: Narrative Reformulation for Structured Reasoning in LLM Code Generation

Effective code generation requires both model capability and a problem representation that carefully structures how models reason and plan. Existing approaches augment reasoning steps or inject specific structure into how models think, but…

Computation and Language · Computer Science 2026-04-17 Geonhui Jang , Dongyoon Han , YoungJoon Yoo

MLLM-Based UI2Code Automation Guided by UI Layout Information

Converting user interfaces into code (UI2Code) is a crucial step in website development, which is time-consuming and labor-intensive. The automation of UI2Code is essential to streamline this task, beneficial for improving the development…

Software Engineering · Computer Science 2025-06-13 Fan Wu , Cuiyun Gao , Shuqing Li , Xin-Cheng Wen , Qing Liao