Related papers: KnowCoder-X: Boosting Multilingual Information Ext…

KnowCoder: Coding Structured Knowledge into LLMs for Universal Information Extraction

In this paper, we propose KnowCoder, a Large Language Model (LLM) to conduct Universal Information Extraction (UIE) via code generation. KnowCoder aims to develop a kind of unified schema representation that LLMs can easily understand and…

Machine Learning · Computer Science · 2024-03-15 · Zixuan Li, Yutao Zeng, Yuxin Zuo, Weicheng Ren, Wenxuan Liu, Miao Su, Yucan Guo, Yantao Liu, Xiang Li, Zhilei Hu, Long Bai, Wei Li, Yidan Liu, Pan Yang, Xiaolong Jin, Jiafeng Guo, Xueqi Cheng

ExeCoder: Empowering Large Language Models with Executability Representation for Code Translation

Code translation is a crucial activity in the software development and maintenance process, and researchers have recently begun to focus on using pre-trained large language models (LLMs) for code translation. However, existing LLMs only…

Software Engineering · Computer Science · 2025-09-30 · Minghua He, Yue Chen, Fangkai Yang, Pu Zhao, Wenjie Yin, Yu Kang, Qingwei Lin, Saravan Rajmohan, Dongmei Zhang

xCodeEval: A Large Scale Multilingual Multitask Benchmark for Code Understanding, Generation, Translation and Retrieval

Recently, pre-trained large language models (LLMs) have shown impressive abilities in generating codes from natural language descriptions, repairing buggy codes, translating codes between languages, and retrieving relevant code segments.…

Computation and Language · Computer Science · 2023-11-07 · Mohammad Abdullah Matin Khan, M Saiful Bari, Xuan Long Do, Weishi Wang, Md Rizwan Parvez, Shafiq Joty

Multi-Agent Collaboration for Multilingual Code Instruction Tuning

Recent advancement in code understanding and generation demonstrates that code LLMs fine-tuned on a high-quality instruction dataset can gain powerful capabilities to address wide-ranging code-related tasks. However, most previous existing…

Computation and Language · Computer Science · 2025-02-12 · Jian Yang, Wei Zhang, Jiaxi Yang, Yibo Miao, Shanghaoran Quan, Zhenhe Wu, Qiyao Peng, Liqun Yang, Tianyu Liu, Zeyu Cui, Binyuan Hui, Junyang Lin

IRCoder: Intermediate Representations Make Language Models Robust Multilingual Code Generators

Code understanding and generation have fast become some of the most popular applications of language models (LMs). Nonetheless, research on multilingual aspects of Code-LMs (i.e., LMs for code generation) such as cross-lingual transfer…

Artificial Intelligence · Computer Science · 2024-04-16 · Indraneil Paul, Goran Glavaš, Iryna Gurevych

ERNIE-Code: Beyond English-Centric Cross-lingual Pretraining for Programming Languages

Software engineers working with the same programming language (PL) may speak different natural languages (NLs) and vice versa, erecting huge barriers to communication and working efficiency. Recent studies have demonstrated the…

Computation and Language · Computer Science · 2023-05-22 · Yekun Chai, Shuohuan Wang, Chao Pang, Yu Sun, Hao Tian, Hua Wu

InterTrans: Leveraging Transitive Intermediate Translations to Enhance LLM-based Code Translation

Code translation aims to convert a program from one programming language (PL) to another. This long-standing software engineering task is crucial for modernizing legacy systems, ensuring cross-platform compatibility, enhancing performance,…

Software Engineering · Computer Science · 2024-11-06 · Marcos Macedo, Yuan Tian, Pengyu Nie, Filipe R. Cogo, Bram Adams

Benchmarking Large Language Models with Augmented Instructions for Fine-grained Information Extraction

Information Extraction (IE) is an essential task in Natural Language Processing. Traditional methods have relied on coarse-grained extraction with simple instructions. However, with the emergence of Large Language Models (LLMs), there is a…

Computation and Language · Computer Science · 2023-10-10 · Jun Gao, Huan Zhao, Yice Zhang, Wei Wang, Changlong Yu, Ruifeng Xu

Unraveling the Potential of Large Language Models in Code Translation: How Far Are We?

While large language models (LLMs) exhibit state-of-the-art performance in various tasks, recent studies have revealed their struggle for code translation. This is because they haven't been extensively pre-trained with parallel multilingual…

Software Engineering · Computer Science · 2024-10-15 · Qingxiao Tao, Tingrui Yu, Xiaodong Gu, Beijun Shen

Language on Demand, Knowledge at Core: Composing LLMs with Encoder-Decoder Translation Models for Extensible Multilinguality

Large language models (LLMs) exhibit strong general intelligence, yet their multilingual performance remains highly imbalanced. Although LLMs encode substantial cross-lingual knowledge in a unified semantic space, they often struggle to…

Computation and Language · Computer Science · 2026-04-17 · Mengyu Bu, Yang Feng

Edit Once, Update Everywhere: A Simple Framework for Cross-Lingual Knowledge Synchronization in LLMs

Knowledge editing allows for efficient adaptation of large language models (LLMs) to new information or corrections without requiring full retraining. However, prior methods typically focus on either single-language editing or basic…

Computation and Language · Computer Science · 2025-05-26 · Yuchen Wu, Liang Ding, Li Shen, Dacheng Tao

Not All Languages Are Created Equal in LLMs: Improving Multilingual Capability by Cross-Lingual-Thought Prompting

Large language models (LLMs) demonstrate impressive multilingual capability, but their performance varies substantially across different languages. In this work, we introduce a simple yet effective method, called cross-lingual-thought…

Computation and Language · Computer Science · 2023-10-24 · Haoyang Huang, Tianyi Tang, Dongdong Zhang, Wayne Xin Zhao, Ting Song, Yan Xia, Furu Wei

AlignCoder: Aligning Retrieval with Target Intent for Repository-Level Code Completion

Repository-level code completion remains a challenging task for existing code large language models (code LLMs) due to their limited understanding of repository-specific context and domain knowledge. While retrieval-augmented generation…

Software Engineering · Computer Science · 2026-01-28 · Tianyue Jiang, Yanli Wang, Yanlin Wang, Daya Guo, Ensheng Shi, Yuchi Ma, Jiachi Chen, Zibin Zheng

XLM-K: Improving Cross-Lingual Language Model Pre-training with Multilingual Knowledge

Cross-lingual pre-training has achieved great successes using monolingual and bilingual plain text corpora. However, most pre-trained models neglect multilingual knowledge, which is language agnostic but comprises abundant cross-lingual…

Computation and Language · Computer Science · 2022-04-26 · Xiaoze Jiang, Yaobo Liang, Weizhu Chen, Nan Duan

AlignX: Advancing Multilingual Large Language Models with Multilingual Representation Alignment

Multilingual large language models (LLMs) possess impressive multilingual understanding and generation capabilities. However, their performance and cross-lingual alignment often lag for non-dominant languages. A common solution is to…

Computation and Language · Computer Science · 2025-09-30 · Mengyu Bu, Shaolei Zhang, Zhongjun He, Hua Wu, Yang Feng

AlchemistCoder: Harmonizing and Eliciting Code Capability by Hindsight Tuning on Multi-source Data

Open-source Large Language Models (LLMs) and their specialized variants, particularly Code LLMs, have recently delivered impressive performance. However, previous Code LLMs are typically fine-tuned on single-source data with limited quality…

Computation and Language · Computer Science · 2025-02-04 · Zifan Song, Yudong Wang, Wenwei Zhang, Kuikun Liu, Chengqi Lyu, Demin Song, Qipeng Guo, Hang Yan, Dahua Lin, Kai Chen, Cairong Zhao

MathCoder: Seamless Code Integration in LLMs for Enhanced Mathematical Reasoning

The recently released GPT-4 Code Interpreter has demonstrated remarkable proficiency in solving challenging math problems, primarily attributed to its ability to seamlessly reason with natural language, generate code, execute code, and…

Computation and Language · Computer Science · 2023-10-06 · Ke Wang, Houxing Ren, Aojun Zhou, Zimu Lu, Sichun Luo, Weikang Shi, Renrui Zhang, Linqi Song, Mingjie Zhan, Hongsheng Li

Evaluating Multilingual and Code-Switched Alignment in LLMs via Synthetic Natural Language Inference

Large language models (LLMs) are increasingly applied in multilingual contexts, yet their capacity for consistent, logically grounded alignment across languages remains underexplored. We present a controlled evaluation framework for…

Computation and Language · Computer Science · 2025-08-21 · Samir Abdaljalil, Erchin Serpedin, Khalid Qaraqe, Hasan Kurban

MonoCoder: Domain-Specific Code Language Model for HPC Codes and Tasks

With easier access to powerful compute resources, there is a growing trend in AI for software development to develop large language models (LLMs) to address a variety of programming tasks. Even LLMs applied to tasks from the…

Programming Languages · Computer Science · 2024-09-23 · Tal Kadosh, Niranjan Hasabnis, Vy A. Vo, Nadav Schneider, Neva Krien, Mihai Capota, Abdul Wasay, Nesreen Ahmed, Ted Willke, Guy Tamir, Yuval Pinter, Timothy Mattson, Gal Oren

CALM: Unleashing the Cross-Lingual Self-Aligning Ability of Language Model Question Answering

Large Language Models (LLMs) are pretrained on extensive multilingual corpora to acquire both language-specific cultural knowledge and general knowledge. Ideally, while LLMs should provide consistent responses to culture-independent…

Computation and Language · Computer Science · 2025-02-11 · Yumeng Wang, Zhiyuan Fan, Qingyun Wang, May Fung, Heng Ji