Related papers: Program Translation via Code Distillation

200 papers

In this paper, we leverage low-level compiler intermediate representations (IR) to improve code translation. Traditional transpilers rely on syntactic information and handcrafted rules, which limits their applicability and produces…

Programming Languages · Computer Science 2023-04-25 Marc Szafraniec , Baptiste Roziere , Hugh Leather , Francois Charton , Patrick Labatut , Gabriel Synnaeve

Translating source code from one programming language to another is a critical, time-consuming task in modernizing legacy applications and codebases. Recent work in this space has drawn inspiration from the software naturalness hypothesis…

With little to no parallel data available for programming languages, unsupervised methods are well-suited to source code translation. However, the majority of unsupervised machine translation approaches rely on back-translation, a method…

Software Engineering · Computer Science 2022-02-17 Baptiste Roziere , Jie M. Zhang , Francois Charton , Mark Harman , Gabriel Synnaeve , Guillaume Lample

While many parallel corpora are not publicly accessible for data copyright, data privacy and competitive differentiation reasons, trained translation models are increasingly available on open platforms. In this work, we propose a method…

Computation and Language · Computer Science 2023-06-13 Yuanchi Zhang , Peng Li , Maosong Sun , Yang Liu

Large language models (LLMs), despite their ability to perform few-shot machine translation (MT), often lag behind dedicated MT systems trained on parallel corpora, which are crucial for high quality machine translation (MT). However,…

Computation and Language · Computer Science 2025-08-12 Deepon Halder , Thanmay Jayakumar , Raj Dabre

Large-scale self-supervised pre-trained speech encoders outperform conventional approaches in speech recognition and translation tasks. Due to the high cost of developing these large models, building new encoders for new tasks and deploying…

Computation and Language · Computer Science 2023-12-29 Heng-Jui Chang , Ning Dong , Ruslan Mavlyutov , Sravya Popuri , Yu-An Chung

A transcompiler, also known as source-to-source translator, is a system that converts source code from a high-level programming language (such as C++ or Python) to another. Transcompilers are primarily used for interoperability, and to port…

Computation and Language · Computer Science 2020-09-23 Marie-Anne Lachaux , Baptiste Roziere , Lowik Chanussot , Guillaume Lample

Text Style Transfer (TST) seeks to alter the style of text while retaining its core content. Given the constraints of limited parallel datasets for TST, we propose CoTeX, a framework that leverages large language models (LLMs) alongside…

Computation and Language · Computer Science 2024-05-07 Chiyu Zhang , Honglong Cai , Yuezhang Li , Yuexin Wu , Le Hou , Muhammad Abdul-Mageed

Codistillation has been proposed as a mechanism to share knowledge among concurrently trained models by encouraging them to represent the same function through an auxiliary loss. This contrasts with the more commonly used fully-synchronous…

Machine Learning · Computer Science 2021-07-27 Shagun Sodhani , Olivier Delalleau , Mahmoud Assran , Koustuv Sinha , Nicolas Ballas , Michael Rabbat

Recently, the automated translation of source code from one programming language to another by using automatic approaches inspired by Neural Machine Translation (NMT) methods for natural languages has come under study. However, such…

Existing language model compression methods mostly use a simple L2 loss to distill knowledge in the intermediate representations of a large BERT model to a smaller one. Although widely used, this objective by design assumes that all the…

Computation and Language · Computer Science 2020-09-30 Siqi Sun , Zhe Gan , Yu Cheng , Yuwei Fang , Shuohang Wang , Jingjing Liu

Pre-trained transformers have recently clinched top spots in the gamut of natural language tasks and pioneered solutions to software engineering tasks. Even information retrieval has not been immune to the charm of the transformer, though…

Information Retrieval · Computer Science 2021-08-10 Colin B. Clement , Chen Wu , Dawn Drain , Neel Sundaresan

Code translation aims to convert source code from one programming language (PL) to another. Given the promising abilities of large language models (LLMs) in code synthesis, researchers are exploring their potential to automate code…

Lack of specialized data makes building a multi-domain neural machine translation tool challenging. Although emerging literature dealing with low resource languages starts to show promising results, most state-of-the-art models used…

Computation and Language · Computer Science 2020-04-17 Idriss Mghabbar , Pirashanth Ratnamogan

The goal of model distillation is to faithfully transfer teacher model knowledge to a model which is faster, more generalizable, more interpretable, or possesses other desirable characteristics. Human-readability is an important and…

Prevailing Dataset Distillation (DD) methods leveraging generative models confront two fundamental limitations. First, despite pioneering the use of diffusion models in DD and delivering impressive performance, the vast majority of…

Computer Vision and Pattern Recognition · Computer Science 2025-12-04 Letian Zhou , Songhua Liu , Xinchao Wang

One major challenge of translating code between programming languages is that parallel training data is often limited. To overcome this challenge, we present two data augmentation techniques, one that builds comparable corpora (i.e., code…

Computation and Language · Computer Science 2024-10-07 Yiqing Xie , Atharva Naik , Daniel Fried , Carolyn Rose

Code translation aims to transform code between programming languages while preserving functionality, with applications in cross-platform development and software migration. Recent advances in Large Language Models (LLMs) have improved code…

Software Engineering · Computer Science 2025-04-04 Li Xin-Ye , Du Ya-Li , Li Ming

Multimodal dataset distillation aims to synthesize a small set of image-text pairs that enables efficient training of large-scale vision-language models. While dataset distillation has shown promise in unimodal tasks, extending it to…

Computer Vision and Pattern Recognition · Computer Science 2025-10-22 Yongmin Lee , Hye Won Chung

Translation is important for cross-language communication, and many efforts have been made to improve its accuracy. However, less investment is conducted in aligning translations with human preferences, such as translation tones or styles.…

Computation and Language · Computer Science 2024-10-16 Shuqiao Sun , Yutong Yao , Peiwen Wu , Feijun Jiang , Kaifu Zhang