Related papers: CoDesc: A Large Code-Description Parallel Dataset

200 papers

Source Code Summarization is the task of writing short, natural language descriptions of source code. The main use for these descriptions is in software documentation e.g. the one-sentence Java method descriptions in JavaDocs. Code…

Computation and Language · Computer Science 2019-04-05 Alexander LeClair, Collin McMillan

Translating source code from one programming language to another is a critical, time-consuming task in modernizing legacy applications and codebases. Recent work in this space has drawn inspiration from the software naturalness hypothesis…

Code search is a task to find programming codes that semantically match the given natural language queries. Even though some of the existing datasets for this task are multilingual on the programming language side, their query data are only…

Computation and Language · Computer Science 2023-06-28 Ryo Sekizawa, Nan Duan, Shuai Lu, Hitomi Yanaka

Reimplementing solutions to previously solved software engineering problems is not only inefficient but also introduces inadequate and error-prone code. Many existing methods achieve impressive performance on this issue by using…

Software Engineering · Computer Science 2022-10-04 Usama Nadeem, Noah Ziems, Shaoen Wu

The performance of automatic code documentation generation models depends critically on the quality of the training data used for supervision. However, most existing code documentation datasets are constructed through large scale scraping…

Software Engineering · Computer Science 2025-12-25 Recep Kaan Karaman, Meftun Akarsu

Semantic code search is the task of retrieving relevant code given a natural language query. While related to other information retrieval tasks, it requires bridging the gap between the language used in code (often abbreviated and highly…

Machine Learning · Computer Science 2020-06-09 Hamel Husain, Ho-Hsiang Wu, Tiferet Gazit, Miltiadis Allamanis, Marc Brockschmidt

Over the last several decades, software has been woven into the fabric of every aspect of our society. As software development surges and code infrastructure of enterprise applications ages, it is now more critical than ever to increase…

There has been an increase of interest in code search using natural language. Assessing the performance of such code search models can be difficult without a readily available evaluation suite. In this paper, we present an evaluation…

Software Engineering · Computer Science 2019-10-03 Hongyu Li, Seohyun Kim, Satish Chandra

During software maintenance, programmers spend a lot of time on code comprehension. Reading comments is an effective way for programmers to reduce the reading and navigating time when comprehending source code. Therefore, as a critical task…

Software Engineering · Computer Science 2018-02-01 Xing Hu, Yuhan Wei, Ge Li, Zhi Jin

The performance of neural code search is significantly influenced by the quality of the training data from which the neural models are derived. A large corpus of high-quality query and code pairs is demanded to establish a precise mapping…

Software Engineering · Computer Science 2022-02-15 Zhensu Sun, Li Li, Yan Liu, Xiaoning Du, Li Li

Recent advances in machine learning have significantly improved the understanding of source code data and achieved good performance on a number of downstream tasks. Open source repositories like GitHub enable this process with rich…

Software Engineering · Computer Science 2022-06-20 Ming Zhu, Aneesh Jain, Karthik Suresh, Roshan Ravindran, Sindhu Tipirneni, Chandan K. Reddy

While there has been a recent burgeoning of applications at the intersection of natural and programming languages, such as code generation and code summarization, these applications are usually English-centric. This creates a barrier for…

Computation and Language · Computer Science 2023-02-08 Zhiruo Wang, Grace Cuenca, Shuyan Zhou, Frank F. Xu, Graham Neubig

Duplicated code has a negative impact on the quality of software systems and should be detected at least. In this paper, we discuss an approach that improves source code retrieval using the structural information about the programs. We…

Software Engineering · Computer Science 2013-08-19 Yoshihisa Udagawa

Finding codes given natural language query is beneficial to the productivity of software developers. Future progress towards better semantic matching between query and code requires richer supervised training resources. To remedy this, we…

Computation and Language · Computer Science 2021-05-28 Junjie Huang, Duyu Tang, Linjun Shou, Ming Gong, Ke Xu, Daxin Jiang, Ming Zhou, Nan Duan

As code search permeates most activities in software development, code-to-code search has emerged to support using code as a query and retrieving similar code in the search results. Applications include duplicate code detection for…

Software Engineering · Computer Science 2021-06-18 George Mathew, Kathryn T. Stolee

Language models can serve as a valuable tool for software developers to increase productivity. Large generative models can be used for code generation and code completion, while smaller encoder-only models are capable of performing code…

Computation and Language · Computer Science 2023-11-17 Andor Diera, Abdelhalim Dahou, Lukas Galke, Fabian Karl, Florian Sihler, Ansgar Scherp

Neural program embedding can be helpful in analyzing large software, a task that is challenging for traditional logic-based program analyses due to their limited scalability. A key focus of recent machine-learning advances in this area is…

Machine Learning · Computer Science 2019-05-29 Ke Wang, Mihai Christodorescu

Code large language models mark a pivotal breakthrough in artificial intelligence. They are specifically crafted to understand and generate programming languages, significantly boosting the efficiency of coding development workflows. In…

Software Engineering · Computer Science 2024-03-26 Rui Xie, Zhengran Zeng, Zhuohao Yu, Chang Gao, Shikun Zhang, Wei Ye

Large pre-trained language models have been used to generate code, providing a flexible interface for synthesizing programs from natural language specifications. However, they often violate syntactic and semantic rules of their output…

Machine Learning · Computer Science 2022-01-28 Gabriel Poesia, Oleksandr Polozov, Vu Le, Ashish Tiwari, Gustavo Soares, Christopher Meek, Sumit Gulwani

Recent research has achieved impressive results on understanding and improving source code by building up on machine-learning techniques developed for natural languages. A significant advancement in natural-language understanding has come…

Software Engineering · Computer Science 2020-08-19 Aditya Kanade, Petros Maniatis, Gogul Balakrishnan, Kensen Shi