Related papers: A parallel corpus of Python functions and document…
Documentation debt hinders the effective utilization of open-source software. Although code summarization tools have been helpful for developers, most would prefer a detailed account of each parameter in a function rather than a high-level…
Publicly available source-code libraries are continuously growing and changing. This makes it impossible for models of code to keep current with all available APIs by simply training these models on existing code repositories. Thus,…
Translating source code from one programming language to another is a critical, time-consuming task in modernizing legacy applications and codebases. Recent work in this space has drawn inspiration from the software naturalness hypothesis…
We consider the problem of parsing natural language descriptions into source code written in a general-purpose programming language like Python. Existing data-driven methods treat this problem as a language generation task without…
Semantic code search is the task of retrieving relevant code given a natural language query. While related to other information retrieval tasks, it requires bridging the gap between the language used in code (often abbreviated and highly…
Software documentation is an essential but labor intensive task that often requires a dedicated team of developers to ensure coverage and accuracy. Good documentation will help shorten the development cycle and improve the overall team…
Neural Machine Translation with its significant results, still has a great problem: lack or absence of parallel corpus for many languages. This article suggests a method for generating considerable amount of parallel corpus for any language…
Code summarization generates brief natural language description given a source code snippet, while code retrieval fetches relevant source code given a natural language query. Since both tasks aim to model the association between natural…
Parallel texts are a relatively rare language resource, however, they constitute a very useful research material with a wide range of applications. This study presents and analyses new methodologies we developed for obtaining such data from…
Software engineering research has always being concerned with the improvement of code completion approaches, which suggest the next tokens a developer will likely type while coding. The release of GitHub Copilot constitutes a big step…
Simultaneously modeling source code and natural language has many exciting applications in automated software development and understanding. Pursuant to achieving such technology, we introduce PyMT5, the Python method text-to-text transfer…
In today's software world with its cornucopia of reusable software libraries, when a programmer is faced with a programming task that they suspect can be completed through the use of a library, they often look for code examples using a…
Developing models that can automatically generate detailed code explanation can greatly benefit software maintenance and programming education. However, existing code-to-text generation models often produce only high-level summaries of code…
Automated documentation of programming source code is a challenging task with significant practical and scientific implications for the developer community. We present a large language model (LLM)-based application that developers can use…
One major challenge of translating code between programming languages is that parallel training data is often limited. To overcome this challenge, we present two data augmentation techniques, one that builds comparable corpora (i.e., code…
Source code summarization -- creating natural language descriptions of source code behavior -- is a rapidly-growing research topic with applications to automatic documentation generation, program comprehension, and software maintenance.…
The purpose of this paper is to show how existing scientific software can be parallelized using a separate thin layer of Python code where all parallel communication is implemented. We provide specific examples on such layers of code, and…
Recent advancements in natural language processing \cite{gpt2} \cite{BERT} have led to near-human performance in multiple natural language tasks. In this paper, we seek to understand whether similar techniques can be applied to a highly…
Recent neural models have shown significant progress on the problem of generating short descriptive texts conditioned on a small number of database records. In this work, we suggest a slightly more difficult data-to-text generation task,…
We consider the problem of translating high-level textual descriptions to formal representations in technical documentation as part of an effort to model the meaning of such documentation. We focus specifically on the problem of learning…