Related papers: Mining Idioms from Source Code

Mining Idioms in the Wild

Existing code repositories contain numerous instances of code patterns that are idiomatic ways of accomplishing a particular programming task. Sometimes, the programming language in use supports specific operators or APIs that can express…

Software Engineering · Computer Science 2021-07-15 Aishwarya Sivaraman, Rui Abreu, Andrew Scott, Tobi Akomolede, Satish Chandra

Program Synthesis and Semantic Parsing with Learned Code Idioms

Program synthesis of general-purpose source code from natural language specifications is challenging due to the need to reason about high-level patterns in the target program and low-level implementation details at the same time. In this…

Machine Learning · Computer Science 2019-11-06 Richard Shin, Miltiadis Allamanis, Marc Brockschmidt, Oleksandr Polozov

Learning Programmatic Idioms for Scalable Semantic Parsing

Programmers typically organize executable source code using high-level coding patterns or idiomatic structures such as nested loops, exception handlers and recursive blocks, rather than as individual code tokens. In contrast, state of the…

Computation and Language · Computer Science 2019-09-09 Srinivasan Iyer, Alvin Cheung, Luke Zettlemoyer

Teddy: Automatic Recommendation of Pythonic Idiom Usage For Pull-Based Software Projects

Pythonic code is idiomatic code that follows guiding principles and practices within the Python community. Offering performance and readability benefits, Pythonic code is claimed to be widely adopted by experienced Python developers, but…

Software Engineering · Computer Science 2020-09-09 Purit Phan-udom, Naruedon Wattanakul, Tattiya Sakulniwat, Chaiyong Ragkhitwetsagul, Thanwadee Sunetnanta, Morakot Choetkiertikul, Raula Gaikovina Kula

Memorization or Reasoning? Exploring the Idiom Understanding of LLMs

Idioms have long posed a challenge due to their unique linguistic properties, which set them apart from other common expressions. While recent studies have leveraged large language models (LLMs) to handle idioms across various tasks, e.g.,…

Computation and Language · Computer Science 2025-09-24 Jisu Kim, Youngwoo Shin, Uiji Hwang, Jihun Choi, Richeng Xuan, Taeuk Kim

A Survey of Idiom Datasets for Psycholinguistic and Computational Research

Idioms are figurative expressions whose meanings often cannot be inferred from their individual words, making them difficult to process computationally and posing challenges for human experimental studies. This survey reviews datasets…

Computation and Language · Computer Science 2025-08-19 Michael Flor, Xinyi Liu, Anna Feldman

Making Python Code Idiomatic by Automatic Refactoring Non-Idiomatic Python Code with Pythonic Idioms

Compared to other programming languages (e.g., Java), Python has more idioms to make Python code concise and efficient. Although pythonic idioms are well accepted in the Python community, Python programmers are often faced with many…

Software Engineering · Computer Science 2022-07-13 Zejun Zhang, Zhenchang Xing, Xin Xia, Xiwei Xu, Liming Zhu

Polymorphic Type Inference for Machine Code

For many compiled languages, source-level types are erased very early in the compilation process. As a result, further compiler passes may convert type-safe source into type-unsafe machine code. Type-unsafe idioms in the original source and…

Programming Languages · Computer Science 2016-03-22 Matthew Noonan, Alexey Loginov, David Cok

EditSum: A Retrieve-and-Edit Framework for Source Code Summarization

Existing studies show that code summaries help developers understand and maintain source code. Unfortunately, these summaries are often missing or outdated in software projects. Code summarization aims to generate natural language…

Software Engineering · Computer Science 2023-09-08 Jia Li, Yongmin Li, Ge Li, Xing Hu, Xin Xia, Zhi Jin

Learning to superoptimize programs

Code super-optimization is the task of transforming any given program to a more efficient version while preserving its input-output behaviour. In some sense, it is similar to the paraphrase problem from natural language processing where the…

Machine Learning · Computer Science 2017-06-29 Rudy Bunel, Alban Desmaison, M. Pawan Kumar, Philip H. S. Torr, Pushmeet Kohli

Exempla Gratis (E.G.): Code Examples for Free

Modern software engineering often involves using many existing APIs, both open source and, in industrial coding environments, proprietary. Programmers reference documentation and code search tools to remind themselves of proper common usage…

Software Engineering · Computer Science 2020-11-04 Celeste Barnaby, Koushik Sen, Tianyi Zhang, Elena Glassman, Satish Chandra

Inferring Input Grammars from Dynamic Control Flow

A program is characterized by its input model, and a formal input model can be of use in diverse areas including vulnerability analysis, reverse engineering, fuzzing and software testing, clone detection and refactoring. Unfortunately,…

Software Engineering · Computer Science 2019-12-13 Rahul Gopinath, Björn Mathis, Andreas Zeller

Semantic Source Code Models Using Identifier Embeddings

The emergence of online open source repositories in the recent years has led to an explosion in the volume of openly available source code, coupled with metadata that relate to a variety of software development activities. As an effect, in…

Software Engineering · Computer Science 2023-12-05 Vasiliki Efstathiou, Diomidis Spinellis

Natural Language-Guided Programming

In today's software world with its cornucopia of reusable software libraries, when a programmer is faced with a programming task that they suspect can be completed through the use of a library, they often look for code examples using a…

Software Engineering · Computer Science 2021-10-08 Geert Heyman, Rafael Huysegems, Pascal Justen, Tom Van Cutsem

Jigsaw: Large Language Models meet Program Synthesis

Large pre-trained language models such as GPT-3, Codex, and Google's language model are now capable of generating code from natural language specifications of programmer intent. We view these developments with a mixture of optimism and…

Software Engineering · Computer Science 2021-12-07 Naman Jain, Skanda Vaidyanath, Arun Iyer, Nagarajan Natarajan, Suresh Parthasarathy, Sriram Rajamani, Rahul Sharma

In-IDE Code Generation from Natural Language: Promise and Challenges

A great part of software development involves conceptualizing or communicating the underlying procedures and logic that needs to be expressed in programs. One major difficulty of programming is turning concept into code, especially when…

Software Engineering · Computer Science 2021-09-23 Frank F. Xu, Bogdan Vasilescu, Graham Neubig

Latent Idiom Recognition for a Minimalist Functional Array Language using Equality Saturation

Accelerating programs is typically done by recognizing code idioms matching high-performance libraries or hardware interfaces. However, recognizing such idioms automatically is challenging. The idiom recognition machinery is difficult to…

Programming Languages · Computer Science 2024-01-01 Jonathan Van der Cruysse, Christophe Dubach

Heterogeneous Metric Learning with Content-based Regularization for Software Artifact Retrieval

The problem of software artifact retrieval has the goal to effectively locate software artifacts, such as a piece of source code, in a large code repository. This problem has been traditionally addressed through the textual query. In other…

Machine Learning · Computer Science 2016-11-15 Liang Wu, Hui Xiong, Liang Du, Bo Liu, Guandong Xu, Yong Ge, Yanjie Fu, Yuanchun Zhou, Jianhui Li

Vector Representations of Idioms in Conversational Systems

We demonstrate, in this study, that an open-domain conversational system trained on idioms or figurative language generates more fitting responses to prompts containing idioms. Idioms are part of everyday speech in many languages, across…

Computation and Language · Computer Science 2022-05-10 Tosin Adewumi, Foteini Liwicki, Marcus Liwicki

Logical Segmentation of Source Code

Many software analysis methods have come to rely on machine learning approaches. Code segmentation - the process of decomposing source code into meaningful blocks - can augment these methods by featurizing code, reducing noise, and limiting…

Software Engineering · Computer Science 2019-07-23 Jacob Dormuth, Ben Gelman, Jessica Moore, David Slater