Related papers: Studying the Difference Between Natural and Progra…

Do People Prefer "Natural" code?

Natural code is known to be very repetitive (much more so than natural language corpora); furthermore, this repetitiveness persists, even after accounting for the simpler syntax of code. However, programming languages are very expressive,…

Computation and Language · Computer Science 2019-10-10 Casey Casalnuovo, Kevin Lee, Hulin Wang, Prem Devanbu, Emily Morgan

Evaluating Code Readability and Legibility: An Examination of Human-centric Studies

Reading code is an essential activity in software maintenance and evolution. Several studies with human subjects have investigated how different factors, such as the employed programming constructs and naming conventions, can impact code…

Software Engineering · Computer Science 2021-10-05 Delano Oliveira, Reydne Bruno, Fernanda Madeiral, Fernando Castor

A Survey of Machine Learning for Big Code and Naturalness

Research at the intersection of machine learning, programming languages, and software engineering has recently taken important steps in proposing learnable probabilistic models of source code that exploit code's abundance of patterns. In…

Software Engineering · Computer Science 2018-05-08 Miltiadis Allamanis, Earl T. Barr, Premkumar Devanbu, Charles Sutton

Source codes in human communication

Although information theoretic characterizations of human communication have become increasingly popular in linguistics, to date they have largely involved grafting probabilistic constructs onto older ideas about grammar. Similarities…

Computation and Language · Computer Science 2019-04-09 Michael Ramscar

A Survey on Natural Language Processing for Programming

Natural language processing for programming aims to use NLP techniques to assist programming. It is increasingly prevalent for its effectiveness in improving productivity. Distinct from natural language, a programming language is highly…

Computation and Language · Computer Science 2023-08-08 Qingfu Zhu, Xianzhen Luo, Fang Liu, Cuiyun Gao, Wanxiang Che

Code-switching in text and speech challenges information-theoretic speaker design

In this work, we use language modeling to investigate the factors that influence insertional code-switching. Code-switching occurs when a speaker alternates between one language variety (the primary language) and another (the secondary…

Computation and Language · Computer Science 2026-05-05 Debasmita Bhattacharya, Marten van Schijndel

Understanding the Properties of Generated Corpora

Models for text generation have become focal for many research tasks and especially for the generation of sentence corpora. However, understanding the properties of an automatically generated text corpus remains challenging. We propose a…

Computation and Language · Computer Science 2022-10-28 Naama Zwerdling, Segev Shlomov, Esther Goldbraich, George Kour, Boaz Carmeli, Naama Tepper, Inbal Ronen, Vitaly Zabershinsky, Ateret Anaby-Tavor

What Makes Code Hard to Understand?

What factors impact the comprehensibility of code? Previous research suggests that expectation-congruent programs should take less time to understand and be less prone to errors. We present an experiment in which participants with…

Software Engineering · Computer Science 2013-04-29 Michael Hansen, Robert L. Goldstone, Andrew Lumsdaine

Exploring the Curious Case of Code Prompts

Recent work has shown that prompting language models with code-like representations of natural language leads to performance improvements on structured reasoning tasks. However, such tasks comprise only a small subset of all natural…

Computation and Language · Computer Science 2023-04-27 Li Zhang, Liam Dugan, Hainiu Xu, Chris Callison-Burch

On Code-Induced Reasoning in LLMs

Code data has been shown to enhance the reasoning capabilities of large language models (LLMs), but it remains unclear which aspects of code are most responsible. We investigate this question with a systematic, data-centric framework. We…

Computation and Language · Computer Science 2025-10-03 Abdul Waheed, Zhen Wu, Carolyn Rosé, Daphne Ippolito

Investigating the Impact of Vocabulary Difficulty and Code Naturalness on Program Comprehension

Context: Developers spend most of their time comprehending source code during software development. Automatically assessing how readable and understandable source code is can provide various benefits in different tasks, such as task…

Software Engineering · Computer Science 2023-08-28 Bin Lin, Gregorio Robles

Subjective Code Preferences in Experts and Large Language Models

Large Language Models (LLMs) have become increasingly popular for coding tasks, with subjective coding preferences being an essential element to adapt to programmers' personal needs. Existing work overlooks such characteristics and mainly…

Human-Computer Interaction · Computer Science 2026-05-26 Anna Mokhova, Subhabrata Dutta, Iryna Gurevych, Simone Balloccu

Communicating Natural Programs to Humans and Machines

The Abstraction and Reasoning Corpus (ARC) is a set of procedural tasks that tests an agent's ability to flexibly solve novel problems. While most ARC tasks are easy for humans, they are challenging for state-of-the-art AI. What makes…

Artificial Intelligence · Computer Science 2023-05-23 Samuel Acquaviva, Yewen Pu, Marta Kryven, Theodoros Sechopoulos, Catherine Wong, Gabrielle E Ecanow, Maxwell Nye, Michael Henry Tessler, Joshua B. Tenenbaum

A Comparative Study of Programming Languages in Rosetta Code

Sometimes debates on programming languages are more religious than scientific. Questions about which language is more succinct or efficient, or makes developers more productive are discussed with fervor, and their answers are too often…

Software Engineering · Computer Science 2015-06-04 Sebastian Nanz, Carlo A. Furia

On the Importance and Shortcomings of Code Readability Metrics: A Case Study on Reactive Programming

Well-structured and readable source code is a prerequisite for maintainable software and successful collaboration among developers. Static analysis enables the automated extraction of code complexity and readability metrics which can be…

Software Engineering · Computer Science 2021-10-29 Gustaf Holst, Felix Dobslaw

Bringing Structure to Naturalness: On the Naturalness of ASTs

Source code comes in different shapes and forms. Previous research has already shown code to be more predictable than natural language as well as highlighted its statistical predictability at the token level: source code can be natural.…

Software Engineering · Computer Science 2025-04-14 Profir-Petru Pârţachi, Mahito Sugiyama

Teaching natural language to computers

"Natural Language," whether spoken and attended to by humans, or processed and generated by computers, requires networked structures that reflect creative processes in semantic, syntactic, phonetic, linguistic, social, emotional, and…

Computation and Language · Computer Science 2016-06-29 Joseph Corneli, Miriam Corneli

Not All Code Is Equal: A Data-Centric Study of Code Complexity and LLM Reasoning

Large Language Models (LLMs) increasingly exhibit strong reasoning abilities, often attributed to their capacity to generate chain-of-thought-style intermediate reasoning. Recent work suggests that exposure to code can further enhance these…

Machine Learning · Computer Science 2026-01-30 Lukas Twist, Shu Yang, Hanqi Yan, Jingzhi Gong, Di Wang, Helen Yannakoudakis, Jie M. Zhang

The Natural Stories Corpus

It is now a common practice to compare models of human language processing by predicting participant reactions (such as reading times) to corpora consisting of rich naturalistic linguistic materials. However, many of the corpora used in…

Computation and Language · Computer Science 2017-08-22 Richard Futrell, Edward Gibson, Hal Tily, Idan Blank, Anastasia Vishnevetsky, Steven T. Piantadosi, Evelina Fedorenko

It's Not Easy Being Green: On the Energy Efficiency of Programming Languages

Does the choice of programming language affect energy consumption? Previous highly visible studies have established associations between certain programming languages and energy consumption. A causal misinterpretation of this work has led…

Programming Languages · Computer Science 2025-10-06 Nicolas van Kempen, Hyuk-Je Kwon, Dung Tuan Nguyen, Emery D. Berger