Related papers: Contextualized Data-Wrangling Code Generation in C…

200 papers

Effective code documentation is essential for collaboration, comprehension, and long-term software maintainability, yet developers often neglect it due to its repetitive nature. Automated documentation generation has evolved from heuristic…

Software Engineering · Computer Science 2026-02-10 Mojtaba Mostafavi Ghahfarokhi, Hamed Jahantigh, Alireza Asadi, Abbas Heydarnoori

Ensuring data quality in large tabular datasets is a critical challenge, typically addressed through data wrangling tasks. Traditional statistical methods, though efficient, often cannot understand the semantic context and deep learning…

Machine Learning · Computer Science 2025-02-25 Ashlesha Akella, Krishnasuri Narayanam

Jupyter notebook allows data scientists to write machine learning code together with its documentation in cells. In this paper, we propose a new task of code documentation generation (CDG) for computational notebooks. In contrast to the…

Software Engineering · Computer Science 2021-09-10 Xuye Liu, Dakuo Wang, April Wang, Yufang Hou, Lingfei Wu

CoWrangler is a data-wrangling recommender system designed to streamline data processing tasks. Recognizing that data processing is often time-consuming and complex for novice users, we aim to simplify the decision-making process regarding…

Databases · Computer Science 2024-09-18 Yuqing Wang, Anna Fariha

Computational notebooks, such as Jupyter notebooks, are interactive computing environments that are ubiquitous among data scientists to perform data wrangling and analytic tasks. To measure the performance of AI pair programmers that…

Automatic generation of high-quality commit messages for code commits can substantially facilitate software developers' work and coordination. However, the semantic gap between source code and natural language poses a major challenge for…

Computation and Language · Computer Science 2021-06-22 Lun Yiu Nie, Cuiyun Gao, Zhicong Zhong, Wai Lam, Yang Liu, Zenglin Xu

Benchmark datasets have a significant impact on accelerating research in programming language tasks. In this paper, we introduce CodeXGLUE, a benchmark dataset to foster machine learning research for program understanding and generation.…

The motivation of the current study was to design an algorithm that can speed up the processing of a query. The important feature is generating code dynamically for a specific query. We present the technique of code generation that is…

Databases · Computer Science 2017-12-12 Xin Zhang

As code generation becomes increasingly central to improving software development efficiency, modern code models are largely trained and evaluated on code with natural-language descriptions. In real projects, developers often implement…

Software Engineering · Computer Science 2026-05-19 Chen Liu, Qingyuan Liang, Hanwen Zhang, Zeyu Sun, Yakun Zhang, Lu Zhang

Sensemaking is the iterative process of identifying, extracting, and explaining insights from data, where each iteration is referred to as the "sensemaking loop." Although recent work observes snapshots of the sensemaking loop within…

Human-Computer Interaction · Computer Science 2022-09-09 Deepthi Raghunandan, Aayushi Roy, Shenzhi Shi, Niklas Elmqvist, Leilani Battle

As the code completion task moves from function level to repository level, leveraging contextual information from large-scale codebases becomes a core challenge. However, existing retrieval-augmented generation (RAG) methods typically treat code as…

Software Engineering · Computer Science 2025-12-05 Xinkui Zhao, Rongkai Liu, Yifan Zhang, Chen Zhi, Lufei Zhang, Guanjie Cheng, Yueshen Xu, Shuiguang Deng, Jianwei Yin

While pre-trained language models (LM) for code have achieved great success in code completion, they generate code conditioned only on the contents within the file, i.e., in-file context, but ignore the rich semantics in other files within…

Computation and Language · Computer Science 2023-05-25 Yangruibo Ding, Zijian Wang, Wasi Uddin Ahmad, Murali Krishna Ramanathan, Ramesh Nallapati, Parminder Bhatia, Dan Roth, Bing Xiang

Source code summaries are short natural language descriptions of code snippets that help developers better understand and maintain source code. There has been a surge of work on automatic code summarization to reduce the burden of writing…

Software Engineering · Computer Science 2021-07-06 Yanlin Wang, Ensheng Shi, Lun Du, Xiaodi Yang, Yuxuan Hu, Shi Han, Hongyu Zhang, Dongmei Zhang

Program synthesis with language models (LMs) has unlocked a large set of reasoning abilities; code-tuned LMs have proven adept at generating programs that solve a wide variety of algorithmic symbolic manipulation tasks (e.g. word…

Computation and Language · Computer Science 2024-11-05 Nathaniel Weir, Muhammad Khalifa, Linlu Qiu, Orion Weller, Peter Clark

Interactive programming with interleaved code snippet cells and natural language markdown is recently gaining popularity in the form of Jupyter notebooks, which accelerate prototyping and collaboration. To study code generation conditioned…

Machine Learning · Computer Science 2019-10-10 Rajas Agashe, Srinivasan Iyer, Luke Zettlemoyer

Large Language Models (LLMs) have shown remarkable progress in automated code generation. Yet, LLM-generated code may contain errors in API usage, class, data structure, or missing project-specific information. As much of this…

Computation and Language · Computer Science 2024-06-12 Zhangqian Bi, Yao Wan, Zheng Wang, Hongyu Zhang, Batu Guan, Fangxin Lu, Zili Zhang, Yulei Sui, Hai Jin, Xuanhua Shi

One of the central tasks in software maintenance is being able to understand and develop code changes. Thus, given a natural language description of the desired new operation of a function, an agent (human or AI) might be asked to generate…

Software Engineering · Computer Science 2025-02-05 Kunal Pai, Premkumar Devanbu, Toufique Ahmed

Despite recent successes of large pre-trained language models in solving reasoning tasks, their inference capabilities remain opaque. We posit that such models can be made more interpretable by explicitly generating interim inference rules,…

Computation and Language · Computer Science 2021-06-07 Debjit Paul, Anette Frank

We introduce KodCode, a synthetic dataset that addresses the persistent challenge of acquiring high-quality, verifiable training data across diverse difficulties and domains for training Large Language Models for coding. Existing…

Machine Learning · Computer Science 2025-07-15 Zhangchen Xu, Yang Liu, Yueqin Yin, Mingyuan Zhou, Radha Poovendran

Traditional end-to-end contextual robust optimization models are trained for specific contextual data, requiring complete retraining whenever new contextual information arrives. This limitation hampers their use in online decision-making…

Optimization and Control · Mathematics 2025-10-20 Carlos Gamboa, Alexandre Street, Davi Valladão, Bernardo Pagnocelli