Related papers: VDebugger: Harnessing Execution Feedback for Debug…
Debugging is an essential part of software maintenance and evolution since it allows software developers to analyze program execution step by step. Understanding a program is required to fix potential flaws, alleviate bottlenecks, and…
Visual programming, a modular and generalizable paradigm, integrates different modules and Python operators to solve various vision-language tasks. Unlike end-to-end models that need task-specific data, it advances in performing visual…
Programming based approaches to reasoning tasks have substantially expanded the types of questions models can answer about visual scenes. Yet on benchmark visual reasoning data, when models answer correctly, they produce incorrect programs…
Predicting program behavior and reasoning about code execution remain significant challenges in software engineering, particularly for large language models (LLMs) designed for code analysis. While these models excel at understanding static…
Large language models (LLMs) are leading significant progress in code generation. Beyond one-pass code generation, recent works further integrate unit tests and program verifiers into LLMs to iteratively refine the generated programs.…
In recent years, more vulnerabilities have been discovered every day, while manual vulnerability repair requires specialized knowledge and is time-consuming. As a result, many detected or even published vulnerabilities remain unpatched,…
Recent advances in Large Language Model (LLM)-based agents have shown remarkable progress in code generation. However, current agent methods mainly rely on text-output-based feedback (e.g. command-line outputs) for multi-round debugging and…
For many years, developers could not figure out the mystery of OS kernels. The main source of this mystery is the interaction between operating systems and hardware while system's boot up and kernel initialization. In addition, many…
Training large language models (LLMs) on Python execution traces grounds them in code execution and enables the line-by-line execution prediction of whole Python programs, effectively turning them into neural interpreters (FAIR CodeGen Team…
The practical impact of deep learning on complex supervised learning problems has been significant, so much so that almost every Artificial Intelligence problem, or at least a portion thereof, has been somehow recast as a deep learning…
In this position paper, we present a prototype of a visualizer for functional programs. Such programs, whose evaluation model is the reduction of an expression to a value through repeated application of rewriting rules, and which tend to…
Existing methods for visual reasoning attempt to directly map inputs to outputs using black-box architectures without explicitly modeling the underlying reasoning processes. As a result, these black-box models often learn to exploit biases…
Answering visual queries is a complex task that requires both visual processing and reasoning. End-to-end models, the dominant approach for this task, do not explicitly differentiate between the two, limiting interpretability and…
Block-based programming environments such as Scratch are increasingly popular in programming education, in particular for young learners. While the use of blocks helps prevent syntax errors, semantic bugs remain common and difficult to…
Recent advances in program synthesis offer means to automatically debug student submissions and generate personalized feedback in massive programming classrooms. When automatically generating feedback for programming assignments, a key…
Multimodal Large Language Models (MLLMs) struggle with precise reasoning for structured visuals like charts and diagrams, as pixel-based perception lacks a mechanism for verification. To address this, we propose to leverage derendering --…
Visualization authoring is an iterative process requiring users to adjust parameters to achieve desired aesthetics. Due to its complexity, users often create defective visualizations and struggle to fix them. Many seek help on forums (e.g.,…
Deep learning model design, development, and debugging is a process driven by best practices, guidelines, trial-and-error, and the personal experiences of model developers. At multiple stages of this process, performance and internal model…
Debugging formal verification (FV) failures represents one of the most time-consuming bottlenecks in modern hardware design workflows. When properties fail, engineers must manually trace through complex counter-examples spanning multiple…
Debugging CUDA programs has long been challenging because failures often arise from subtle interactions among hardware behavior, compiler decisions, memory hierarchy, and asynchronous execution. More importantly, with the rapid expansion of…