Related papers: Creating a Trajectory for Code Writing: Algorithmi…
Large language models (LLMs) can perform complex reasoning in few- and zero-shot settings by generating intermediate chain of thought (CoT) reasoning steps. Further, each reasoning step can rely on external tools to support computation…
Introductory programming courses often emphasize mastering syntax and basic constructs before progressing to more complex and interesting programs. This bottom-up approach can be frustrating for novices, shifting the focus away from problem…
Analytical reasoning is an essential and challenging task that requires a system to analyze a scenario involving a set of particular circumstances and perform reasoning over it to make conclusions. In this paper, we study the challenge of…
This study examines how AI code assistants shape novice programmers experiences during a two-part exam in an introductory programming course. In the first part, students completed a programming task with access to AI support; in the second,…
AI-powered planning tools show promise in supporting programming learners by enabling early, formative feedback on their thinking processes prior to coding. To date, however, most AI-supported planning tools rely on students'…
Curriculum learning for training LLMs requires a difficulty signal that aligns with reasoning while remaining scalable and interpretable. We propose a simple premise: tasks that demand deeper depth of thought for humans should also be…
In this paper, we investigate code-integrated reasoning, where models generate code when necessary and integrate feedback by executing it through a code interpreter. To acquire this capability, models must learn when and how to use external…
Understanding a program's runtime reasoning behavior, meaning how intermediate states and control flows lead to final execution results, is essential for reliable code generation, debugging, and automated reasoning. Although large language…
Introducing computational thinking in primary school curricula implies that teachers have to prepare appropriate lesson material. Typically this includes creating programming tasks, which may overwhelm primary school teachers with lacking…
The prevalence of online platforms and studies has generated the demand for automated grading tools, and as a result, there are plenty in the market. Such tools are developed to grade coding assignments quickly, accurately, and…
Large language models (LLMs) have scaled up to unlock a wide range of complex reasoning tasks with the aid of various prompting methods. However, current prompting methods generate natural language intermediate steps to help reasoning,…
Automated theorem provers and formal proof assistants are general reasoning systems that are in theory capable of proving arbitrarily hard theorems, thus solving arbitrary problems reducible to mathematics and logical reasoning. In…
Large reasoning models (LRMs) like OpenAI-o1 have shown impressive capabilities in natural language reasoning. However, these models frequently demonstrate inefficiencies or inaccuracies when tackling complex mathematical operations. While…
Recent advancements in Large Language Models (LLMs) have showcased striking results on existing logical reasoning benchmarks, with some models even surpassing human performance. However, the true depth of their competencies and robustness…
Neural algorithmic reasoning is an emerging area of machine learning that focuses on building neural networks capable of solving complex algorithmic tasks. Recent advancements predominantly follow the standard supervised learning paradigm…
Mathematical reasoning---a core ability within human intelligence---presents some unique challenges as a domain: we do not come to understand and solve mathematical problems primarily on the back of experience and evidence, but on the basis…
Generative AI enables students to produce plausible code quickly. Producing working code is therefore no longer a reliable indicator of understanding. This is particularly problematic in non-computer-science programmes, where time…
Reliable clinical decision support requires medical AI agents capable of safe, multi-step reasoning over structured electronic health records (EHRs). While large language models (LLMs) show promise in healthcare, existing benchmarks…
Inductive program synthesis, or programming by example, requires synthesizing functions from input-output examples that generalize to unseen inputs. While large language model agents have shown promise in programming tasks guided by natural…
Generative artificial intelligence poses new challenges around assessment, increasingly driving introductory programming educators to employ invigilated exams. But exams do not afford more authentic programming experiences that involve…