Related papers: Reasoning-Based Software Testing
Search-based Software Testing (SBST) can automatically generate test cases to search for requirements violations. Unlike manual test case development, it can generate a substantial number of test cases in a limited time. However, SBST does…
Software testing is a complex, intellectual activity based (at least) on analysis, reasoning, decision making, abstraction and collaboration performed in a highly demanding environment. Naturally, it uses and allocates multiple cognitive…
Recent Large Reasoning Models (LRMs) have achieved remarkable progress on task-specific benchmarks, yet their evaluation methods remain constrained by isolated problem-solving paradigms. Existing benchmarks predominantly assess…
Search-based software testing (SBST) is now a mature area, with numerous techniques developed to tackle the challenging task of software testing. SBST techniques have shown promising results and have been successfully applied in the…
Automated software testing has significant potential to enhance efficiency and reliability within software development processes. However, its broader adoption faces considerable challenges, particularly concerning alignment between test…
Causal inference is a study of causal relationships between events and the statistical study of inferring these relationships through interventions and other statistical techniques. Causal reasoning is any line of work toward determining…
The language-conditioned robotic manipulation aims to transfer natural language instructions into executable actions, from simple pick-and-place to tasks requiring intent recognition and visual reasoning. Inspired by the dual process theory…
Generating step-by-step "chain-of-thought" rationales improves language model performance on complex reasoning tasks like mathematics or commonsense question-answering. However, inducing language model rationale generation currently…
Background: Software systems powered by large language models are becoming a routine part of everyday technologies, supporting applications across a wide range of domains. In software engineering, many studies have focused on how LLMs…
The prevailing paradigm for training large reasoning models--combining Supervised Fine-Tuning (SFT) with Reinforcement Learning with Verifiable Rewards (RLVR)--is fundamentally constrained by its reliance on high-quality, human-annotated…
Since its beginnings in the 1940s, automated reasoning by computers has become a tool of ever growing importance in scientific research. So far, the rules underlying automated reasoning have mainly been formulated by humans, in the form of…
Inductive reasoning, a cornerstone of human cognition, enables generalization from limited data but hasn't yet been fully achieved by large language models (LLMs). While modern LLMs excel at reasoning tasks, their ability to maintain stable…
Large language models show improved downstream task performance when prompted to generate step-by-step reasoning to justify their final answers. These reasoning steps greatly improve model interpretability and verification, but objectively…
Testing autonomous robotic systems, such as self-driving cars and unmanned aerial vehicles, is challenging due to their interaction with highly unpredictable environments. A common practice is to first conduct simulation-based testing,…
Intelligent software systems continuously analyze their surrounding environment and accordingly adapt their internal state. Depending on the criticality index of the situation, the system should dynamically focus or widen its analysis and…
Recent advancements in large language models (LLMs) underscore the need for stronger reasoning capabilities to solve complex problems effectively. While Chain-of-Thought (CoT) reasoning has been a step forward, it remains insufficient for…
Reasoning is a hallmark of human intelligence, enabling adaptive decision-making in complex and unfamiliar scenarios. In contrast, machine intelligence remains bound to training data, lacking the ability to dynamically refine solutions at…
The reasoning abilities of large language models (LLMs) have improved with chain-of-thought (CoT) prompting, allowing models to solve complex tasks stepwise. However, training CoT capabilities requires detailed reasoning data, which is…
Background: Research software plays an important role in solving real-life problems, empowering scientific innovations, and handling emergency situations. Therefore, the correctness and trustworthiness of research software are of absolute…
Reward models play a critical role in guiding large language models toward outputs that align with human expectations. However, an open challenge remains in effectively utilizing test-time compute to enhance reward model performance. In…