Related papers: Learning How to Cube

Learning Neural-Symbolic Descriptive Planning Models via Cube-Space Priors: The Voyage Home (to STRIPS)

We achieved a new milestone in the difficult task of enabling agents to learn about their environment autonomously. Our neuro-symbolic architecture is trained end-to-end to produce a succinct and effective discrete state transition model…

Artificial Intelligence · Computer Science 2020-08-13 Masataro Asai, Christian Muise

Smart Cubing for Graph Search: A Comparative Study

Parallel solving via cube-and-conquer is a key method for scaling SAT solvers to hard instances. While cube-and-conquer has proven successful for pure SAT problems, notably the Pythagorean triples conjecture, its application to SAT solvers…

Artificial Intelligence · Computer Science 2025-01-30 Markus Kirchweger, Hai Xia, Tomáš Peitl, Stefan Szeider

Assessing Robustness to Spurious Correlations in Post-Training Language Models

Supervised and preference-based fine-tuning techniques have become popular for aligning large language models (LLMs) with user intent and correctness criteria. However, real-world training data often exhibits spurious correlations --…

Computation and Language · Computer Science 2025-05-12 Julia Shuieh, Prasann Singhal, Apaar Shanker, John Heyer, George Pu, Samuel Denton

Implicit Reward as the Bridge: A Unified View of SFT and DPO Connections

Post-training processes are essential phases in grounding pre-trained language models to real-world tasks, with learning from demonstrations or preference signals playing a crucial role in this adaptation. We present a unified theoretical…

Machine Learning · Computer Science 2025-07-08 Bo Wang, Qinyuan Cheng, Runyu Peng, Rong Bao, Peiji Li, Qipeng Guo, Linyang Li, Zhiyuan Zeng, Yunhua Zhou, Xipeng Qiu

Machine Learning for SAT: Restricted Heuristics and New Graph Representations

Boolean satisfiability (SAT) is a fundamental NP-complete problem with many applications, including automated planning and scheduling. To solve large instances, SAT solvers have to rely on heuristics, e.g., choosing a branching variable in…

Artificial Intelligence · Computer Science 2023-07-19 Mikhail Shirokikh, Ilya Shenbin, Anton Alekseev, Sergey Nikolenko

Concurrent Cube-and-Conquer

Recent work introduced the cube-and-conquer technique to solve hard SAT instances. It partitions the search space into cubes using a lookahead solver. Each cube is tackled by a conflict-driven clause learning (CDCL) solver. Crucial for…

Data Structures and Algorithms · Computer Science 2014-02-19 Peter van der Tak, Marijn J. H. Heule, Armin Biere

From Gameplay to Symbolic Reasoning: Learning SAT Solver Heuristics in the Style of Alpha(Go) Zero

Despite the recent successes of deep neural networks in various fields such as image and speech recognition, natural language processing, and reinforcement learning, we still face big challenges in bringing the power of numeric optimization…

Artificial Intelligence · Computer Science 2018-02-16 Fei Wang, Tiark Rompf

Revisiting the Data Sampling in Multimodal Post-training from a Difficulty-Distinguish View

Recent advances in Multimodal Large Language Models (MLLMs) have spurred significant progress in Chain-of-Thought (CoT) reasoning. Building on the success of Deepseek-R1, researchers extended multimodal reasoning to post-training paradigms…

Computer Vision and Pattern Recognition · Computer Science 2025-11-11 Jianyu Qi, Ding Zou, Wenrui Yan, Rui Ma, Jiaxu Li, Zhijie Zheng, Zhiguo Yang, Rongchang Zhao

Procedural-skill SFT across capacity tiers: A W-Shaped pre-SFT Trajectory and Regime-Asymmetric Mechanism on 0.8B-4B Qwen3.5 Models

We measure procedural-skill SFT contribution across three Qwen3.5 dense scales (0.8B, 2B, 4B) on a 200-task / 40-skill holdout, with Claude Haiku 4.5 as a frontier reference. The corpus is 353 rows of (task + procedural-skill block, Opus…

Machine Learning · Computer Science 2026-05-15 Igor Strozzi

Scaling Neuro-symbolic Problem Solving: Solver-Free Learning of Constraints and Objectives

In the ongoing quest for hybridizing discrete reasoning with neural nets, there is an increasing interest in neural architectures that can learn how to solve discrete reasoning or optimization problems from natural inputs, a task that Large…

Artificial Intelligence · Computer Science 2025-12-19 Marianne Defresne, Romain Gambardella, Sophie Barbe, Thomas Schiex

An Improved Quantum Software Challenges Classification Approach using Transfer Learning and Explainable AI

Quantum Software Engineering (QSE) is a research area practiced by tech firms. Quantum developers face challenges in optimizing quantum computing and QSE concepts. They use Stack Overflow (SO) to discuss challenges and label posts with…

Software Engineering · Computer Science 2026-04-15 Nek Dil Khan, Javed Ali Khan, Mobashir Husain, Muhammad Sohail Khan, Arif Ali Khan, Muhammad Azeem Akbar, Shahid Hussain

SAT Strikes Back: Parameter and Path Relations in Quantum Toolchains

In the foreseeable future, toolchains for quantum computing should offer automatic means of transforming a high level problem formulation down to a hardware executable form. Thereby, it is crucial to find (multiple) transformation paths…

Quantum Physics · Physics 2025-10-13 Lukas Schmidbauer, Wolfgang Mauerer

Cubing for Tuning

We are exploring the problem of building an automated reasoning procedure that adaptively tunes the high-level solving strategy for a given problem. There are two main distinctive characteristics of our approach: tuning is performed solely…

Logic in Computer Science · Computer Science 2025-05-14 Haoze Wu, Clark Barrett, Nina Narodytska

From Zero to Hero: Examining the Power of Symbolic Tasks in Instruction Tuning

Fine-tuning language models on tasks with instructions has demonstrated potential in facilitating zero-shot generalization to unseen tasks. In this paper, we introduce a straightforward yet effective method for enhancing instruction tuning…

Computation and Language · Computer Science 2023-04-18 Qian Liu, Fan Zhou, Zhengbao Jiang, Longxu Dou, Min Lin

ProFit: Leveraging High-Value Signals in SFT via Probability-Guided Token Selection

Supervised fine-tuning (SFT) is a fundamental post-training strategy to align Large Language Models (LLMs) with human intent. However, traditional SFT often ignores the one-to-many nature of language by forcing alignment with a single…

Computation and Language · Computer Science 2026-05-07 Tao Liu, Taiqiang Wu, Runming Yang, Shaoning Sun, Junjie Wang, Yujiu Yang

Deep Symbolic Optimization for Combinatorial Optimization: Accelerating Node Selection by Discovering Potential Heuristics

Combinatorial optimization (CO) is one of the most fundamental mathematical models in real-world applications. Traditional CO solvers, such as Branch-and-Bound (B&B) solvers, heavily rely on expert-designed heuristics, which are reliable…

Machine Learning · Computer Science 2024-07-11 Hongyu Liu, Haoyang Liu, Yufei Kuang, Jie Wang, Bin Li

Advancing Mathematical Reasoning in Language Models: The Impact of Problem-Solving Data, Data Synthesis Methods, and Training Stages

Mathematical reasoning remains a challenging area for large language models (LLMs), prompting the development of math-specific LLMs such as LLEMMA, DeepSeekMath, and Qwen2-Math, among others. These models typically follow a two-stage…

Computation and Language · Computer Science 2025-03-25 Zui Chen, Tianqiao Liu, Mi Tian, Qing Tong, Weiqi Luo, Zitao Liu

Hard Negative Sample-Augmented DPO Post-Training for Small Language Models

Large language models (LLMs) continue to struggle with mathematical reasoning, and common post-training pipelines often reduce each generated solution to a binary outcome: correct or incorrect. This perspective is limiting in practice, as…

Machine Learning · Computer Science 2026-04-15 Haocheng Lu, Minjun Zhu, Henry Yu

Too Correct to Learn: Reinforcement Learning on Saturated Reasoning Data

Reinforcement Learning (RL) enhances LLM reasoning, yet a paradox emerges as models scale: strong base models saturate standard benchmarks (e.g., MATH), yielding correct but homogeneous solutions. In such environments, the lack of failure…

Machine Learning · Computer Science 2026-04-21 Zhenwen Liang, Yujun Zhou, Sidi Lu, Xiangliang Zhang, Haitao Mi, Dong Yu

Critique Fine-Tuning: Learning to Critique is More Effective than Learning to Imitate

Supervised Fine-Tuning (SFT) is commonly used to train language models to imitate annotated responses for given instructions. In this paper, we propose Critique Fine-Tuning (CFT), a method more effective than SFT for reasoning tasks.…

Computation and Language · Computer Science 2025-04-01 Yubo Wang, Xiang Yue, Wenhu Chen