Related papers: KnowDiffuser: A Knowledge-Guided Diffusion Planner…

LAD-Drive: Bridging Language and Trajectory with Action-Aware Diffusion Transformers

While multimodal large language models (MLLMs) provide advanced reasoning for autonomous driving, translating their discrete semantic knowledge into continuous trajectories remains a fundamental challenge. Existing methods often rely on…

Robotics · Computer Science 2026-03-03 Fabian Schmidt, Karol Fedurko, Markus Enzweiler, Abhinav Valada

Diffuse Thinking: Exploring Diffusion Language Models as Efficient Thought Proposers for Reasoning

In recent years, large language models (LLMs) have witnessed remarkable advancements, with the test-time scaling law consistently enhancing the reasoning capabilities. Through systematic evaluation and exploration of a diverse spectrum of…

Computation and Language · Computer Science 2025-11-03 Chenyang Shao, Sijian Ren, Fengli Xu, Yong Li

dVLM-AD: Enhance Diffusion Vision-Language-Model for Driving via Controllable Reasoning

The autonomous driving community is increasingly focused on addressing the challenges posed by out-of-distribution (OOD) driving scenarios. A dominant research trend seeks to enhance end-to-end (E2E) driving systems by integrating…

Computer Vision and Pattern Recognition · Computer Science 2025-12-05 Yingzi Ma, Yulong Cao, Wenhao Ding, Shuibai Zhang, Yan Wang, Boris Ivanovic, Ming Jiang, Marco Pavone, Chaowei Xiao

Diffexplainer: Towards Cross-modal Global Explanations with Diffusion Models

We present DiffExplainer, a novel framework that, leveraging language-vision models, enables multimodal global explainability. DiffExplainer employs diffusion models conditioned on optimized text prompts, synthesizing images that maximize…

Computer Vision and Pattern Recognition · Computer Science 2024-04-04 Matteo Pennisi, Giovanni Bellitto, Simone Palazzo, Mubarak Shah, Concetto Spampinato

I Think, Therefore I Diffuse: Enabling Multimodal In-Context Reasoning in Diffusion Models

This paper presents ThinkDiff, a novel alignment paradigm that empowers text-to-image diffusion models with multimodal in-context understanding and reasoning capabilities by integrating the strengths of vision-language models (VLMs).…

Machine Learning · Computer Science 2025-02-18 Zhenxing Mi, Kuan-Chieh Wang, Guocheng Qian, Hanrong Ye, Runtao Liu, Sergey Tulyakov, Kfir Aberman, Dan Xu

TextDiffuser-2: Unleashing the Power of Language Models for Text Rendering

The diffusion model has been proven a powerful generative model in recent years, yet remains a challenge in generating visual text. Several methods alleviated this issue by incorporating explicit text position and content as guidance on…

Computer Vision and Pattern Recognition · Computer Science 2023-11-29 Jingye Chen, Yupan Huang, Tengchao Lv, Lei Cui, Qifeng Chen, Furu Wei

Drive As You Like: Strategy-Level Motion Planning Based on A Multi-Head Diffusion Model

Recent advances in motion planning for autonomous driving have led to models capable of generating high-quality trajectories. However, most existing planners tend to fix their policy after supervised training, leading to consistent but…

Robotics · Computer Science 2025-08-26 Fan Ding, Xuewen Luo, Hwa Hui Tew, Ruturaj Reddy, Xikun Wang, Junn Yong Loo

TransDiffuser: Diverse Trajectory Generation with Decorrelated Multi-modal Representation for End-to-end Autonomous Driving

In recent years, diffusion models have demonstrated remarkable potential across diverse domains, from vision generation to language modeling. Transferring its generative capabilities to modern end-to-end autonomous driving systems has also…

Robotics · Computer Science 2025-09-17 Xuefeng Jiang, Yuan Ma, Pengxiang Li, Leimeng Xu, Xin Wen, Kun Zhan, Zhongpu Xia, Peng Jia, Xianpeng Lang, Sheng Sun

A Cheaper and Better Diffusion Language Model with Soft-Masked Noise

Diffusion models that are based on iterative denoising have been recently proposed and leveraged in various generation tasks like image generation. Whereas, as a way inherently built for continuous data, existing diffusion models still have…

Computation and Language · Computer Science 2023-04-11 Jiaao Chen, Aston Zhang, Mu Li, Alex Smola, Diyi Yang

CoPlanner: An Interactive Motion Planner with Contingency-Aware Diffusion for Autonomous Driving

Accurate trajectory prediction and motion planning are crucial for autonomous driving systems to navigate safely in complex, interactive environments characterized by multimodal uncertainties. However, current generation-then-evaluation…

Robotics · Computer Science 2025-09-23 Ruiguo Zhong, Ruoyu Yao, Pei Liu, Xiaolong Chen, Rui Yang, Jun Ma

DiffThinker: Towards Generative Multimodal Reasoning with Diffusion Models

While recent Multimodal Large Language Models (MLLMs) have attained significant strides in multimodal reasoning, their reasoning processes remain predominantly text-centric, leading to suboptimal performance in complex long-horizon,…

Computer Vision and Pattern Recognition · Computer Science 2026-01-01 Zefeng He, Xiaoye Qu, Yafu Li, Tong Zhu, Siyuan Huang, Yu Cheng

Temporally Decoupled Diffusion Planning for Autonomous Driving

Motion planning in dynamic urban environments requires balancing immediate safety with long-term goals. While diffusion models effectively capture multi-modal decision-making, existing approaches treat trajectories as monolithic entities,…

Robotics · Computer Science 2026-03-27 Xiang Li, Bikun Wang, John Zhang, Jianjun Wang

SkillDiffuser: Interpretable Hierarchical Planning via Skill Abstractions in Diffusion-Based Task Execution

Diffusion models have demonstrated strong potential for robotic trajectory planning. However, generating coherent trajectories from high-level instructions remains challenging, especially for long-range composition tasks requiring multiple…

Robotics · Computer Science 2024-03-29 Zhixuan Liang, Yao Mu, Hengbo Ma, Masayoshi Tomizuka, Mingyu Ding, Ping Luo

M2Diffuser: Diffusion-based Trajectory Optimization for Mobile Manipulation in 3D Scenes

Recent advances in diffusion models have opened new avenues for research into embodied AI agents and robotics. Despite significant achievements in complex robotic locomotion and skills, mobile manipulation, a capability that requires the…

Robotics · Computer Science 2025-04-03 Sixu Yan, Zeyu Zhang, Muzhi Han, Zaijin Wang, Qi Xie, Zhitian Li, Zhehan Li, Hangxin Liu, Xinggang Wang, Song-Chun Zhu

Diffusion-Based Planning for Autonomous Driving with Flexible Guidance

Achieving human-like driving behaviors in complex open-world environments is a critical challenge in autonomous driving. Contemporary learning-based planning approaches such as imitation learning methods often struggle to balance competing…

Robotics · Computer Science 2025-02-11 Yinan Zheng, Ruiming Liang, Kexin Zheng, Jinliang Zheng, Liyuan Mao, Jianxiong Li, Weihao Gu, Rui Ai, Shengbo Eben Li, Xianyuan Zhan, Jingjing Liu

Diffusion-CAM: Faithful Visual Explanations for dMLLMs

While diffusion Multimodal Large Language Models (dMLLMs) have recently achieved remarkable strides in multimodal generation, the development of interpretability mechanisms has lagged behind their architectural evolution. Unlike traditional…

Artificial Intelligence · Computer Science 2026-04-14 Haomin Zuo, Yidi Li, Luoxiao Yang, Xiaofeng Zhang

GenPlanner: From Noise to Plans -- Emergent Reasoning in Flow Matching and Diffusion Models

Path planning in complex environments is one of the key problems of artificial intelligence because it requires simultaneous understanding of the geometry of space and the global structure of the problem. In this paper, we explore the…

Artificial Intelligence · Computer Science 2026-02-24 Agnieszka Polowczyk, Alicja Polowczyk, Michał Wieczorek

A Survey on Diffusion Language Models

Diffusion Language Models (DLMs) are rapidly emerging as a powerful and promising alternative to the dominant autoregressive (AR) paradigm. By generating tokens in parallel through an iterative denoising process, DLMs possess inherent…

Computation and Language · Computer Science 2025-12-08 Tianyi Li, Mingda Chen, Bowei Guo, Zhiqiang Shen

VDLM: Variable Diffusion LMs via Robust Latent-to-Text Rendering

Autoregressive language models decode left-to-right with irreversible commitments, limiting revision during multi-step reasoning. We propose VDLM, a modular variable diffusion language model that separates semantic planning from…

Computation and Language · Computer Science 2026-02-19 Shuhui Qu

Towards Latent Diffusion Suitable For Text

Language diffusion models aim to improve sampling speed and coherence over autoregressive LLMs. We introduce Neural Flow Diffusion Models for language generation, an extension of NFDM that enables the straightforward application of…

Computation and Language · Computer Science 2026-01-26 Nesta Midavaine, Christian A. Naesseth, Grigory Bartosh