Related papers: KnowDiffuser: A Knowledge-Guided Diffusion Planner…

200 papers

While multimodal large language models (MLLMs) provide advanced reasoning for autonomous driving, translating their discrete semantic knowledge into continuous trajectories remains a fundamental challenge. Existing methods often rely on…

Robotics · Computer Science 2026-03-03 Fabian Schmidt, Karol Fedurko, Markus Enzweiler, Abhinav Valada

In recent years, large language models (LLMs) have witnessed remarkable advancements, with the test-time scaling law consistently enhancing the reasoning capabilities. Through systematic evaluation and exploration of a diverse spectrum of…

Computation and Language · Computer Science 2025-11-03 Chenyang Shao, Sijian Ren, Fengli Xu, Yong Li

The autonomous driving community is increasingly focused on addressing the challenges posed by out-of-distribution (OOD) driving scenarios. A dominant research trend seeks to enhance end-to-end (E2E) driving systems by integrating…

Computer Vision and Pattern Recognition · Computer Science 2025-12-05 Yingzi Ma, Yulong Cao, Wenhao Ding, Shuibai Zhang, Yan Wang, Boris Ivanovic, Ming Jiang, Marco Pavone, Chaowei Xiao

We present DiffExplainer, a novel framework that, leveraging language-vision models, enables multimodal global explainability. DiffExplainer employs diffusion models conditioned on optimized text prompts, synthesizing images that maximize…

Computer Vision and Pattern Recognition · Computer Science 2024-04-04 Matteo Pennisi, Giovanni Bellitto, Simone Palazzo, Mubarak Shah, Concetto Spampinato

This paper presents ThinkDiff, a novel alignment paradigm that empowers text-to-image diffusion models with multimodal in-context understanding and reasoning capabilities by integrating the strengths of vision-language models (VLMs).…

Machine Learning · Computer Science 2025-02-18 Zhenxing Mi, Kuan-Chieh Wang, Guocheng Qian, Hanrong Ye, Runtao Liu, Sergey Tulyakov, Kfir Aberman, Dan Xu

The diffusion model has been proven a powerful generative model in recent years, yet remains a challenge in generating visual text. Several methods alleviated this issue by incorporating explicit text position and content as guidance on…

Computer Vision and Pattern Recognition · Computer Science 2023-11-29 Jingye Chen, Yupan Huang, Tengchao Lv, Lei Cui, Qifeng Chen, Furu Wei

Recent advances in motion planning for autonomous driving have led to models capable of generating high-quality trajectories. However, most existing planners tend to fix their policy after supervised training, leading to consistent but…

Robotics · Computer Science 2025-08-26 Fan Ding, Xuewen Luo, Hwa Hui Tew, Ruturaj Reddy, Xikun Wang, Junn Yong Loo

In recent years, diffusion models have demonstrated remarkable potential across diverse domains, from vision generation to language modeling. Transferring its generative capabilities to modern end-to-end autonomous driving systems has also…

Robotics · Computer Science 2025-09-17 Xuefeng Jiang, Yuan Ma, Pengxiang Li, Leimeng Xu, Xin Wen, Kun Zhan, Zhongpu Xia, Peng Jia, Xianpeng Lang, Sheng Sun

Diffusion models that are based on iterative denoising have been recently proposed and leveraged in various generation tasks like image generation. Whereas, as a way inherently built for continuous data, existing diffusion models still have…

Computation and Language · Computer Science 2023-04-11 Jiaao Chen, Aston Zhang, Mu Li, Alex Smola, Diyi Yang

Accurate trajectory prediction and motion planning are crucial for autonomous driving systems to navigate safely in complex, interactive environments characterized by multimodal uncertainties. However, current generation-then-evaluation…

Robotics · Computer Science 2025-09-23 Ruiguo Zhong, Ruoyu Yao, Pei Liu, Xiaolong Chen, Rui Yang, Jun Ma

While recent Multimodal Large Language Models (MLLMs) have attained significant strides in multimodal reasoning, their reasoning processes remain predominantly text-centric, leading to suboptimal performance in complex long-horizon,…

Computer Vision and Pattern Recognition · Computer Science 2026-01-01 Zefeng He, Xiaoye Qu, Yafu Li, Tong Zhu, Siyuan Huang, Yu Cheng

Motion planning in dynamic urban environments requires balancing immediate safety with long-term goals. While diffusion models effectively capture multi-modal decision-making, existing approaches treat trajectories as monolithic entities,…

Robotics · Computer Science 2026-03-27 Xiang Li, Bikun Wang, John Zhang, Jianjun Wang

Diffusion models have demonstrated strong potential for robotic trajectory planning. However, generating coherent trajectories from high-level instructions remains challenging, especially for long-range composition tasks requiring multiple…

Robotics · Computer Science 2024-03-29 Zhixuan Liang, Yao Mu, Hengbo Ma, Masayoshi Tomizuka, Mingyu Ding, Ping Luo

Recent advances in diffusion models have opened new avenues for research into embodied AI agents and robotics. Despite significant achievements in complex robotic locomotion and skills, mobile manipulation, a capability that requires the…

Robotics · Computer Science 2025-04-03 Sixu Yan, Zeyu Zhang, Muzhi Han, Zaijin Wang, Qi Xie, Zhitian Li, Zhehan Li, Hangxin Liu, Xinggang Wang, Song-Chun Zhu

Achieving human-like driving behaviors in complex open-world environments is a critical challenge in autonomous driving. Contemporary learning-based planning approaches such as imitation learning methods often struggle to balance competing…

While diffusion Multimodal Large Language Models (dMLLMs) have recently achieved remarkable strides in multimodal generation, the development of interpretability mechanisms has lagged behind their architectural evolution. Unlike traditional…

Artificial Intelligence · Computer Science 2026-04-14 Haomin Zuo, Yidi Li, Luoxiao Yang, Xiaofeng Zhang

Path planning in complex environments is one of the key problems of artificial intelligence because it requires simultaneous understanding of the geometry of space and the global structure of the problem. In this paper, we explore the…

Artificial Intelligence · Computer Science 2026-02-24 Agnieszka Polowczyk, Alicja Polowczyk, Michał Wieczorek

Diffusion Language Models (DLMs) are rapidly emerging as a powerful and promising alternative to the dominant autoregressive (AR) paradigm. By generating tokens in parallel through an iterative denoising process, DLMs possess inherent…

Computation and Language · Computer Science 2025-12-08 Tianyi Li, Mingda Chen, Bowei Guo, Zhiqiang Shen

Autoregressive language models decode left-to-right with irreversible commitments, limiting revision during multi-step reasoning. We propose VDLM, a modular variable diffusion language model that separates semantic planning from…

Computation and Language · Computer Science 2026-02-19 Shuhui Qu

Language diffusion models aim to improve sampling speed and coherence over autoregressive LLMs. We introduce Neural Flow Diffusion Models for language generation, an extension of NFDM that enables the straightforward application of…

Computation and Language · Computer Science 2026-01-26 Nesta Midavaine, Christian A. Naesseth, Grigory Bartosh