Related papers: AutoDecoding Latent 3D Diffusion Models

Video Autoencoder: self-supervised disentanglement of static 3D structure and motion

A video autoencoder is proposed for learning disentangled representations of 3D structure and camera pose from videos in a self-supervised manner. Relying on temporal continuity in videos, our work assumes that the 3D scene structure in…

Computer Vision and Pattern Recognition · Computer Science 2021-10-07 Zihang Lai, Sifei Liu, Alexei A. Efros, Xiaolong Wang

GenAssets: Generating in-the-wild 3D Assets in Latent Space

High-quality 3D assets for traffic participants are critical for multi-sensor simulation, which is essential for the safe end-to-end development of autonomy. Building assets from in-the-wild data is key for diversity and realism, but…

Computer Vision and Pattern Recognition · Computer Science 2026-04-28 Ze Yang, Jingkang Wang, Haowei Zhang, Sivabalan Manivasagam, Yun Chen, Raquel Urtasun

ArtiLatent: Realistic Articulated 3D Object Generation via Structured Latents

We propose ArtiLatent, a generative framework that synthesizes human-made 3D objects with fine-grained geometry, accurate articulation, and realistic appearance. Our approach jointly models part geometry and articulation dynamics by…

Computer Vision and Pattern Recognition · Computer Science 2025-10-27 Honghua Chen, Yushi Lan, Yongwei Chen, Xingang Pan

3D-LDM: Neural Implicit 3D Shape Generation with Latent Diffusion Models

Diffusion models have shown great promise for image generation, beating GANs in terms of generation diversity, with comparable image quality. However, their application to 3D shapes has been limited to point or voxel representations that…

Computer Vision and Pattern Recognition · Computer Science 2022-12-16 Gimin Nam, Mariem Khlifi, Andrew Rodriguez, Alberto Tono, Linqi Zhou, Paul Guerrero

Farm3D: Learning Articulated 3D Animals by Distilling 2D Diffusion

We present Farm3D, a method for learning category-specific 3D reconstructors for articulated objects, relying solely on "free" virtual supervision from a pre-trained 2D diffusion-based image generator. Recent approaches can learn a…

Computer Vision and Pattern Recognition · Computer Science 2024-05-15 Tomas Jakab, Ruining Li, Shangzhe Wu, Christian Rupprecht, Andrea Vedaldi

SALAD: Part-Level Latent Diffusion for 3D Shape Generation and Manipulation

We present a cascaded diffusion model based on a part-level implicit 3D representation. Our model achieves state-of-the-art generation quality and also enables part-level shape editing and manipulation without any additional training in…

Computer Vision and Pattern Recognition · Computer Science 2024-03-21 Juil Koo, Seungwoo Yoo, Minh Hieu Nguyen, Minhyuk Sung

WildFusion: Learning 3D-Aware Latent Diffusion Models in View Space

Modern learning-based approaches to 3D-aware image synthesis achieve high photorealism and 3D-consistent viewpoint changes for the generated images. Existing approaches represent instances in a shared canonical space. However, for…

Computer Vision and Pattern Recognition · Computer Science 2024-04-15 Katja Schwarz, Seung Wook Kim, Jun Gao, Sanja Fidler, Andreas Geiger, Karsten Kreis

LTM3D: Bridging Token Spaces for Conditional 3D Generation with Auto-Regressive Diffusion Framework

We present LTM3D, a Latent Token space Modeling framework for conditional 3D shape generation that integrates the strengths of diffusion and auto-regressive (AR) models. While diffusion-based methods effectively model continuous latent…

Computer Vision and Pattern Recognition · Computer Science 2025-06-02 Xin Kang, Zihan Zheng, Lei Chu, Yue Gao, Jiahao Li, Hao Pan, Xuejin Chen, Yan Lu

LN3DIFF++: Scalable Latent Neural Fields Diffusion for Speedy 3D Generation

The field of neural rendering has witnessed significant progress with advancements in generative models and differentiable rendering techniques. Though 2D diffusion has achieved success, a unified 3D diffusion pipeline remains unsettled.…

Computer Vision and Pattern Recognition · Computer Science 2025-12-22 Yushi Lan, Fangzhou Hong, Shangchen Zhou, Shuai Yang, Xuyi Meng, Yongwei Chen, Zhaoyang Lyu, Bo Dai, Xingang Pan, Chen Change Loy

3DifFusionDet: Diffusion Model for 3D Object Detection with Robust LiDAR-Camera Fusion

Good 3D object detection performance from LiDAR-Camera sensors demands seamless feature alignment and fusion strategies. We propose the 3DifFusionDet framework in this paper, which structures 3D object detection as a denoising diffusion…

Computer Vision and Pattern Recognition · Computer Science 2023-11-08 Xinhao Xiang, Simon Dräger, Jiawei Zhang

Latent Diffusion for Language Generation

Diffusion models have achieved great success in modeling continuous data modalities such as images, audio, and video, but have seen limited use in discrete domains such as language. Recent attempts to adapt diffusion to language have…

Computation and Language · Computer Science 2023-11-08 Justin Lovelace, Varsha Kishore, Chao Wan, Eliot Shekhtman, Kilian Q. Weinberger

Latent feature disentanglement for 3D meshes

Generative modeling of 3D shapes has become an important problem due to its relevance to many applications across Computer Vision, Graphics, and VR. In this paper we build upon recently introduced 3D mesh-convolutional Variational…

Machine Learning · Computer Science 2019-06-11 Jake Levinson, Avneesh Sud, Ameesh Makadia

LaGeM: A Large Geometry Model for 3D Representation Learning and Diffusion

This paper introduces a novel hierarchical autoencoder that maps 3D models into a highly compressed latent space. The hierarchical autoencoder is specifically designed to tackle the challenges arising from large-scale datasets and…

Computer Vision and Pattern Recognition · Computer Science 2024-10-03 Biao Zhang, Peter Wonka

Diffusion-based 3D Object Detection with Random Boxes

3D object detection is an essential task for achieving autonomous driving. Existing anchor-based detection methods rely on empirical heuristics setting of anchors, which makes the algorithms lack elegance. In recent years, we have witnessed…

Computer Vision and Pattern Recognition · Computer Science 2023-09-06 Xin Zhou, Jinghua Hou, Tingting Yao, Dingkang Liang, Zhe Liu, Zhikang Zou, Xiaoqing Ye, Jianwei Cheng, Xiang Bai

RenderDiffusion: Image Diffusion for 3D Reconstruction, Inpainting and Generation

Diffusion models currently achieve state-of-the-art performance for both conditional and unconditional image generation. However, so far, image diffusion models do not support tasks required for 3D understanding, such as view-consistent 3D…

Computer Vision and Pattern Recognition · Computer Science 2024-02-22 Titas Anciukevičius, Zexiang Xu, Matthew Fisher, Paul Henderson, Hakan Bilen, Niloy J. Mitra, Paul Guerrero

Velox: Learning Representations of 4D Geometry and Appearance

We introduce a framework for learning latent representations of 4D objects which are descriptive, faithfully capturing object geometry and appearance; compressive, aiding in downstream efficiency; and accessible, requiring minimal input,…

Computer Vision and Pattern Recognition · Computer Science 2026-05-07 Anagh Malik, Dorian Chan, Xiaoming Zhao, David B. Lindell, Oncel Tuzel, Jen-Hao Rick Chang

Learning Coherent Matrixized Representation in Latent Space for Volumetric 4D Generation

Directly learning to model 4D content, including shape, color, and motion, is challenging. Existing methods rely on pose priors for motion control, resulting in limited motion diversity and continuity in details. To address this, we propose…

Computer Vision and Pattern Recognition · Computer Science 2025-06-17 Qitong Yang, Mingtao Feng, Zijie Wu, Shijie Sun, Weisheng Dong, Yaonan Wang, Ajmal Mian

Stable Video Diffusion: Scaling Latent Video Diffusion Models to Large Datasets

We present Stable Video Diffusion - a latent video diffusion model for high-resolution, state-of-the-art text-to-video and image-to-video generation. Recently, latent diffusion models trained for 2D image synthesis have been turned into…

Computer Vision and Pattern Recognition · Computer Science 2023-11-28 Andreas Blattmann, Tim Dockhorn, Sumith Kulal, Daniel Mendelevitch, Maciej Kilian, Dominik Lorenz, Yam Levi, Zion English, Vikram Voleti, Adam Letts, Varun Jampani, Robin Rombach

LaMD: Latent Motion Diffusion for Image-Conditional Video Generation

The video generation field has witnessed rapid improvements with the introduction of recent diffusion models. While these models have successfully enhanced appearance quality, they still face challenges in generating coherent and natural…

Computer Vision and Pattern Recognition · Computer Science 2025-04-21 Yaosi Hu, Zhenzhong Chen, Chong Luo

VolumeDiffusion: Flexible Text-to-3D Generation with Efficient Volumetric Encoder

This paper introduces a pioneering 3D volumetric encoder designed for text-to-3D generation. To scale up the training data for the diffusion model, a lightweight network is developed to efficiently acquire feature volumes from multi-view…

Computer Vision and Pattern Recognition · Computer Science 2024-08-14 Zhicong Tang, Shuyang Gu, Chunyu Wang, Ting Zhang, Jianmin Bao, Dong Chen, Baining Guo