Related papers: Multimodal Controller for Generative Models

Multi-View Data Generation Without View Supervision

The development of high-dimensional generative models has recently gained a great surge of interest with the introduction of variational auto-encoders and generative adversarial neural networks. Different variants have been proposed where…

Computer Vision and Pattern Recognition · Computer Science · 2019-04-18 · Mickaël Chen, Ludovic Denoyer, Thierry Artières

Multimodal Generative Models for Compositional Representation Learning

As deep neural networks become more adept at traditional tasks, many of the most exciting new challenges concern multimodality: observations that combine diverse types, such as image and text. In this paper, we introduce a family of…

Machine Learning · Computer Science · 2019-12-12 · Mike Wu, Noah Goodman

Controllable Image Generation via Collage Representations

Recent advances in conditional generative image models have enabled impressive results. On the one hand, text-based conditional models have achieved remarkable generation quality, by leveraging large-scale datasets of image-text pairs. To…

Computer Vision and Pattern Recognition · Computer Science · 2023-04-27 · Arantxa Casanova, Marlène Careil, Adriana Romero-Soriano, Christopher J. Pal, Jakob Verbeek, Michal Drozdzal

Plug-and-Play Controllable Generation for Discrete Masked Models

This article makes discrete masked models for the generative modeling of discrete data controllable. The goal is to generate samples of a discrete random variable that adheres to a posterior distribution, satisfies specific constraints, or…

Machine Learning · Computer Science · 2024-10-04 · Wei Guo, Yuchen Zhu, Molei Tao, Yongxin Chen

MultiModal Action Conditioned Video Generation

Current video models fail as world models as they lack fine-grained control. General-purpose household robots require real-time fine motor control to handle delicate tasks and urgent situations. In this work, we introduce fine-grained…

Computer Vision and Pattern Recognition · Computer Science · 2025-10-03 · Yichen Li, Antonio Torralba

Discriminative Multimodal Learning via Conditional Priors in Generative Models

Deep generative models with latent variables have been used lately to learn joint representations and generative processes from multi-modal data. These two learning mechanisms can, however, conflict with each other and representations can…

Machine Learning · Computer Science · 2023-01-24 · Rogelio A. Mancisidor, Michael Kampffmeyer, Kjersti Aas, Robert Jenssen

Conditional WaveGAN

Generative models have been used successfully for image synthesis in recent years. But when it comes to other modalities like audio, text, etc., little progress has been made. Recent works focus on generating audio from a generative model in an…

Computer Vision and Pattern Recognition · Computer Science · 2018-09-30 · Chae Young Lee, Anoop Toffy, Gue Jun Jung, Woo-Jin Han

Diverse Image Generation via Self-Conditioned GANs

We introduce a simple but effective unsupervised method for generating realistic and diverse images. We train a class-conditional GAN model without using manually annotated class labels. Instead, our model is conditional on labels…

Computer Vision and Pattern Recognition · Computer Science · 2022-02-11 · Steven Liu, Tongzhou Wang, David Bau, Jun-Yan Zhu, Antonio Torralba

ControlVAR: Exploring Controllable Visual Autoregressive Modeling

Conditional visual generation has witnessed remarkable progress with the advent of diffusion models (DMs), especially in tasks like control-to-image generation. However, challenges such as expensive computational cost, high inference…

Computer Vision and Pattern Recognition · Computer Science · 2024-10-03 · Xiang Li, Kai Qiu, Hao Chen, Jason Kuen, Zhe Lin, Rita Singh, Bhiksha Raj

A survey of multimodal deep generative models

Multimodal learning is a framework for building models that make predictions based on different types of modalities. Important challenges in multimodal learning are the inference of shared representations from arbitrary modalities and…

Machine Learning · Computer Science · 2022-07-06 · Masahiro Suzuki, Yutaka Matsuo

Self-control: A Better Conditional Mechanism for Masked Autoregressive Model

Autoregressive conditional image generation algorithms are capable of generating photorealistic images that are consistent with given textual or image conditions, and have great potential for a wide range of applications. Nevertheless, the…

Computer Vision and Pattern Recognition · Computer Science · 2024-12-19 · Qiaoying Qu, Shiyu Shen

Multimodal Generative Models for Scalable Weakly-Supervised Learning

Multiple modalities often co-occur when describing natural phenomena. Learning a joint representation of these modalities should yield deeper and more useful representations. Previous generative approaches to multi-modal input either do not…

Machine Learning · Computer Science · 2018-11-13 · Mike Wu, Noah Goodman

Molecular generative model based on conditional variational autoencoder for de novo molecular design

We propose a molecular generative model based on the conditional variational autoencoder for de novo molecular design. It is specialized to control multiple molecular properties simultaneously by imposing them on a latent space. As a proof…

Machine Learning · Computer Science · 2018-06-18 · Jaechang Lim, Seongok Ryu, Jin Woo Kim, Woo Youn Kim

MVControl: Adding Conditional Control to Multi-view Diffusion for Controllable Text-to-3D Generation

We introduce MVControl, a novel neural network architecture that enhances existing pre-trained multi-view 2D diffusion models by incorporating additional input conditions, e.g. edge maps. Our approach enables the generation of controllable…

Computer Vision and Pattern Recognition · Computer Science · 2023-11-29 · Zhiqi Li, Yiming Chen, Lingzhe Zhao, Peidong Liu

Generative Multi-modal Models are Good Class-Incremental Learners

In class-incremental learning (CIL) scenarios, the phenomenon of catastrophic forgetting caused by the classifier's bias towards the current task has long posed a significant challenge. It is mainly caused by the characteristic of…

Computer Vision and Pattern Recognition · Computer Science · 2024-03-28 · Xusheng Cao, Haori Lu, Linlan Huang, Xialei Liu, Ming-Ming Cheng

Relating by Contrasting: A Data-efficient Framework for Multimodal Generative Models

Multimodal learning for generative models often refers to the learning of abstract concepts from the commonality of information in multiple modalities, such as vision and language. While it has proven effective for learning generalisable…

Machine Learning · Computer Science · 2021-04-22 · Yuge Shi, Brooks Paige, Philip H. S. Torr, N. Siddharth

CamViG: Camera Aware Image-to-Video Generation with Multimodal Transformers

We extend multimodal transformers to include 3D camera motion as a conditioning signal for the task of video generation. Generative video models are becoming increasingly powerful, thus focusing research efforts on methods of controlling…

Computer Vision and Pattern Recognition · Computer Science · 2024-05-24 · Andrew Marmon, Grant Schindler, José Lezama, Dan Kondratyuk, Bryan Seybold, Irfan Essa

Joint Multimodal Learning with Deep Generative Models

We investigate deep generative models that can exchange multiple modalities bi-directionally, e.g., generating images from corresponding texts and vice versa. Recently, some studies handle multiple modalities on deep generative models, such…

Machine Learning · Statistics · 2016-11-08 · Masahiro Suzuki, Kotaro Nakayama, Yutaka Matsuo

Efficient Conditional Generation on Scale-based Visual Autoregressive Models

Recent advances in autoregressive (AR) models have demonstrated their potential to rival diffusion models in image synthesis. However, for complex spatially-conditioned generation, current AR approaches rely on fine-tuning the pre-trained…

Computer Vision and Pattern Recognition · Computer Science · 2025-10-08 · Jiaqi Liu, Tao Huang, Chang Xu

Variational Conditional GAN for Fine-grained Controllable Image Generation

In this paper, we propose a novel variational generator framework for conditional GANs to catch semantic details for improving the generation quality and diversity. Traditional generators in conditional GANs simply concatenate the…

Computer Vision and Pattern Recognition · Computer Science · 2019-09-24 · Mingqi Hu, Deyu Zhou, Yulan He