Related papers: Towards Device Efficient Conditional Image Generat…

Conditional Image Generation with Pretrained Generative Model

In recent years, diffusion models have gained popularity for their ability to generate higher-quality images in comparison to GAN models. However, like other large generative models, these models require a huge amount of data,…

Computer Vision and Pattern Recognition · Computer Science 2023-12-21 Rajesh Shrestha, Bowen Xie

Toward Lightweight and Fast Decoders for Diffusion Models in Image and Video Generation

We investigate methods to reduce inference time and memory footprint in stable diffusion models by introducing lightweight decoders for both image and video synthesis. Traditional latent diffusion pipelines rely on large Variational…

Computer Vision and Pattern Recognition · Computer Science 2025-03-10 Alexey Buzovkin, Evgeny Shilov

Conditional Generative Modeling for Images, 3D Animations, and Video

This dissertation attempts to drive innovation in the field of generative modeling for computer vision, by exploring novel formulations of conditional generative models, and innovative applications in images, 3D animations, and video. Our…

Computer Vision and Pattern Recognition · Computer Science 2023-10-23 Vikram Voleti

DC-VideoGen: Efficient Video Generation with Deep Compression Video Autoencoder

We introduce DC-VideoGen, a post-training acceleration framework for efficient video generation. DC-VideoGen can be applied to any pre-trained video diffusion model, improving efficiency by adapting it to a deep compression latent space…

Computer Vision and Pattern Recognition · Computer Science 2025-09-30 Junyu Chen, Wenkun He, Yuchao Gu, Yuyang Zhao, Jincheng Yu, Junsong Chen, Dongyun Zou, Yujun Lin, Zhekai Zhang, Muyang Li, Haocheng Xi, Ligeng Zhu, Enze Xie, Song Han, Han Cai

On the use of Deep Autoencoders for Efficient Embedded Reinforcement Learning

In autonomous embedded systems, it is often vital to reduce the number of actions taken in the real world and the energy required to learn a policy. Training reinforcement learning agents from high dimensional image representations can be very…

Machine Learning · Computer Science 2019-03-26 Bharat Prakash, Mark Horton, Nicholas R. Waytowich, William David Hairston, Tim Oates, Tinoosh Mohsenin

Channel-wise Autoregressive Entropy Models for Learned Image Compression

In learning-based approaches to image compression, codecs are developed by optimizing a computational model to minimize a rate-distortion objective. Currently, the most effective learned image codecs take the form of an entropy-constrained…

Image and Video Processing · Electrical Eng. & Systems 2020-07-20 David Minnen, Saurabh Singh

Compress3D: a Compressed Latent Space for 3D Generation from a Single Image

3D generation has witnessed significant advancements, yet efficiently producing high-quality 3D assets from a single image remains challenging. In this paper, we present a triplane autoencoder, which encodes 3D models into a compact…

Computer Vision and Pattern Recognition · Computer Science 2024-03-21 Bowen Zhang, Tianyu Yang, Yu Li, Lei Zhang, Xi Zhao

Latent Constraints: Learning to Generate Conditionally from Unconditional Generative Models

Deep generative neural networks have proven effective at both conditional and unconditional modeling of complex data distributions. Conditional generation enables interactive control, but creating new controls often requires expensive…

Machine Learning · Computer Science 2017-12-25 Jesse Engel, Matthew Hoffman, Adam Roberts

Make It Efficient: Dynamic Sparse Attention for Autoregressive Image Generation

Autoregressive conditional image generation models have emerged as a dominant paradigm in text-to-image synthesis. These methods typically convert images into one-dimensional token sequences and leverage the self-attention mechanism, which…

Computer Vision and Pattern Recognition · Computer Science 2025-06-24 Xunzhi Xiang, Qi Fan

A Compact and Semantic Latent Space for Disentangled and Controllable Image Editing

Recent advances in the field of generative models and in particular generative adversarial networks (GANs) have led to substantial progress for controlled image editing, especially compared with the pre-deep learning era. Despite their…

Computer Vision and Pattern Recognition · Computer Science 2023-12-14 Gwilherm Lesné, Yann Gousseau, Saïd Ladjal, Alasdair Newson

High-Resolution Image Synthesis with Latent Diffusion Models

By decomposing the image formation process into a sequential application of denoising autoencoders, diffusion models (DMs) achieve state-of-the-art synthesis results on image data and beyond. Additionally, their formulation allows for a…

Computer Vision and Pattern Recognition · Computer Science 2022-04-14 Robin Rombach, Andreas Blattmann, Dominik Lorenz, Patrick Esser, Björn Ommer

Unpriortized Autoencoder For Image Generation

In this paper, we treat the image generation task using an autoencoder, a representative latent model. Unlike many studies regularizing the latent variable's distribution by assuming a manually specified prior, we approach the image…

Machine Learning · Computer Science 2021-08-27 Jaeyoung Yoo, Hojun Lee, Nojun Kwak

An Energy-Efficient Edge Computing Paradigm for Convolution-based Image Upsampling

A novel energy-efficient edge computing paradigm is proposed for real-time deep learning-based image upsampling applications. State-of-the-art deep learning solutions for image upsampling are currently trained using either resize or…

Computer Vision and Pattern Recognition · Computer Science 2021-07-27 Ian Colbert, Ken Kreutz-Delgado, Srinjoy Das

An Energy-Efficient Reconfigurable Autoencoder Implementation on FPGA

Autoencoders are unsupervised neural networks that are used to process and compress input data and then reconstruct the data back to the original data size. This allows autoencoders to be used for different processing applications such as…

Machine Learning · Computer Science 2023-01-18 Murat Isik, Matthew Oldland, Lifeng Zhou

Fast Training-free Perceptual Image Compression

Training-free perceptual image codecs adopt a pre-trained unconditional generative model during decoding to avoid training a new conditional generative model. However, they heavily rely on diffusion inversion or sample communication, which take…

Image and Video Processing · Electrical Eng. & Systems 2025-06-23 Ziran Zhu, Tongda Xu, Minye Huang, Dailan He, Xingtong Ge, Xinjie Zhang, Ling Li, Yan Wang

Conditional Image Generation by Conditioning Variational Auto-Encoders

We present a conditional variational auto-encoder (VAE) which, to avoid the substantial cost of training from scratch, uses an architecture and training objective capable of leveraging a foundation model in the form of a pretrained…

Computer Vision and Pattern Recognition · Computer Science 2022-05-31 William Harvey, Saeid Naderiparizi, Frank Wood

Epsilon-VAE: Denoising as Visual Decoding

In generative modeling, tokenization simplifies complex data into compact, structured representations, creating a more efficient, learnable space. For high-dimensional visual data, it reduces redundancy and emphasizes key features for…

Computer Vision and Pattern Recognition · Computer Science 2025-05-30 Long Zhao, Sanghyun Woo, Ziyu Wan, Yandong Li, Han Zhang, Boqing Gong, Hartwig Adam, Xuhui Jia, Ting Liu

Accelerating Training using Tensor Decomposition

Tensor decomposition is one of the well-known approaches to reduce the latency time and number of parameters of a pre-trained model. However, in this paper, we propose an approach to use tensor decomposition to reduce training time of…

Computer Vision and Pattern Recognition · Computer Science 2020-06-30 Mostafa Elhoushi, Ye Henry Tian, Zihao Chen, Farhan Shafiq, Joey Yiwei Li

OneGAN: Simultaneous Unsupervised Learning of Conditional Image Generation, Foreground Segmentation, and Fine-Grained Clustering

We present a method for simultaneously learning, in an unsupervised manner, (i) a conditional image generator, (ii) foreground extraction and segmentation, (iii) clustering into a two-level class hierarchy, and (iv) object removal and…

Computer Vision and Pattern Recognition · Computer Science 2020-11-03 Yaniv Benny, Lior Wolf

Towards Conceptual Compression

We introduce a simple recurrent variational auto-encoder architecture that significantly improves image modeling. The system represents the state-of-the-art in latent variable models for both the ImageNet and Omniglot datasets. We show that…

Machine Learning · Statistics 2016-05-02 Karol Gregor, Frederic Besse, Danilo Jimenez Rezende, Ivo Danihelka, Daan Wierstra