Related papers: Generation is Required for Data-Efficient Percepti…

Compositional Scene Understanding through Inverse Generative Modeling

Generative models have demonstrated remarkable abilities in generating high-fidelity visual content. In this work, we explore how generative models can further be used not only to synthesize visual content but also to understand the…

Computer Vision and Pattern Recognition · Computer Science 2025-06-25 Yanbo Wang, Justin Dauwels, Yilun Du

Does Data Scaling Lead to Visual Compositional Generalization?

Compositional understanding is crucial for human intelligence, yet it remains unclear whether contemporary vision models exhibit it. The dominant machine learning paradigm is built on the premise that scaling data and model sizes will…

Machine Learning · Computer Science 2025-07-10 Arnas Uselis, Andrea Dittadi, Seong Joon Oh

Investigating Object Compositionality in Generative Adversarial Networks

Deep generative models seek to recover the process with which the observed data was generated. They may be used to synthesize new samples or to subsequently extract representations. Successful approaches in the domain of images are driven…

Computer Vision and Pattern Recognition · Computer Science 2020-07-27 Sjoerd van Steenkiste, Karol Kurach, Jürgen Schmidhuber, Sylvain Gelly

Rethinking Generative Methods for Image Restoration in Physics-based Vision: A Theoretical Analysis from the Perspective of Information

End-to-end generative methods are considered a more promising solution for image restoration in physics-based vision compared with the traditional deconstructive methods based on handcrafted composition models. However, existing generative…

Computer Vision and Pattern Recognition · Computer Science 2022-12-09 Xudong Kang, Haoran Xie, Man-Leung Wong, Jing Qin

Provable Compositional Generalization for Object-Centric Learning

Learning representations that generalize to novel compositions of known concepts is crucial for bridging the gap between human and machine perception. One prominent effort is learning object-centric representations, which are widely…

Machine Learning · Computer Science 2024-11-13 Thaddäus Wiedemer, Jack Brady, Alexander Panfilov, Attila Juhos, Matthias Bethge, Wieland Brendel

Generative Compression

Traditional image and video compression algorithms rely on hand-crafted encoder/decoder pairs (codecs) that lack adaptability and are agnostic to the data being compressed. Here we describe the concept of generative compression, the…

Computer Vision and Pattern Recognition · Computer Science 2017-06-06 Shibani Santurkar, David Budden, Nir Shavit

Generative Photography: Scene-Consistent Camera Control for Realistic Text-to-Image Synthesis

Image generation today can produce somewhat realistic images from text prompts. However, if one asks the generator to synthesize a specific camera setting such as creating different fields of view using a 24mm lens versus a 70mm lens, the…

Computer Vision and Pattern Recognition · Computer Science 2025-03-26 Yu Yuan, Xijun Wang, Yichen Sheng, Prateek Chennuri, Xingguang Zhang, Stanley Chan

UniHetero: Could Generation Enhance Understanding for Vision-Language-Model at Large Data Scale?

Vision-language large models are moving toward the unification of visual understanding and visual generation tasks. However, whether generation can enhance understanding is still under-explored at large data scale. In this work, we analyze…

Computation and Language · Computer Science 2026-01-01 Fengjiao Chen, Minhao Jing, Weitao Lu, Yan Feng, Xiaoyu Li, Xuezhi Cao

Compositional Generalization Requires Linear, Orthogonal Representations in Vision Embedding Models

Compositional generalization, the ability to recognize familiar parts in novel contexts, is a defining property of intelligent systems. Although modern models are trained on massive datasets, they still cover only a tiny fraction of the…

Computer Vision and Pattern Recognition · Computer Science 2026-03-02 Arnas Uselis, Andrea Dittadi, Seong Joon Oh

The Informed Sampler: A Discriminative Approach to Bayesian Inference in Generative Computer Vision Models

Computer vision is hard because of a large variability in lighting, shape, and texture; in addition the image signal is non-additive due to occlusion. Generative models promised to account for this variability by accurately modelling the…

Computer Vision and Pattern Recognition · Computer Science 2015-03-10 Varun Jampani, Sebastian Nowozin, Matthew Loper, Peter V. Gehler

Generative Interventions for Causal Learning

We introduce a framework for learning robust visual representations that generalize to new viewpoints, backgrounds, and scene contexts. Discriminative models often learn naturally occurring spurious correlations, which cause them to fail on…

Computer Vision and Pattern Recognition · Computer Science 2021-03-30 Chengzhi Mao, Augustine Cha, Amogh Gupta, Hao Wang, Junfeng Yang, Carl Vondrick

Vector-based Representation is the Key: A Study on Disentanglement and Compositional Generalization

Recognizing elementary underlying concepts from observations (disentanglement) and generating novel combinations of these concepts (compositional generalization) are fundamental abilities for humans to support rapid knowledge learning and…

Computer Vision and Pattern Recognition · Computer Science 2023-05-30 Tao Yang, Yuwang Wang, Cuiling Lan, Yan Lu, Nanning Zheng

Towards Understanding the Relationship between In-context Learning and Compositional Generalization

According to the principle of compositional generalization, the meaning of a complex expression can be understood as a function of the meaning of its parts and of how they are combined. This principle is crucial for human language…

Computation and Language · Computer Science 2024-03-19 Sungjun Han, Sebastian Padó

Learning by Analogy: A Causal Framework for Composition Generalization

Compositional generalization -- the ability to understand and generate novel combinations of learned concepts -- enables models to extend their capabilities beyond limited experiences. While effective, the data structures and principles…

Machine Learning · Computer Science 2025-12-12 Lingjing Kong, Shaoan Xie, Yang Jiao, Yetian Chen, Yanhui Guo, Simone Shao, Yan Gao, Guangyi Chen, Kun Zhang

Learning Inference Models for Computer Vision

Computer vision can be understood as the ability to perform inference on image data. Breakthroughs in computer vision technology are often marked by advances in inference techniques. This thesis proposes novel inference schemes and…

Computer Vision and Pattern Recognition · Computer Science 2017-09-04 Varun Jampani

GenHancer: Imperfect Generative Models are Secretly Strong Vision-Centric Enhancers

The synergy between generative and discriminative models receives growing attention. While discriminative Contrastive Language-Image Pre-Training (CLIP) excels in high-level semantics, it struggles with perceiving fine-grained visual…

Computer Vision and Pattern Recognition · Computer Science 2025-08-01 Shijie Ma, Yuying Ge, Teng Wang, Yuxin Guo, Yixiao Ge, Ying Shan

Compositional diversity in visual concept learning

Humans leverage compositionality to efficiently learn new concepts, understanding how familiar parts can combine together to form novel objects. In contrast, popular computer vision models struggle to make the same types of inferences,…

Computer Vision and Pattern Recognition · Computer Science 2023-06-01 Yanli Zhou, Reuben Feinman, Brenden M. Lake

Image Generation and Translation with Disentangled Representations

Generative models have made significant progress in the tasks of modeling complex data distributions such as natural images. The introduction of Generative Adversarial Networks (GANs) and auto-encoders led to the possibility of training on…

Computer Vision and Pattern Recognition · Computer Science 2018-03-29 Tobias Hinz, Stefan Wermter

Aligning Generative Denoising with Discriminative Objectives Unleashes Diffusion for Visual Perception

With the success of image generation, generative diffusion models are increasingly adopted for discriminative tasks, as pixel generation provides a unified perception interface. However, directly repurposing the generative denoising process…

Computer Vision and Pattern Recognition · Computer Science 2025-04-16 Ziqi Pang, Xin Xu, Yu-Xiong Wang

Iterative Scene Graph Generation with Generative Transformers

Scene graphs provide a rich, structured representation of a scene by encoding the entities (objects) and their spatial relationships in a graphical format. This representation has proven useful in several tasks, such as question answering,…

Computer Vision and Pattern Recognition · Computer Science 2022-12-01 Sanjoy Kundu, Sathyanarayanan N. Aakur