Related papers: InteractDiffusion: Interaction Control in Text-to-…

Human-Object Interaction Detection Collaborated with Large Relation-driven Diffusion Models

Prevalent human-object interaction (HOI) detection approaches typically leverage large-scale visual-linguistic models to help recognize events involving humans and objects. Though promising, models trained via contrastive learning on…

Computer Vision and Pattern Recognition · Computer Science · 2024-10-29 · Liulei Li, Wenguan Wang, Yi Yang

HOI-Diff: Text-Driven Synthesis of 3D Human-Object Interactions using Diffusion Models

We address the problem of generating realistic 3D human-object interactions (HOIs) driven by textual prompts. To this end, we take a modular design and decompose the complex task into simpler sub-tasks. We first develop a dual-branch…

Computer Vision and Pattern Recognition · Computer Science · 2025-07-08 · Xiaogang Peng, Yiming Xie, Zizhao Wu, Varun Jampani, Deqing Sun, Huaizu Jiang

Boosting Human-Object Interaction Detection with Text-to-Image Diffusion Model

This paper investigates the problem of the current HOI detection methods and introduces DiffHOI, a novel HOI detection scheme grounded on a pre-trained text-image diffusion model, which enhances the detector's performance via improved data…

Computer Vision and Pattern Recognition · Computer Science · 2023-05-23 · Jie Yang, Bingliang Li, Fengyu Yang, Ailing Zeng, Lei Zhang, Ruimao Zhang

Frequency-Controlled Diffusion Model for Versatile Text-Guided Image-to-Image Translation

Recently, large-scale text-to-image (T2I) diffusion models have emerged as a powerful tool for image-to-image translation (I2I), allowing open-domain image translation via user-provided text prompts. This paper proposes frequency-controlled…

Computer Vision and Pattern Recognition · Computer Science · 2025-03-28 · Xiang Gao, Zhengbo Xu, Junhan Zhao, Jiaying Liu

Grounding Text-to-Image Diffusion Models for Controlled High-Quality Image Generation

Text-to-image (T2I) generative diffusion models have demonstrated outstanding performance in synthesizing diverse, high-quality visuals from text captions. Several layout-to-image models have been developed to control the generation process…

Computer Vision and Pattern Recognition · Computer Science · 2025-02-11 · Ahmad Süleyman, Göksel Biricik

An Image-like Diffusion Method for Human-Object Interaction Detection

Human-object interaction (HOI) detection often faces high levels of ambiguity and indeterminacy, as the same interaction can appear vastly different across different human-object pairs. Additionally, the indeterminacy can be further…

Computer Vision and Pattern Recognition · Computer Science · 2025-03-25 · Xiaofei Hui, Haoxuan Qu, Hossein Rahmani, Jun Liu

InterDiff: Generating 3D Human-Object Interactions with Physics-Informed Diffusion

This paper addresses a novel task of anticipating 3D human-object interactions (HOIs). Most existing research on HOI synthesis lacks comprehensive whole-body interactions with dynamic objects, e.g., often limited to manipulating small or…

Computer Vision and Pattern Recognition · Computer Science · 2023-09-01 · Sirui Xu, Zhengyuan Li, Yu-Xiong Wang, Liang-Yan Gui

Controllable Generation with Text-to-Image Diffusion Models: A Survey

In the rapidly advancing realm of visual generation, diffusion models have revolutionized the landscape, marking a significant shift in capabilities with their impressive text-guided generative functions. However, relying solely on text for…

Computer Vision and Pattern Recognition · Computer Science · 2026-01-09 · Pu Cao, Feng Zhou, Qing Song, Lu Yang

FBSDiff: Plug-and-Play Frequency Band Substitution of Diffusion Features for Highly Controllable Text-Driven Image Translation

Large-scale text-to-image diffusion models have been a revolutionary milestone in the evolution of generative AI and multimodal technology, allowing wonderful image generation with natural-language text prompt. However, the issue of lacking…

Computer Vision and Pattern Recognition · Computer Science · 2025-11-18 · Xiang Gao, Jiaying Liu

HOIDiNi: Human-Object Interaction through Diffusion Noise Optimization

We present HOIDiNi, a text-driven diffusion framework for synthesizing realistic and plausible human-object interaction (HOI). HOI generation is extremely challenging since it induces strict contact accuracies alongside a diverse motion…

Computer Vision and Pattern Recognition · Computer Science · 2025-10-22 · Roey Ron, Guy Tevet, Haim Sawdayee, Amit H. Bermano

A Survey of Multimodal-Guided Image Editing with Text-to-Image Diffusion Models

Image editing aims to edit the given synthetic or real image to meet the specific requirements from users. It is widely studied in recent years as a promising and challenging field of Artificial Intelligence Generative Content (AIGC).…

Computer Vision and Pattern Recognition · Computer Science · 2024-06-21 · Xincheng Shuai, Henghui Ding, Xingjun Ma, Rongcheng Tu, Yu-Gang Jiang, Dacheng Tao

THOR: Text to Human-Object Interaction Diffusion via Relation Intervention

This paper addresses new methodologies to deal with the challenging task of generating dynamic Human-Object Interactions from textual descriptions (Text2HOI). While most existing works assume interactions with limited body parts or static…

Computer Vision and Pattern Recognition · Computer Science · 2024-03-19 · Qianyang Wu, Ye Shi, Xiaoshui Huang, Jingyi Yu, Lan Xu, Jingya Wang

Auto-Regressive Diffusion for Generating 3D Human-Object Interactions

Text-driven Human-Object Interaction (Text-to-HOI) generation is an emerging field with applications in animation, video games, virtual reality, and robotics. A key challenge in HOI generation is maintaining interaction consistency in long…

Graphics · Computer Science · 2025-03-24 · Zichen Geng, Zeeshan Hayder, Wei Liu, Ajmal Saeed Mian

Build-A-Scene: Interactive 3D Layout Control for Diffusion-Based Image Generation

We propose a diffusion-based approach for Text-to-Image (T2I) generation with interactive 3D layout control. Layout control has been widely studied to alleviate the shortcomings of T2I diffusion models in understanding objects' placement…

Computer Vision and Pattern Recognition · Computer Science · 2024-08-28 · Abdelrahman Eldesokey, Peter Wonka

Implicit Bias Injection Attacks against Text-to-Image Diffusion Models

The proliferation of text-to-image diffusion models (T2I DMs) has led to an increased presence of AI-generated images in daily life. However, biased T2I models can generate content with specific tendencies, potentially influencing people's…

Computer Vision and Pattern Recognition · Computer Science · 2025-04-03 · Huayang Huang, Xiangye Jin, Jiaxu Miao, Yu Wu

HOIDiffusion: Generating Realistic 3D Hand-Object Interaction Data

3D hand-object interaction data is scarce due to the hardware constraints in scaling up the data collection process. In this paper, we propose HOIDiffusion for generating realistic and diverse 3D hand-object interaction data. Our model is a…

Computer Vision and Pattern Recognition · Computer Science · 2024-03-19 · Mengqi Zhang, Yang Fu, Zheng Ding, Sifei Liu, Zhuowen Tu, Xiaolong Wang

TriDi: Trilateral Diffusion of 3D Humans, Objects, and Interactions

Modeling 3D human-object interaction (HOI) is a problem of great interest for computer vision and a key enabler for virtual and mixed-reality applications. Existing methods work in a one-way direction: some recover plausible human…

Computer Vision and Pattern Recognition · Computer Science · 2025-07-29 · Ilya A. Petrov, Riccardo Marin, Julian Chibane, Gerard Pons-Moll

Training-Free Structured Diffusion Guidance for Compositional Text-to-Image Synthesis

Large-scale diffusion models have achieved state-of-the-art results on text-to-image synthesis (T2I) tasks. Despite their ability to generate high-quality yet creative images, we observe that attribution-binding and compositional…

Computer Vision and Pattern Recognition · Computer Science · 2023-03-02 · Weixi Feng, Xuehai He, Tsu-Jui Fu, Varun Jampani, Arjun Akula, Pradyumna Narayana, Sugato Basu, Xin Eric Wang, William Yang Wang

ReCorD: Reasoning and Correcting Diffusion for HOI Generation

Diffusion models revolutionize image generation by leveraging natural language to guide the creation of multimedia content. Despite significant advancements in such generative models, challenges persist in depicting detailed human-object…

Multimedia · Computer Science · 2024-07-26 · Jian-Yu Jiang-Lin, Kang-Yang Huang, Ling Lo, Yi-Ning Huang, Terence Lin, Jhih-Ciang Wu, Hong-Han Shuai, Wen-Huang Cheng

POCI-Diff: Position Objects Consistently and Interactively with 3D-Layout Guided Diffusion

We propose a diffusion-based approach for Text-to-Image (T2I) generation with consistent and interactive 3D layout control and editing. While prior methods improve spatial adherence using 2D cues or iterative copy-warp-paste strategies,…

Computer Vision and Pattern Recognition · Computer Science · 2026-01-21 · Andrea Rigo, Luca Stornaiuolo, Weijie Wang, Mauro Martino, Bruno Lepri, Nicu Sebe