Related papers: LayoutBERT: Masked Language Layout Model for Objec…

200 papers

In this paper, we introduce a new vision-language pre-trained model -- ImageBERT -- for image-text joint embedding. Our model is a Transformer-based model, which takes different modalities as input and models the relationship between them.…

Computer Vision and Pattern Recognition · Computer Science 2020-01-24 Di Qi, Lin Su, Jia Song, Edward Cui, Taroon Bharti, Arun Sacheti

Learning to insert an object instance into an image in a semantically coherent manner is a challenging and interesting problem. Solving it requires (a) determining a location to place an object in the scene and (b) determining its…

Computer Vision and Pattern Recognition · Computer Science 2018-12-10 Donghoon Lee, Sifei Liu, Jinwei Gu, Ming-Yu Liu, Ming-Hsuan Yang, Jan Kautz

We investigate the problem of automatically placing an object into a background image for image compositing. Given a background image and a segmented object, the goal is to train a model to predict plausible placements (location and scale)…

Computer Vision and Pattern Recognition · Computer Science 2023-04-10 Sijie Zhu, Zhe Lin, Scott Cohen, Jason Kuen, Zhifei Zhang, Chen Chen

This work presents Insert Anything, a unified framework for reference-based image insertion that seamlessly integrates objects from reference images into target scenes under flexible, user-specified control guidance. Instead of training…

Computer Vision and Pattern Recognition · Computer Science 2025-04-22 Wensong Song, Hong Jiang, Zongxing Yang, Ruijie Quan, Yi Yang

Attaining a high degree of user controllability in visual generation often requires intricate, fine-grained inputs like layouts. However, such inputs impose a substantial burden on users when compared to simple text inputs. To address the…

Computer Vision and Pattern Recognition · Computer Science 2023-10-31 Weixi Feng, Wanrong Zhu, Tsu-jui Fu, Varun Jampani, Arjun Akula, Xuehai He, Sugato Basu, Xin Eric Wang, William Yang Wang

Compositing an object into an image involves multiple non-trivial sub-tasks such as object placement and scaling, color/lighting harmonization, viewpoint/geometry adjustment, and shadow/reflection generation. Recent generative image…

Computer Vision and Pattern Recognition · Computer Science 2024-09-12 Gemma Canet Tarrés, Zhe Lin, Zhifei Zhang, Jianming Zhang, Yizhi Song, Dan Ruta, Andrew Gilbert, John Collomosse, Soo Ye Kim

As a common image editing operation, image composition (object insertion) aims to combine the foreground from one image and another background image, to produce a composite image. However, there are many issues that could make the composite…

Computer Vision and Pattern Recognition · Computer Science 2026-03-20 Li Niu, Wenyan Cong, Liu Liu, Yan Hong, Bo Zhang, Jing Liang, Liqing Zhang

We propose Pixel-BERT to align image pixels with text by deep multi-modal transformers that jointly learn visual and language embedding in a unified end-to-end framework. We aim to build a more accurate and thorough connection between image…

Computer Vision and Pattern Recognition · Computer Science 2020-06-23 Zhicheng Huang, Zhaoyang Zeng, Bei Liu, Dongmei Fu, Jianlong Fu

Recently, how to achieve precise image editing has attracted increasing attention, especially given the remarkable success of text-to-image generation models. To unify various spatial-aware image editing abilities into one framework, we…

Computer Vision and Pattern Recognition · Computer Science 2024-03-22 Yueru Jia, Yuhui Yuan, Aosong Cheng, Chuke Wang, Ji Li, Huizhu Jia, Shanghang Zhang

We propose VisualBERT, a simple and flexible framework for modeling a broad range of vision-and-language tasks. VisualBERT consists of a stack of Transformer layers that implicitly align elements of an input text and regions in an…

Computer Vision and Pattern Recognition · Computer Science 2019-08-12 Liunian Harold Li, Mark Yatskar, Da Yin, Cho-Jui Hsieh, Kai-Wei Chang

Most real-world image editing tasks require multiple sequential edits to achieve desired results. Current editing approaches, primarily designed for single-object modifications, struggle with sequential editing: especially with maintaining…

Computer Vision and Pattern Recognition · Computer Science 2025-05-05 Daneul Kim, Jaeah Lee, Jaesik Park

Image editing has advanced significantly with the introduction of text-conditioned diffusion models. Despite this progress, seamlessly adding objects to images based on textual instructions without requiring user-provided input masks…

Computer Vision and Pattern Recognition · Computer Science 2025-03-21 Navve Wasserman, Noam Rotstein, Roy Ganz, Ron Kimmel

Creative processes such as painting often involve creating different components of an image one by one. Can we build a computational model to perform this task? Prior works often fail by making global changes to the image, inserting objects…

Computer Vision and Pattern Recognition · Computer Science 2024-12-25 Alper Canberk, Maksym Bondarenko, Ege Ozguroglu, Ruoshi Liu, Carl Vondrick

In this paper, we tackle the copy-paste image-to-image composition problem with a focus on object placement learning. Prior methods have leveraged generative models to reduce the reliance on dense supervision. However, this often limits…

Computer Vision and Pattern Recognition · Computer Science 2025-03-31 Hang Zhou, Xinxin Zuo, Rui Ma, Li Cheng

Joint image-text embedding is the bedrock for most Vision-and-Language (V+L) tasks, where multimodality inputs are simultaneously processed for joint visual and textual understanding. In this paper, we introduce UNITER, a UNiversal…

Computer Vision and Pattern Recognition · Computer Science 2020-07-21 Yen-Chun Chen, Linjie Li, Licheng Yu, Ahmed El Kholy, Faisal Ahmed, Zhe Gan, Yu Cheng, Jingjing Liu

Detection of objects in cluttered indoor environments is one of the key enabling functionalities for service robots. The best performing object detection approaches in computer vision exploit deep Convolutional Neural Networks (CNN) to…

Computer Vision and Pattern Recognition · Computer Science 2017-09-11 Georgios Georgakis, Arsalan Mousavian, Alexander C. Berg, Jana Kosecka

Layer compositing is one of the most popular image editing workflows among both amateurs and professionals. Motivated by the success of diffusion models, we explore layer compositing from a layered image generation perspective. Instead of…

Computer Vision and Pattern Recognition · Computer Science 2023-07-20 Xinyang Zhang, Wentian Zhao, Xin Lu, Jeff Chien

Image compositing is a key step in film making and image editing that aims to segment a foreground object and combine it with a new background. Automatic image compositing can be done easily in a studio using chroma-keying when the…

Computer Vision and Pattern Recognition · Computer Science 2019-04-12 Guanqing Hu, James J. Clark

Image composition aims to generate a realistic composite image by inserting an object from one image into another background image, where the placement (e.g., location, size, occlusion) of the inserted object may be unreasonable, which would…

Computer Vision and Pattern Recognition · Computer Science 2022-06-22 Liu Liu, Zhenchen Liu, Bo Zhang, Jiangtong Li, Li Niu, Qingyang Liu, Liqing Zhang

This paper introduces a tuning-free method for both object insertion and subject-driven generation. The task involves composing an object, given multiple views, into a scene specified by either an image or text. Existing methods struggle to…

Computer Vision and Pattern Recognition · Computer Science 2024-12-12 Daniel Winter, Asaf Shul, Matan Cohen, Dana Berman, Yael Pritch, Alex Rav-Acha, Yedid Hoshen