Related papers: Multiresolution Textual Inversion

Textual Inversion and Self-supervised Refinement for Radiology Report Generation

Existing mainstream approaches follow the encoder-decoder paradigm for generating radiology reports. They focus on improving the network structure of encoders and decoders, which leads to two shortcomings: overlooking the modality gap and…

Computer Vision and Pattern Recognition · Computer Science 2024-06-07 Yuanjiang Luo, Hongxiang Li, Xuan Wu, Meng Cao, Xiaoshuang Huang, Zhihong Zhu, Peixi Liao, Hu Chen, Yi Zhang

An Image is Worth One Word: Personalizing Text-to-Image Generation using Textual Inversion

Text-to-image models offer unprecedented freedom to guide creation through natural language. Yet, it is unclear how such freedom can be exercised to generate images of specific unique concepts, modify their appearance, or compose them in…

Computer Vision and Pattern Recognition · Computer Science 2022-08-03 Rinon Gal, Yuval Alaluf, Yuval Atzmon, Or Patashnik, Amit H. Bermano, Gal Chechik, Daniel Cohen-Or

An Image is Worth Multiple Words: Discovering Object Level Concepts using Multi-Concept Prompt Learning

Textual Inversion, a prompt learning method, learns a singular text embedding for a new "word" to represent image style and appearance, allowing it to be integrated into natural language sentences to generate novel synthesised images.…

Computer Vision and Pattern Recognition · Computer Science 2024-05-28 Chen Jin, Ryutaro Tanno, Amrutha Saseendran, Tom Diethe, Philip Teare

Learned Multi-View Texture Super-Resolution

We present a super-resolution method capable of creating a high-resolution texture map for a virtual 3D object from a set of lower-resolution images of that object. Our architecture unifies the concepts of (i) multi-view super-resolution…

Computer Vision and Pattern Recognition · Computer Science 2020-01-15 Audrey Richard, Ian Cherabier, Martin R. Oswald, Vagia Tsiminaki, Marc Pollefeys, Konrad Schindler

Rethinking Super-Resolution as Text-Guided Details Generation

Deep neural networks have greatly promoted the performance of single image super-resolution (SISR). Conventional methods still resort to restoring the single high-resolution (HR) solution only based on the input of image modality. However,…

Computer Vision and Pattern Recognition · Computer Science 2022-07-15 Chenxi Ma, Bo Yan, Qing Lin, Weimin Tan, Siming Chen

Multimodal Image Super-resolution via Joint Sparse Representations induced by Coupled Dictionaries

Real-world data processing problems often involve various image modalities associated with a certain scene, including RGB images, infrared images or multi-spectral images. The fact that different image modalities often share certain…

Computer Vision and Pattern Recognition · Computer Science 2021-03-11 Pingfan Song, Xin Deng, João F. C. Mota, Nikos Deligiannis, Pier Luigi Dragotti, Miguel R. D. Rodrigues

Comparison Reveals Commonality: Customized Image Generation through Contrastive Inversion

The recent demand for customized image generation raises a need for techniques that effectively extract the common concept from small sets of images. Existing methods typically rely on additional guidance, such as text prompts or spatial…

Computer Vision and Pattern Recognition · Computer Science 2025-08-12 Minseo Kim, Minchan Kwon, Dongyeun Lee, Yunho Jeon, Junmo Kim

EDITOR: Effective and Interpretable Prompt Inversion for Text-to-Image Diffusion Models

Text-to-image generation models (e.g., Stable Diffusion) have achieved significant advancements, enabling the creation of high-quality and realistic images based on textual descriptions. Prompt inversion, the task of identifying the textual…

Computer Vision and Pattern Recognition · Computer Science 2026-03-06 Mingzhe Li, Kejing Xia, Gehao Zhang, Zhenting Wang, Guanhong Tao, Siqi Pan, Juan Zhai, Shiqing Ma

MMC: Multi-Modal Colorization of Images using Textual Descriptions

Handling various objects with different colors is a significant challenge for image colorization techniques. Thus, for complex real-world scenes, the existing image colorization algorithms often fail to maintain color consistency. In this…

Computer Vision and Pattern Recognition · Computer Science 2023-04-26 Subhankar Ghosh, Saumik Bhattacharya, Prasun Roy, Umapada Pal, Michael Blumenstein

Towards End-to-End In-Image Neural Machine Translation

In this paper, we offer a preliminary investigation into the task of in-image machine translation: transforming an image containing text in one language into an image containing the same text in another language. We propose an end-to-end…

Computation and Language · Computer Science 2020-10-22 Elman Mansimov, Mitchell Stern, Mia Chen, Orhan Firat, Jakob Uszkoreit, Puneet Jain

Interpretable Deep Multimodal Image Super-Resolution

Multimodal image super-resolution (SR) is the reconstruction of a high resolution image given a low-resolution observation with the aid of another image modality. While existing deep multimodal models do not incorporate domain knowledge…

Computer Vision and Pattern Recognition · Computer Science 2020-09-08 Iman Marivani, Evaggelia Tsiligianni, Bruno Cornelis, Nikos Deligiannis

Retrieval Guided Unsupervised Multi-domain Image-to-Image Translation

Image-to-image translation aims to learn a mapping that transforms an image from one visual domain to another. Recent works assume that image descriptors can be disentangled into a domain-invariant content representation and a…

Computer Vision and Pattern Recognition · Computer Science 2020-08-13 Raul Gomez, Yahui Liu, Marco De Nadai, Dimosthenis Karatzas, Bruno Lepri, Nicu Sebe

Controllable 3D Object Generation with Single Image Prompt

Recently, the impressive generative capabilities of diffusion models have been demonstrated, producing images with remarkable fidelity. Particularly, existing methods for the 3D object generation tasks, which is one of the fastest-growing…

Computer Vision and Pattern Recognition · Computer Science 2025-12-01 Jaeseok Lee, Jaekoo Lee

Multi-View Image-to-Image Translation Supervised by 3D Pose

We address the task of multi-view image-to-image translation for person image generation. The goal is to synthesize photo-realistic multi-view images with pose-consistency across all views. Our proposed end-to-end framework is based on a…

Computer Vision and Pattern Recognition · Computer Science 2021-04-14 Idit Diamant, Oranit Dror, Hai Victor Habi, Arnon Netzer

Single-Reference Text-to-Image Manipulation with Dual Contrastive Denoising Score

Large-scale text-to-image generative models have shown remarkable ability to synthesize diverse and high-quality images. However, it is still challenging to directly apply these models for editing real images for two reasons. First, it is…

Computer Vision and Pattern Recognition · Computer Science 2025-08-19 Syed Muhmmad Israr, Feng Zhao

Addressing Image Hallucination in Text-to-Image Generation through Factual Image Retrieval

Text-to-image generation has shown remarkable progress with the emergence of diffusion models. However, these models often generate factually inconsistent images, failing to accurately reflect the factual information and common sense…

Computer Vision and Pattern Recognition · Computer Science 2024-07-16 Youngsun Lim, Hyunjung Shim

TEXTS-Diff: TEXTS-Aware Diffusion Model for Real-World Text Image Super-Resolution

Real-world text image super-resolution aims to restore overall visual quality and text legibility in images suffering from diverse degradations and text distortions. However, the scarcity of text image data in existing datasets results in…

Computer Vision and Pattern Recognition · Computer Science 2026-01-27 Haodong He, Xin Zhan, Yancheng Bai, Rui Lan, Lei Sun, Xiangxiang Chu

MTFusion: Reconstructing Any 3D Object from Single Image Using Multi-word Textual Inversion

Reconstructing 3D models from single-view images is a long-standing problem in computer vision. The latest advances for single-image 3D reconstruction extract a textual description from the input image and further utilize it to synthesize…

Computer Vision and Pattern Recognition · Computer Science 2024-11-20 Yu Liu, Ruowei Wang, Jiaqi Li, Zixiang Xu, Qijun Zhao

Unsupervised Compositional Concepts Discovery with Text-to-Image Generative Models

Text-to-image generative models have enabled high-resolution image synthesis across different domains, but require users to specify the content they wish to generate. In this paper, we consider the inverse problem -- given a collection of…

Computer Vision and Pattern Recognition · Computer Science 2023-08-04 Nan Liu, Yilun Du, Shuang Li, Joshua B. Tenenbaum, Antonio Torralba

Super-resolution Using Constrained Deep Texture Synthesis

Hallucinating high frequency image details in single image super-resolution is a challenging task. Traditional super-resolution methods tend to produce oversmoothed output images due to the ambiguity in mapping between low and high…

Computer Vision and Pattern Recognition · Computer Science 2017-01-27 Libin Sun, James Hays