Related papers: Multi-Modality Deep Network for Extreme Learned Im…

Perceptual Image Compression with Cooperative Cross-Modal Side Information

The explosion of data has resulted in more and more associated text being transmitted along with images. Inspired by distributed source coding, many works utilize image side information to enhance image compression. However, existing…

Computer Vision and Pattern Recognition · Computer Science 2023-11-29 Shiyu Qin, Bin Chen, Yujun Huang, Baoyi An, Tao Dai, Shu-Tao Xia

Multi-Modality Deep Network for JPEG Artifacts Reduction

In recent years, many convolutional neural network-based models have been designed for JPEG artifacts reduction and have achieved notable progress. However, few methods are suitable for extreme low-bitrate image compression artifacts reduction.…

Computer Vision and Pattern Recognition · Computer Science 2023-05-05 Xuhao Jiang, Weimin Tan, Qing Lin, Chenxi Ma, Bo Yan, Liquan Shen

Webly Supervised Joint Embedding for Cross-Modal Image-Text Retrieval

Cross-modal retrieval between visual data and natural language description remains a long-standing challenge in multimedia. While recent image-text retrieval methods offer great promise by learning deep representations aligned across…

Multimedia · Computer Science 2018-08-24 Niluthpol Chowdhury Mithun, Rameswar Panda, Evangelos E. Papalexakis, Amit K. Roy-Chowdhury

Neural Image Compression with Text-guided Encoding for both Pixel-level and Perceptual Fidelity

Recent advances in text-guided image compression have shown great potential to enhance the perceptual quality of reconstructed images. These methods, however, tend to have significantly degraded pixel-wise fidelity, limiting their…

Computer Vision and Pattern Recognition · Computer Science 2024-05-24 Hagyeong Lee, Minkyu Kim, Jun-Hyuk Kim, Seungeon Kim, Dokwan Oh, Jaeho Lee

Recognition-Aware Learned Image Compression

Learned image compression methods generally optimize a rate-distortion loss, trading off improvements in visual distortion for added bitrate. Increasingly, however, compressed imagery is used as an input to deep learning networks for…

Image and Video Processing · Electrical Eng. & Systems 2022-02-02 Maxime Kawawa-Beaudan, Ryan Roggenkemper, Avideh Zakhor

Extreme Generative Image Compression by Learning Text Embedding from Diffusion Models

Transferring large amounts of high-resolution images over limited bandwidth is an important but very challenging task. Compressing images using extremely low bitrates (<0.1 bpp) has been studied, but it often results in low-quality images of…

Image and Video Processing · Electrical Eng. & Systems 2022-11-16 Zhihong Pan, Xin Zhou, Hao Tian

Image and Encoded Text Fusion for Multi-Modal Classification

Multi-modal approaches employ data from multiple input streams such as textual and visual domains. Deep neural networks have been successfully employed for these approaches. In this paper, we present a novel multi-modal approach that fuses…

Computer Vision and Pattern Recognition · Computer Science 2018-10-05 Ignazio Gallo, Alessandro Calefati, Shah Nawaz, Muhammad Kamran Janjua

Fidelity-preserving Learning-Based Image Compression: Loss Function and Subjective Evaluation Methodology

Learning-based image compression methods have emerged as state-of-the-art, showcasing higher performance compared to conventional compression solutions. These data-driven approaches aim to learn the parameters of a neural network model…

Multimedia · Computer Science 2024-03-20 Shima Mohammadi, Yaojun Wu, João Ascenso

LMM-driven Semantic Image-Text Coding for Ultra Low-bitrate Learned Image Compression

Supported by powerful generative models, low-bitrate learned image compression (LIC) models utilizing perceptual metrics have become feasible. Some of the most advanced models achieve high compression rates and superior perceptual quality…

Image and Video Processing · Electrical Eng. & Systems 2024-11-21 Shimon Murai, Heming Sun, Jiro Katto

MMC: Multi-Modal Colorization of Images using Textual Descriptions

Handling various objects with different colors is a significant challenge for image colorization techniques. Thus, for complex real-world scenes, the existing image colorization algorithms often fail to maintain color consistency. In this…

Computer Vision and Pattern Recognition · Computer Science 2023-04-26 Subhankar Ghosh, Saumik Bhattacharya, Prasun Roy, Umapada Pal, Michael Blumenstein

Efficient Token-Guided Image-Text Retrieval with Consistent Multimodal Contrastive Training

Image-text retrieval is a central problem for understanding the semantic relationship between vision and language, and serves as the basis for various visual and language tasks. Most previous works either simply learn coarse-grained…

Computer Vision and Pattern Recognition · Computer Science 2023-07-19 Chong Liu, Yuqi Zhang, Hongsong Wang, Weihua Chen, Fan Wang, Yan Huang, Yi-Dong Shen, Liang Wang

Multi-modal Visual Understanding with Prompts for Semantic Information Disentanglement of Image

Multi-modal visual understanding of images with prompts involves using various visual and textual cues to enhance the semantic understanding of images. This approach combines both vision and language processing to generate more accurate…

Computer Vision and Pattern Recognition · Computer Science 2023-05-17 Yuzhou Peng

Semantics-Guided Generative Image Compression

Advancements in text-to-image generative AI with large multimodal models are spreading into the field of image compression, creating high-quality representations of images at extremely low bit rates. This work introduces novel components to…

Image and Video Processing · Electrical Eng. & Systems 2025-06-02 Cheng-Lin Wu, Hyomin Choi, Ivan V. Bajić

Multi-modal Reference Learning for Fine-grained Text-to-Image Retrieval

Fine-grained text-to-image retrieval aims to retrieve a fine-grained target image with a given text query. Existing methods typically assume that each training image is accurately depicted by its textual descriptions. However, textual…

Computer Vision and Pattern Recognition · Computer Science 2025-04-11 Zehong Ma, Hao Chen, Wei Zeng, Limin Su, Shiliang Zhang

Self-Supervised Learning from Web Data for Multimodal Retrieval

Self-supervised learning from multimodal image and text data allows deep neural networks to learn powerful features with no need for human-annotated data. Web and social media platforms provide a virtually unlimited amount of this multimodal…

Computer Vision and Pattern Recognition · Computer Science 2019-01-09 Raul Gomez, Lluis Gomez, Jaume Gibert, Dimosthenis Karatzas

Preprocessing Enhanced Image Compression for Machine Vision

Recently, more and more images are compressed and sent to back-end devices for machine analysis tasks (e.g., object detection) instead of being purely watched by humans. However, most traditional or learned image codecs are…

Image and Video Processing · Electrical Eng. & Systems 2022-06-14 Guo Lu, Xingtong Ge, Tianxiong Zhong, Jing Geng, Qiang Hu

Efficient Learned Image Compression Through Knowledge Distillation

Learned image compression sits at the intersection of machine learning and image processing. With advances in deep learning, neural network-based compression methods have emerged. In this process, an encoder maps the image to a…

Computer Vision and Pattern Recognition · Computer Science 2025-09-15 Fabien Allemand, Attilio Fiandrotti, Sumanta Chaudhuri, Alaa Eddine Mazouz

Recoverable Compression: A Multimodal Vision Token Recovery Mechanism Guided by Text Information

With the advancement of large-scale language modeling techniques, large multimodal models combining visual encoders with large language models have demonstrated exceptional performance in various visual tasks. Most of the current…

Computer Vision and Pattern Recognition · Computer Science 2024-12-20 Yi Chen, Jian Xu, Xu-Yao Zhang, Wen-Zhuo Liu, Yang-Yang Liu, Cheng-Lin Liu

Learning with Multi-modal Gradient Attention for Explainable Composed Image Retrieval

We consider the problem of composed image retrieval that takes an input query consisting of an image and a modification text indicating the desired changes to be made on the image and retrieves images that match these changes. Current…

Computer Vision and Pattern Recognition · Computer Science 2023-09-01 Prateksha Udhayanan, Srikrishna Karanam, Balaji Vasan Srinivasan

Efficient Masked Image Compression with Position-Indexed Self-Attention

In recent years, image compression for high-level vision tasks has attracted considerable attention from researchers. Given that object information in images plays a far more crucial role in downstream tasks than background information,…

Computer Vision and Pattern Recognition · Computer Science 2025-04-18 Chengjie Dai, Tiantian Song, Hui Tang, Fangdong Chen, Bowei Yang, Guanghua Song