Related papers: GIMP-ML: Python Plugins for using Computer Vision …

MMV_Im2Im: An Open Source Microscopy Machine Vision Toolbox for Image-to-Image Transformation

Over the past decade, deep learning (DL) research in computer vision has been growing rapidly, with many advances in DL-based image analysis methods for biomedical problems. In this work, we introduce MMV_Im2Im, a new open-source Python…

Image and Video Processing · Electrical Eng. & Systems 2023-03-20 Justin Sonneck, Jianxu Chen

Variations of images to increase their visibility

The calculus of variations applied to image processing requires numerical models able to perform the variations of images and the extremization of appropriate actions. To produce the variations of images, there are several…

Computer Vision and Pattern Recognition · Computer Science 2012-01-18 Amelia Carolina Sparavigna

MonetGPT: Solving Puzzles Enhances MLLMs' Image Retouching Skills

Retouching is an essential task in post-manipulation of raw photographs. Generative editing, guided by text or strokes, provides a new tool accessible to users but can easily change the identity of the original objects in unacceptable and…

Graphics · Computer Science 2025-05-12 Niladri Shekhar Dutt, Duygu Ceylan, Niloy J. Mitra

DeepMIM: Deep Supervision for Masked Image Modeling

Deep supervision, which involves extra supervisions to the intermediate features of a neural network, was widely used in image classification in the early deep learning era since it significantly reduces the training difficulty and eases…

Computer Vision and Pattern Recognition · Computer Science 2023-03-17 Sucheng Ren, Fangyun Wei, Samuel Albanie, Zheng Zhang, Han Hu

VITRIX-CLIPIN: Enhancing Fine-Grained Visual Understanding in CLIP via Instruction Editing Data and Long Captions

Despite the success of Vision-Language Models (VLMs) like CLIP in aligning vision and language, their proficiency in detailed, fine-grained visual comprehension remains a key challenge. We present CLIP-IN, a novel framework that bolsters…

Computer Vision and Pattern Recognition · Computer Science 2025-11-18 Ziteng Wang, Siqi Yang, Limeng Qiao, Lin Ma

GIMP and Wavelets for Medical Image Processing: Enhancing Images of the Fundus of the Eye

The visual analysis of retina and of its vascular characteristics is important in the diagnosis and monitoring of diseases of visual perception. In the related medical diagnoses, the digital processing of the fundus images is used to obtain…

Computer Vision and Pattern Recognition · Computer Science 2015-08-06 Amelia Carolina Sparavigna

Physics-Informed Neural Networks For Semiconductor Film Deposition: A Review

Semiconductor manufacturing relies heavily on film deposition processes, such as Chemical Vapor Deposition and Physical Vapor Deposition. These complex processes require precise control to achieve film uniformity, proper adhesion, and…

Machine Learning · Computer Science 2025-07-16 Tao Han, Zahra Taheri, Hyunwoong Ko

MIND-Edit: MLLM Insight-Driven Editing via Language-Vision Projection

Recent advances in AI-generated content (AIGC) have significantly accelerated image editing techniques, driving increasing demand for diverse and fine-grained edits. Despite these advances, existing image editing methods still face…

Computer Vision and Pattern Recognition · Computer Science 2025-05-27 Shuyu Wang, Weiqi Li, Qian Wang, Shijie Zhao, Jian Zhang

GIM: A Million-scale Benchmark for Generative Image Manipulation Detection and Localization

The extraordinary ability of generative models emerges as a new trend in image editing and generating realistic images, posing a serious threat to the trustworthiness of multimedia data and driving the research of image manipulation…

Computer Vision and Pattern Recognition · Computer Science 2025-01-14 Yirui Chen, Xudong Huang, Quan Zhang, Wei Li, Mingjian Zhu, Qiangyu Yan, Simiao Li, Hanting Chen, Hailin Hu, Jie Yang, Wei Liu, Jie Hu

PIP-MM: Pre-Integrating Prompt Information into Visual Encoding via Existing MLLM Structures

The Multimodal Large Language Models (MLLMs) have activated the capabilities of Large Language Models (LLMs) in solving visual-language tasks by integrating visual information. The prevailing approach in existing MLLMs involves employing an…

Computer Vision and Pattern Recognition · Computer Science 2024-10-31 Tianxiang Wu, Minxin Nie, Ziqiang Cao

Raising the Bar of AI-generated Image Detection with CLIP

The aim of this work is to explore the potential of pre-trained vision-language models (VLMs) for universal detection of AI-generated images. We develop a lightweight detection strategy based on CLIP features and study its performance in a…

Computer Vision and Pattern Recognition · Computer Science 2024-04-30 Davide Cozzolino, Giovanni Poggi, Riccardo Corvi, Matthias Nießner, Luisa Verdoliva

mBLIP: Efficient Bootstrapping of Multilingual Vision-LLMs

Modular vision-language models (Vision-LLMs) align pretrained image encoders with (frozen) large language models (LLMs) and post-hoc condition LLMs to 'understand' the image input. With the abundance of readily available high-quality…

Computer Vision and Pattern Recognition · Computer Science 2024-06-21 Gregor Geigle, Abhay Jain, Radu Timofte, Goran Glavaš

PiML Toolbox for Interpretable Machine Learning Model Development and Diagnostics

PiML (read π-ML, /`pai`em`el/) is an integrated and open-access Python toolbox for interpretable machine learning model development and model diagnostics. It is designed with machine learning workflows in both low-code and high-code…

Machine Learning · Computer Science 2023-12-21 Agus Sudjianto, Aijun Zhang, Zebin Yang, Yu Su, Ningzhou Zeng

CLIPin: A Non-contrastive Plug-in to CLIP for Multimodal Semantic Alignment

Large-scale natural image-text datasets, especially those automatically collected from the web, often suffer from loose semantic alignment due to weak supervision, while medical datasets tend to have high cross-modal correlation but low…

Computer Vision and Pattern Recognition · Computer Science 2025-09-26 Shengzhu Yang, Jiawei Du, Shuai Lu, Weihang Zhang, Ningli Wang, Huiqi Li

On the Limitations of Vision-Language Models in Understanding Image Transforms

Vision Language Models (VLMs) have demonstrated significant potential in various downstream tasks, including Image/Video Generation, Visual Question Answering, Multimodal Chatbots, and Video Understanding. However, these models often…

Computer Vision and Pattern Recognition · Computer Science 2025-03-17 Ahmad Mustafa Anis, Hasnain Ali, Saquib Sarfraz

CLIP Guided Image-perceptive Prompt Learning for Image Enhancement

Image enhancement is a significant research area in the fields of computer vision and image processing. In recent years, many learning-based methods for image enhancement have been developed, where the Look-up-table (LUT) has proven to be…

Computer Vision and Pattern Recognition · Computer Science 2023-11-23 Weiwen Chen, Qiuhong Ke, Zinuo Li

LightCLIP: Learning Multi-Level Interaction for Lightweight Vision-Language Models

Vision-language pre-training like CLIP has shown promising performance on various downstream tasks such as zero-shot image classification and image-text retrieval. Most of the existing CLIP-alike works usually adopt relatively large image…

Computer Vision and Pattern Recognition · Computer Science 2023-12-04 Ying Nie, Wei He, Kai Han, Yehui Tang, Tianyu Guo, Fanyi Du, Yunhe Wang

Concept-Guided Prompt Learning for Generalization in Vision-Language Models

The Contrastive Language-Image Pretraining (CLIP) model has exhibited remarkable efficacy in establishing cross-modal connections between texts and images, yielding impressive performance across a broad spectrum of downstream applications…

Computer Vision and Pattern Recognition · Computer Science 2024-01-17 Yi Zhang, Ce Zhang, Ke Yu, Yushun Tang, Zhihai He

PinCLIP: Large-scale Foundational Multimodal Representation at Pinterest

While multi-modal Visual Language Models (VLMs) have demonstrated significant success across various domains, the integration of VLMs into recommendation and retrieval systems remains a challenge, due to issues like training objective…

Computer Vision and Pattern Recognition · Computer Science 2026-03-05 Josh Beal, Eric Kim, Jinfeng Rao, Rex Wu, Dmitry Kislyuk, Charles Rosenberg

Unsupervised Image Fusion Using Deep Image Priors

A significant number of researchers have applied deep learning methods to image fusion. However, most works require a large amount of training data or depend on pre-trained models or frameworks to capture features from source images. This…

Computer Vision and Pattern Recognition · Computer Science 2022-02-23 Xudong Ma, Paul Hill, Nantheera Anantrasirichai, Alin Achim