Related papers: Learning Visual Context by Comparison
We propose a new approach to determine correspondences between image pairs in the wild under large changes in illumination, viewpoint, context, and material. While other approaches find correspondences between pairs of images by treating…
Given a user's query, traditional image search systems rank images according to its relevance to a single modality (e.g., image content or surrounding text). Nowadays, an increasing number of images on the Internet are available with…
For object detection, how to address the contradictory requirement between feature map resolution and receptive field on high-resolution inputs still remains an open question. In this paper, to tackle this issue, we build a novel…
Referring image segmentation aims to segment the target object described by a given natural language expression. Typically, referring expressions contain complex relationships between the target and its surrounding objects. The main…
Image segmentation is the problem of partitioning an image into different subsets, where each subset may have a different characterization in terms of color, intensity, texture, and/or other features. Segmentation is a fundamental component…
Differential medical VQA models compare multiple images to identify clinically meaningful changes and rely on vision encoders to capture fine-grained visual differences that reflect radiologists' comparative diagnostic workflows. However,…
Chest X-rays are one of the most commonly used technologies for medical diagnosis. Many deep learning models have been proposed to improve and automate the abnormality detection task on this type of data. In this paper, we propose a…
Most of computer vision focuses on what is in an image. We propose to train a standalone object-centric context representation to perform the opposite task: seeing what is not there. Given an image, our context model can predict where…
Human-object interaction detection is an important and relatively new class of visual relationship detection tasks, essential for deeper scene understanding. Most existing approaches decompose the problem into object localization and…
Vision Transformers $(\texttt{ViT})$ have become the architecture of choice for many computer vision tasks, yet their performance in computer-aided diagnostics remains limited. Focusing on breast cancer detection from mammograms, we…
Medical image processing tasks such as segmentation often require capturing non-local information. As organs, bones, and tissues share common characteristics such as intensity, shape, and texture, the contextual information plays a critical…
Visual Commonsense Reasoning (VCR) remains a significant yet challenging research problem in the realm of visual reasoning. A VCR model generally aims at answering a textual question regarding an image, followed by the rationale prediction…
Medical image segmentation demands the aggregation of global and local feature representations, posing a challenge for current methodologies in handling both long-range and short-range feature interactions. Recently, vision mamba (ViM)…
Image similarity has been extensively studied in computer vision. In recent years, machine-learned models have shown their ability to encode more semantics than traditional multivariate metrics. However, in labelling semantic similarity,…
Modern deep neural network based object detection methods typically classify candidate proposals using their interior features. However, global and local surrounding contexts that are believed to be valuable for object detection are not…
The problem of cross-modality person re-identification has been receiving increasing attention recently, due to its practical significance. Motivated by the fact that human usually attend to the difference when they compare two similar…
Recently, chest X-ray report generation, which aims to automatically generate descriptions of given chest X-ray images, has received growing research interests. The key challenge of chest X-ray report generation is to accurately capture and…
We introduce Correlational Image Modeling (CIM), a novel and surprisingly effective approach to self-supervised visual pre-training. Our CIM performs a simple pretext task: we randomly crop image regions (exemplars) from an input image…
In this paper we explore two ways of using context for object detection. The first model focusses on people and the objects they commonly interact with, such as fashion and sports accessories. The second model considers more general object…
Radiologists usually observe anatomical regions of chest X-ray images as well as the overall image before making a decision. However, most existing deep learning models only look at the entire X-ray image for classification, failing to…