Related papers: SimpleClick: Interactive Image Segmentation with S…
The goal of click-based interactive image segmentation is to obtain precise object segmentation masks with limited user interaction, i.e., by a minimal number of user clicks. Existing methods require users to provide all the clicks: by…
We explore the plain, non-hierarchical Vision Transformer (ViT) as a backbone network for object detection. This design enables the original ViT architecture to be fine-tuned for object detection without needing to redesign a hierarchical…
Interactive segmentation allows efficient label generation by leveraging user-provided clicks to progressively refine predictions, which is critical when fully supervised labels are costly or generalization to unseen classes is needed.…
Interactive image segmentation aims to segment the target from the background with the manual guidance, which takes as input multimodal data such as images, clicks, scribbles, and bounding boxes. Recently, vision transformers have achieved…
Interactive image segmentation enables users to interact minimally with a machine, facilitating the gradual refinement of the segmentation mask for a target of interest. Previous studies have demonstrated impressive performance in…
Click-based interactive segmentation aims to generate target masks via human clicking, which facilitates efficient pixel-level annotation and image editing. In such a task, target ambiguity remains a problem hindering the accuracy and…
We explore the capability of plain Vision Transformers (ViTs) for semantic segmentation and propose the SegVit. Previous ViT-based segmentation networks usually learn a pixel-level representation from the output of the ViT. Differently, we…
Interactive segmentation allows users to extract target masks by making positive/negative clicks. Although explored by many previous works, there is still a gap between academic approaches and industrial needs: first, existing models are…
Uniform downsampling remains the de facto standard for reducing spatial resolution in vision backbones. In this work, we propose an alternative design built around a content-aware spatial grouping layer, that dynamically assigns tokens to a…
In the interactive segmentation, users initially click on the target object to segment the main body and then provide corrections on mislabeled regions to iteratively refine the segmentation masks. Most existing methods transform these…
Recent works on click-based interactive segmentation have demonstrated state-of-the-art results by using various inference-time optimization schemes. These methods are considerably more computationally expensive compared to feedforward…
Interactive segmentation enables users to extract binary masks of target objects through simple interactions such as clicks, scribbles, and boxes. However, existing methods often support only limited interaction forms and struggle to…
Segmentation of organs or lesions from medical images plays an essential role in many clinical applications such as diagnosis and treatment planning. Though Convolutional Neural Networks (CNN) have achieved the state-of-the-art performance…
From the simple measurement of tissue attributes in pathology workflow to designing an explainable diagnostic/prognostic AI tool, access to accurate semantic segmentation of tissue regions in histology images is a prerequisite. However,…
Interactive segmentation, a computer vision technique where a user provides guidance to help an algorithm segment a feature of interest in an image, has achieved outstanding accuracy and efficient human-computer interaction. However, few…
Foundation models for volumetric medical image segmentation have emerged as powerful tools in clinical workflows, enabling radiologists to delineate regions of interest through intuitive clicks. While these models demonstrate promising…
The increasing availability of digital 3D environments, whether through image-based 3D reconstruction, generation, or scans obtained by robots, is driving innovation across various applications. These come with a significant demand for 3D…
In the wake of Masked Image Modeling (MIM), a diverse range of plain, non-hierarchical Vision Transformer (ViT) models have been pre-trained with extensive datasets, offering new paradigms and significant potential for semantic…
Interactive segmentation aims to extract objects of interest from an image based on user-provided clicks. In real-world applications, there is often a need to segment a series of images featuring the same target object. However, existing…
In recent interactive segmentation algorithms, previous probability maps are used as network input to help predictions in the current segmentation round. However, despite the utilization of previous masks, useful information contained in…