Related papers: Visual Accommodation: Rethinking Image Scale as a …
We propose a Dynamic Scale Training paradigm (abbreviated as DST) to mitigate the scale variation challenge in object detection. Previous strategies like image pyramid, multi-scale training, and their variants aim at preparing…
Recently, object detection models have witnessed notable performance improvements, particularly with transformer-based models. However, new objects frequently appear in the real world, requiring detection models to continually learn without…
This paper introduces a new fundamental characteristic, i.e., the dynamic range, from real-world metric tools to deep visual recognition. In metrology, the dynamic range is a basic quality of a metric tool, indicating its flexibility to…
It is a common practice to exploit pyramidal feature representation to tackle the problem of scale variation in object instances. However, most of them still predict the objects in a certain range of scales based solely or mainly on a…
Given the variety of the visual world there is not one true scale for recognition: objects may appear at drastically different sizes across the visual field. Rather than enumerate variations across filter channels or pyramid levels, dynamic…
Object detection in aerial imagery presents a significant challenge due to large scale variations among objects. This paper proposes an evolutionary reinforcement learning agent, integrated within a coarse-to-fine object detection…
Scale variation has been a challenge from traditional to modern approaches in computer vision. Most solutions to scale issues have a similar theme: a set of intuitive and manually designed policies that are generic and fixed (e.g. SIFT or…
Detection Transformer (DETR) has redefined object detection by casting it as a set prediction task within an end-to-end framework. Despite its elegance, DETR and its variants still rely on fixed learnable queries and suffer from severe…
In the domain of moment retrieval, accurately identifying temporal segments within videos based on natural language queries remains challenging. Traditional methods often employ pre-trained models that struggle with fine-grained information…
This paper revisits the problem of orientation estimation for rigid bodies through a novel framework based on scalar measurements. Unlike traditional vector-based methods, the proposed approach enables selective utilization of only the…
Selectivity estimation aims at estimating the number of database objects that satisfy a selection criterion. Answering this problem accurately and efficiently is essential to many applications, such as density estimation, outlier detection,…
We address the problem of class incremental learning, which is a core step towards achieving adaptive vision intelligence. In particular, we consider the task setting of incremental learning with limited memory and aim to achieve better…
For object detectors, enhancing model performance hinges on the ability to simultaneously consider inconsistencies across tasks and focus on difficult-to-train samples. Achieving this necessitates incorporating information from…
High-dimensional, heterogeneous data with complex feature interactions pose significant challenges for traditional predictive modeling approaches. While Projection to Latent Structures (PLS) remains a popular technique, it struggles to…
In this paper, we seek to develop a versatile test-time adaptation (TTA) objective for a variety of tasks - classification and regression across image-, object-, and pixel-level predictions. We achieve this through a self-bootstrapping…
Scale variation is a deep-rooted problem in object counting, which has not been effectively addressed by existing scale-aware algorithms. An important factor is that they typically involve cooperative learning across multiple resolutions,…
We propose a one-stage framework for real-time multi-person 3D human mesh estimation from a single RGB image. While current one-stage methods, which follow a DETR-style pipeline, achieve state-of-the-art (SOTA) performance with…
Geometric variations of objects, which do not modify the object class, pose a major challenge for object recognition. These variations could be rigid as well as non-rigid transformations. In this paper, we design a framework for training…
LiDAR 3D object detection models are inevitably biased towards their training dataset. The detector clearly exhibits this bias when employed on a target dataset, particularly towards object sizes. However, object sizes vary heavily between…
In this paper, we introduce a shape-based, time-scale invariant feature descriptor for 1-D sensor signals. The time-scale invariance of the feature allows us to use features from one training event to describe events of the same semantic…