Ruize Han — Scifaro

LV-OSD: Language-Vision-Complementary Open-Set Object Detection

Object detection is an important task in computer vision, which aims to detect the objects of interest through the given category list or query images. In this work, we propose a new problem of language-visual-complementary open-set object…

Computer Vision and Pattern Recognition · Computer Science 2026-05-28 Yupeng Zhang , Ruize Han , Wei Feng , Song Wang , Liang Wan

COVD: Continual Open-Vocabulary Object Detection with Novel Concept Injection

Open-vocabulary object detection (OVD) has made significant progress, enabling detectors to generalize from seen to unseen categories. However, real-world category spaces continually evolve, and existing OVD models still struggle with newly…

Computer Vision and Pattern Recognition · Computer Science 2026-05-27 Yupeng Zhang , Ruize Han , Yuzhong Feng , Zixin Ren , Yuntong Tian , Liang Wan

ODOV: Benchmark the Open-Domain Open-Vocabulary Object Detection

Existing studies typically investigate domain shift and category shift as independent problems; however, in real-world scenarios, the two types of shifts often occur simultaneously and interact, leading to significant degradation in…

Computer Vision and Pattern Recognition · Computer Science 2026-05-27 Yupeng Zhang , Ruize Han , Fangnan Zhou , Wei Feng , Liang Wan

VFM$^{4}$SDG: Unveiling the Power of VFMs for Single-Domain Generalized Object Detection

Real-world weather, illumination, and imaging variations often induce severe domain shifts, degrading single-source detectors in unseen environments. Existing single-domain generalized object detection (SDGOD) methods mainly rely on data…

Computer Vision and Pattern Recognition · Computer Science 2026-05-25 Yupeng Zhang , Ruize Han , Ningnan Guo , Wei Feng , Song Wang , Liang Wan

The First Challenge on Remote Sensing Infrared Image Super-Resolution at NTIRE 2026: Benchmark Results and Method Overview

This paper presents the NTIRE 2026 Remote Sensing Infrared Image Super-Resolution (x4) Challenge, one of the associated challenges of NTIRE 2026. The challenge aims to recover high-resolution (HR) infrared images from low-resolution (LR)…

Computer Vision and Pattern Recognition · Computer Science 2026-04-24 Kai Liu , Haoyang Yue , Zeli Lin , Zheng Chen , Jingkai Wang , Jue Gong , Jiatong Li , Xianglong Yan , Libo Zhu , Jianze Li , Ziqing Zhang , Zihan Zhou , Xiaoyang Liu , Radu Timofte , Yulun Zhang , Junye Chen , Zhenming Yan , Yucong Hong , Ruize Han , Song Wang , Li Pang , Heng Zhao , Xinqiao Wu , Deyu Meng , Xiangyong Cao , Weijun Yuan , Zhan Li , Zhanglu Chen , Boyang Yao , Yihang Chen , Yifan Deng , Zengyuan Zuo , Junjun Jiang , Saiprasad Meesiyawar , Sulocha Yatageri , Nikhil Akalwadi , Ramesh Ashok Tabib , Uma Mudenagudi , Jiachen Tu , Yaokun Shi , Guoyi Xu , Yaoxin Jiang , Cici Liu , Tongyao Mu , Qiong Cao , Yifan Wang , Kosuke Shigematsu , Hiroto Shirono , Asuka Shin , Wei Zhou , Linfeng Li , Lingdong Kong , Ce Wang , Xingwei Zhong , Wanjie Sun , Dafeng Zhang , Hongxin Lan , Qisheng Xu , Mingyue He , Hui Geng , Tianjiao Wan , Kele Xu , Changjian Wang , Antoine Carreaud , Nicola Santacroce , Shanci Li , Jan Skaloud , Adrien Gressin

The Fourth Challenge on Image Super-Resolution ($\times$4) at NTIRE 2026: Benchmark Results and Method Overview

This paper presents the NTIRE 2026 image super-resolution ($\times$4) challenge, one of the associated competitions of the NTIRE 2026 Workshop at CVPR 2026. The challenge aims to reconstruct high-resolution (HR) images from low-resolution…

Computer Vision and Pattern Recognition · Computer Science 2026-04-17 Zheng Chen , Kai Liu , Jingkai Wang , Xianglong Yan , Jianze Li , Ziqing Zhang , Jue Gong , Jiatong Li , Lei Sun , Xiaoyang Liu , Radu Timofte , Yulun Zhang , Jihye Park , Yoonjin Im , Hyungju Chun , Hyunhee Park , MinKyu Park , Zheng Xie , Xiangyu Kong , Weijun Yuan , Zhan Li , Qiurong Song , Luen Zhu , Fengkai Zhang , Xinzhe Zhu , Junyang Chen , Congyu Wang , Yixin Yang , Zhaorun Zhou , Jiangxin Dong , Jinshan Pan , Shengwei Wang , Jiajie Ou , Baiang Li , Sizhuo Ma , Qiang Gao , Jusheng Zhang , Jian Wang , Keze Wang , Yijiao Liu , Yingsi Chen , Hui Li , Yu Wang , Congchao Zhu , Saeed Ahmad , Ik Hyun Lee , Jun Young Park , Ji Hwan Yoon , Kainan Yan , Zian Wang , Weibo Wang , Shihao Zou , Chao Dong , Wei Zhou , Linfeng Li , Jaeseong Lee , Jaeho Chae , Jinwoo Kim , Seonjoo Kim , Yucong Hong , Zhenming Yan , Junye Chen , Ruize Han , Song Wang , Yuxuan Jiang , Chengxi Zeng , Tianhao Peng , Fan Zhang , David Bull , Tongyao Mu , Qiong Cao , Yifan Wang , Youwei Pan , Leilei Cao , Xiaoping Peng , Wei Deng , Yifei Chen , Wenbo Xiong , Xian Hu , Yuxin Zhang , Xiaoyun Cheng , Yang Ji , Zonghao Chen , Zhihao Xue , Junqin Hu , Nihal Kumar , Snehal Singh Tomar , Klaus Mueller , Surya Vashisth , Prateek Shaily , Jayant Kumar , Hardik Sharma , Ashish Negi , Sachin Chaudhary , Akshay Dudhane , Praful Hambarde , Amit Shukla , Shijun Shi , Jiangning Zhang , Yong Liu , Kai Hu , Jing Xu , Xianfang Zeng , Amitesh M , Hariharan S , Chia-Ming Lee , Yu-Fan Lin , Chih-Chung Hsu , Nishalini K , Sreenath K A , Bilel Benjdira , Anas M. Ali , Wadii Boulila , Shuling Zheng , Zhiheng Fu , Feng Zhang , Zhanglu Chen , Boyang Yao , Nikhil Pathak , Aagam Jain , Milan Kumar , Kishor Upla , Vivek Chavda , Sarang N S , Raghavendra Ramachandra , Zhipeng Zhang , Qi Wang , Shiyu Wang , Jiachen Tu , Guoyi Xu , Yaoxin Jiang , Jiajia Liu , Yaokun Shi , Yuqi Li , Chuanguang Yang , Weilun Feng , Zhuzhi Hong , Hao Wu , Junming Liu , Yingli Tian , Amish Bhushan Kulkarni , Tejas R R Shet , Saakshi M Vernekar , Nikhil Akalwadi , Kaushik Mallibhat , Ramesh Ashok Tabib , Uma Mudenagudi , Yuwen Pan , Tianrun Chen , Deyi Ji , Qi Zhu , Lanyun Zhu , Heyan Zhangyi

Online Reasoning Video Object Segmentation

Reasoning video object segmentation predicts pixel-level masks in videos from natural-language queries that may involve implicit and temporally grounded references. However, existing methods are developed and evaluated in an offline regime,…

Computer Vision and Pattern Recognition · Computer Science 2026-04-14 Jinyuan Liu , Yang Wang , Zeyu Zhao , Weixin Li , Song Wang , Ruize Han

BoxTuning: Directly Injecting the Object Box for Multimodal Model Fine-Tuning

Object-level spatial-temporal understanding is essential for video question answering, yet existing multimodal large language models (MLLMs) encode frames holistically and lack explicit mechanisms for fine-grained object grounding. Recent…

Computer Vision and Pattern Recognition · Computer Science 2026-04-14 Zekun Qian , Ruize Han , Wei Feng

COVTrack++: Learning Open-Vocabulary Multi-Object Tracking from Continuous Videos via a Synergistic Paradigm

Multi-Object Tracking (MOT) has traditionally focused on a few specific categories, restricting its applicability to real-world scenarios involving diverse objects. Open-Vocabulary Multi-Object Tracking (OVMOT) addresses this by enabling…

Computer Vision and Pattern Recognition · Computer Science 2026-03-26 Zekun Qian , Wei Feng , Ruize Han , Junhui Hou

NoOVD: Novel Category Discovery and Embedding for Open-Vocabulary Object Detection

Despite the remarkable progress in open-vocabulary object detection (OVD), a significant gap remains between the training and testing phases. During training, the RPN and RoI heads often misclassify unlabeled novel-category objects as…

Computer Vision and Pattern Recognition · Computer Science 2026-03-24 Yupeng Zhang , Ruize Han , Zhiwei Chen , Wei Feng , Liang Wan

From Indoor To Outdoor: Unsupervised Domain Adaptive Gait Recognition

Gait recognition is an important AI task, which has progressed rapidly with the development of deep learning. However, existing learning-based gait recognition methods mainly focus on the single domain, especially the constrained…

Computer Vision and Pattern Recognition · Computer Science 2025-11-19 Likai Wang , Ruize Han , Wei Feng , Song Wang

CLIPVehicle: A Unified Framework for Vision-based Vehicle Search

Vehicles are among the most common and significant objects in the real world, and computer vision research on them, such as vehicle detection and vehicle re-identification, has made remarkable progress. To search an…

Computer Vision and Pattern Recognition · Computer Science 2025-08-07 Likai Wang , Ruize Han , Xiangqun Zhang , Wei Feng

Synthetic-To-Real Video Person Re-ID

Person re-identification (Re-ID) is an important task with significant applications in public security and information forensics, and it has progressed rapidly with the development of deep learning. In this work, we investigate a novel…

Computer Vision and Pattern Recognition · Computer Science 2025-02-05 Xiangqun Zhang , Wei Feng , Ruize Han , Likai Wang , Linqi Song , Junhui Hou

OVT-B: A New Large-Scale Benchmark for Open-Vocabulary Multi-Object Tracking

Open-vocabulary object perception has become an important topic in artificial intelligence, which aims to identify objects with novel classes that have not been seen during training. Under this setting, open-vocabulary object detection…

Computer Vision and Pattern Recognition · Computer Science 2024-10-24 Haiji Liang , Ruize Han

VOVTrack: Exploring the Potentiality in Videos for Open-Vocabulary Object Tracking

Open-vocabulary multi-object tracking (OVMOT) represents a critical new challenge involving the detection and tracking of diverse object categories in videos, encompassing both seen categories (base classes) and unseen categories (novel…

Computer Vision and Pattern Recognition · Computer Science 2024-10-14 Zekun Qian , Ruize Han , Junhui Hou , Linqi Song , Wei Feng

OCTrack: Benchmarking the Open-Corpus Multi-Object Tracking

We study a novel yet practical problem of open-corpus multi-object tracking (OCMOT), which extends the MOT into localizing, associating, and recognizing generic-category objects of both seen (base) and unseen (novel) classes, but without…

Computer Vision and Pattern Recognition · Computer Science 2024-07-22 Zekun Qian , Ruize Han , Wei Feng , Junhui Hou , Linqi Song , Song Wang

Robust Collaborative Perception without External Localization and Clock Devices

A consistent spatial-temporal coordination across multiple agents is fundamental for collaborative perception, which seeks to improve perception abilities through information exchange among agents. To achieve this spatial-temporal…

Artificial Intelligence · Computer Science 2024-06-03 Zixing Lei , Zhenyang Ni , Ruize Han , Shuo Tang , Dingju Wang , Chen Feng , Siheng Chen , Yanfeng Wang

From a Bird's Eye View to See: Joint Camera and Subject Registration without the Camera Calibration

We tackle a new problem of multi-view camera and subject registration in the bird's eye view (BEV) without pre-given camera calibration. This is a very challenging problem since its only input is several RGB images from different…

Computer Vision and Pattern Recognition · Computer Science 2024-04-30 Zekun Qian , Ruize Han , Wei Feng , Feifan Wang , Song Wang

Unveiling the Power of Self-supervision for Multi-view Multi-human Association and Tracking

Multi-view multi-human association and tracking (MvMHAT) is a new but important problem for multi-person scene video surveillance, aiming to track a group of people over time in each view, as well as to identify the same person across…

Computer Vision and Pattern Recognition · Computer Science 2024-02-01 Wei Feng , Feifan Wang , Ruize Han , Zekun Qian , Song Wang

Combining the Silhouette and Skeleton Data for Gait Recognition

Gait recognition, a long-distance biometric technology, has aroused intense interest recently. Currently, the two dominant gait recognition works are appearance-based and model-based, which extract features from silhouettes and skeletons,…

Computer Vision and Pattern Recognition · Computer Science 2023-03-27 Likai Wang , Ruize Han , Wei Feng