Dominant Person Search methods aim to localize and recognize query persons in a unified network that jointly optimizes two sub-tasks, \ie, pedestrian detection and Re-IDentification (ReID). Despite significant progress, current methods face two primary challenges: 1) the pedestrian candidates learned within detectors are suboptimal for the ReID task; 2) the potential for collaboration between the two sub-tasks is overlooked. To address these issues, we present PSDiff, a novel Person Search framework based on the Diffusion model. PSDiff formulates person search as a dual denoising process from noisy boxes and ReID embeddings to ground truths. Distinct from the conventional Detection-to-ReID approach, our denoising paradigm discards the prior pedestrian candidates generated by detectors, thereby avoiding the local-optimum problem of the ReID task. Following the new paradigm, we further design a new Collaborative Denoising Layer (CDL) to optimize the detection and ReID sub-tasks in an iterative and collaborative way, which makes the two sub-tasks mutually beneficial. Extensive experiments on the standard benchmarks show that PSDiff achieves state-of-the-art performance with fewer parameters and elastic computing overhead.
@article{arxiv.2309.11125,
title = {PSDiff: Diffusion Model for Person Search with Iterative and Collaborative Refinement},
author = {Chengyou Jia and Minnan Luo and Zhuohang Dang and Guang Dai and Xiaojun Chang and Jingdong Wang},
journal = {arXiv preprint arXiv:2309.11125},
year = {2024}
}