PSDiff: Diffusion Model for Person Search with Iterative and Collaborative Refinement

Computer Vision and Pattern Recognition 2024-12-31 v3

Abstract

Dominant person search methods aim to localize and recognize query persons in a unified network that jointly optimizes two sub-tasks, i.e., pedestrian detection and Re-IDentification (ReID). Despite significant progress, current methods face two primary challenges: 1) the pedestrian candidates learned within detectors are suboptimal for the ReID task; 2) the potential for collaboration between the two sub-tasks is overlooked. To address these issues, we present PSDiff, a novel person search framework based on the diffusion model. PSDiff formulates person search as a dual denoising process from noisy boxes and ReID embeddings to ground truths. Distinct from the conventional Detection-to-ReID approach, our denoising paradigm discards the prior pedestrian candidates generated by detectors, thereby avoiding the local optimum problem of the ReID task. Following the new paradigm, we further design a new Collaborative Denoising Layer (CDL) to optimize the detection and ReID sub-tasks in an iterative and collaborative way, which makes the two sub-tasks mutually beneficial. Extensive experiments on the standard benchmarks show that PSDiff achieves state-of-the-art performance with fewer parameters and elastic computing overhead.
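
The dual denoising idea can be illustrated with a minimal sketch. It is not the PSDiff implementation: the function names `add_noise` and `dual_denoise`, the `denoiser` callable, and the list-based box and embedding layout are all hypothetical, and the forward corruption shown is the generic Gaussian diffusion step rather than a schedule confirmed by the abstract. The loop only shows the structure the abstract describes, in which boxes and ReID embeddings are refined together at every step instead of ReID running once on fixed detector outputs.

```python
import math
import random


def add_noise(x0, alpha_bar, rng):
    """Generic diffusion forward step (an assumption, not taken from the paper):
    x_t = sqrt(alpha_bar) * x_0 + sqrt(1 - alpha_bar) * eps, with eps ~ N(0, 1)."""
    scale = math.sqrt(alpha_bar)
    sigma = math.sqrt(1.0 - alpha_bar)
    return [scale * v + sigma * rng.gauss(0.0, 1.0) for v in x0]


def dual_denoise(boxes, embeds, denoiser, num_steps):
    """Iteratively refine boxes and ReID embeddings together.

    `denoiser` is a hypothetical stand-in for a learned layer; it takes
    (boxes, embeds, t) and returns updated (boxes, embeds), so each
    sub-task can use the other's current estimate at every step.
    """
    for t in reversed(range(num_steps)):
        boxes, embeds = denoiser(boxes, embeds, t)
    return boxes, embeds


if __name__ == "__main__":
    rng = random.Random(0)
    # Start from pure noise (alpha_bar = 0): no detector proposals are used.
    noisy_boxes = [add_noise([0.0] * 4, 0.0, rng) for _ in range(3)]
    noisy_embeds = [add_noise([0.0] * 8, 0.0, rng) for _ in range(3)]

    def toy_denoiser(b, e, t):
        # Placeholder update that shrinks values; a real model would be learned.
        return ([[0.5 * v for v in box] for box in b],
                [[0.5 * v for v in emb] for emb in e])

    boxes, embeds = dual_denoise(noisy_boxes, noisy_embeds, toy_denoiser, 4)
    print(len(boxes), len(embeds))
```

In this sketch the number of refinement steps is a free parameter of `dual_denoise`, which is one plausible reading of the "elastic computing overhead" mentioned above: fewer steps cost less compute, more steps allow further refinement.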

Cite

@article{arxiv.2309.11125,
  title  = {PSDiff: Diffusion Model for Person Search with Iterative and Collaborative Refinement},
  author = {Chengyou Jia and Minnan Luo and Zhuohang Dang and Guang Dai and Xiaojun Chang and Jingdong Wang},
  journal= {arXiv preprint arXiv:2309.11125},
  year   = {2024}
}