
Playing for Benchmarks

Computer Vision and Pattern Recognition 2017-09-22 v1

Abstract

We present a benchmark suite for visual perception. The benchmark is based on more than 250K high-resolution video frames, all annotated with ground-truth data for both low-level and high-level vision tasks, including optical flow, semantic instance segmentation, object detection and tracking, object-level 3D scene layout, and visual odometry. Ground-truth data for all tasks is available for every frame. The data was collected while driving, riding, and walking a total of 184 kilometers in diverse ambient conditions in a realistic virtual world. To create the benchmark, we have developed a new approach to collecting ground-truth data from simulated worlds without access to their source code or content. We conduct statistical analyses showing that the composition of the scenes in the benchmark closely matches the composition of corresponding physical environments. The realism of the collected data is further validated via perceptual experiments. We analyze the performance of state-of-the-art methods for multiple tasks, providing reference baselines and highlighting challenges for future research. The supplementary video can be viewed at https://youtu.be/T9OybWv923Y

Cite

@article{arxiv.1709.07322,
  title  = {Playing for Benchmarks},
  author = {Stephan R. Richter and Zeeshan Hayder and Vladlen Koltun},
  journal= {arXiv preprint arXiv:1709.07322},
  year   = {2017}
}

Comments

Published at the International Conference on Computer Vision (ICCV 2017)