Parallelized Spatiotemporal Binding
Abstract
While modern best practices advocate for scalable architectures that support long-range interactions, object-centric models have yet to fully embrace these architectures. In particular, existing object-centric models for sequential inputs rely on RNN-based implementations and, as a result, show poor stability and capacity and are slow to train on long sequences. We introduce the Parallelizable Spatiotemporal Binder (PSB), the first temporally parallelizable slot learning architecture for sequential inputs. Unlike conventional RNN-based approaches, PSB produces object-centric representations, known as slots, for all time-steps in parallel. This is achieved by refining the initial slots across all time-steps through a fixed number of layers equipped with causal attention. By capitalizing on the parallelism induced by our architecture, the proposed model exhibits a significant boost in efficiency. In experiments, we test PSB extensively as an encoder within an auto-encoding framework paired with a wide variety of decoder options. Compared to the state of the art, our architecture trains stably on longer sequences, achieves parallelization that results in a 60% increase in training speed, and matches or exceeds state-of-the-art performance on unsupervised 2D and 3D object-centric scene decomposition and understanding.
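The core idea in the abstract, refining slots for all time-steps at once through a fixed stack of layers with causal attention, can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names (`refine_layer`, `psb_like_encoder`), the two-step layer layout (per-frame cross-attention followed by causal attention over time), the absence of learned weights, and the default of three layers are all assumptions made for illustration.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax; entries set to -inf receive zero weight.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def refine_layer(slots, feats):
    """One hypothetical refinement layer (no learned weights, for brevity).

    slots: (T, N, D) slots for every time-step
    feats: (T, M, D) input features for every time-step
    """
    T, N, D = slots.shape
    scale = 1.0 / np.sqrt(D)

    # Step 1 (assumed): each slot attends to the features of its own frame.
    logits = np.einsum("tnd,tmd->tnm", slots, feats) * scale
    slots = slots + np.einsum("tnm,tmd->tnd", softmax(logits), feats)

    # Step 2: causal attention over time, applied to each slot index.
    per_slot = slots.transpose(1, 0, 2)  # (N, T, D)
    logits = np.einsum("ntd,nsd->nts", per_slot, per_slot) * scale
    future = np.triu(np.ones((T, T), dtype=bool), k=1)  # True where s > t
    logits = np.where(future, -np.inf, logits)  # block attention to the future
    per_slot = per_slot + np.einsum("nts,nsd->ntd", softmax(logits), per_slot)
    return per_slot.transpose(1, 0, 2)  # back to (T, N, D)


def psb_like_encoder(feats, init_slots, num_layers=3):
    """Refine the same initial slots at all T time-steps through a fixed
    number of layers. There is no loop over time, only over layers."""
    T = feats.shape[0]
    slots = np.broadcast_to(init_slots, (T,) + init_slots.shape).copy()
    for _ in range(num_layers):
        slots = refine_layer(slots, feats)
    return slots
```

Because the only sequential loop runs over layers rather than time-steps, every frame is processed in the same batched operations, while the causal mask ensures the slots at time t depend only on inputs up to t.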
Cite
@article{arxiv.2402.17077,
title = {Parallelized Spatiotemporal Binding},
author = {Gautam Singh and Yue Wang and Jiawei Yang and Boris Ivanovic and Sungjin Ahn and Marco Pavone and Tong Che},
journal = {arXiv preprint arXiv:2402.17077},
year = {2024}
}
Comments
See project page at https://parallel-st-binder.github.io