Related papers: Scaling 4D Representations

200 papers

Self-supervised learning aims to learn representations from the data itself without explicit manual supervision. Existing efforts ignore a crucial aspect of self-supervised learning - the ability to scale to large amounts of data because…

Computer Vision and Pattern Recognition · Computer Science · 2019-06-07 · Priya Goyal, Dhruv Mahajan, Abhinav Gupta, Ishan Misra

This paper asks whether current self-supervised learning methods, if sufficiently scaled up, would be able to reach human-level visual object recognition capabilities with the same type and amount of visual experience humans learn from.…

Computer Vision and Pattern Recognition · Computer Science · 2023-08-11 · A. Emin Orhan

This paper shows that masked autoencoders (MAE) are scalable self-supervised learners for computer vision. Our MAE approach is simple: we mask random patches of the input image and reconstruct the missing pixels. It is based on two core…

Computer Vision and Pattern Recognition · Computer Science · 2021-12-21 · Kaiming He, Xinlei Chen, Saining Xie, Yanghao Li, Piotr Dollár, Ross Girshick

Self-supervised tasks have been used to build representations that are useful in downstream tasks when annotations are unavailable. In this paper, we introduce a self-supervised video representation learning method based on…

Computer Vision and Pattern Recognition · Computer Science · 2021-02-23 · Duc Quang Vu, Ngan T. H. Le, Jia-Ching Wang

Most existing video self-supervised methods mainly leverage the temporal signals of videos, ignoring that the semantics of moving objects and environmental information are both critical for video-related tasks. In this paper, we propose a…

Computer Vision and Pattern Recognition · Computer Science · 2021-07-09 · Wei Li, Dezhao Luo, Bo Fang, Yu Zhou, Weiping Wang

Self-supervised representation learning for point cloud videos remains a challenging problem with two key limitations: (1) existing methods rely on explicit knowledge to learn motion, resulting in suboptimal representations; (2) prior…

Computer Vision and Pattern Recognition · Computer Science · 2025-05-21 · Zhi Zuo, Chenyi Zhuang, Pan Gao, Jie Qin, Hao Feng, Nicu Sebe

Self-supervision and natural language supervision have emerged as two exciting ways to train general-purpose image encoders which excel at a variety of downstream tasks. Recent works such as M3AE and SLIP have suggested that these…

Computer Vision and Pattern Recognition · Computer Science · 2023-05-16 · Floris Weers, Vaishaal Shankar, Angelos Katharopoulos, Yinfei Yang, Tom Gunter

Self-supervised learning is an effective approach to label-free model pre-training, especially in the video domain where labeling is expensive. Existing self-supervised works in the video domain use varying experimental setups to demonstrate…

Computer Vision and Pattern Recognition · Computer Science · 2023-11-22 · Akash Kumar, Ashlesha Kumar, Vibhav Vineet, Yogesh Singh Rawat

Current video-based Masked Autoencoders (MAEs) primarily focus on learning effective spatiotemporal representations from a visual perspective, which may lead the model to prioritize general spatial-temporal patterns but often overlook…

Computer Vision and Pattern Recognition · Computer Science · 2025-02-13 · Shihab Aaqil Ahamed, Malitha Gunawardhana, Liel David, Michael Sidorov, Daniel Harari, Muhammad Haris Khan

In this work, we explore self-supervised visual pre-training on images from diverse, in-the-wild videos for real-world robotic tasks. Like prior work, our visual representations are pre-trained via a masked autoencoder (MAE), frozen, and…

Robotics · Computer Science · 2022-10-07 · Ilija Radosavovic, Tete Xiao, Stephen James, Pieter Abbeel, Jitendra Malik, Trevor Darrell

Learning robust and scalable visual representations from massive multi-view video data remains a challenge in computer vision and autonomous driving. Existing pre-training methods either rely on expensive supervised learning with 3D…

Computer Vision and Pattern Recognition · Computer Science · 2024-03-14 · Jialv Zou, Bencheng Liao, Qian Zhang, Wenyu Liu, Xinggang Wang

The remarkable success of deep learning in various domains relies on the availability of large-scale annotated datasets. However, obtaining annotations is expensive and requires great effort, which is especially challenging for videos.…

Computer Vision and Pattern Recognition · Computer Science · 2023-07-20 · Madeline C. Schiappa, Yogesh S. Rawat, Mubarak Shah

Large, pretrained models are commonly finetuned with imagery that is heavily augmented to mimic different conditions and scales, with the resulting models used for various tasks with imagery from a range of spatial scales. Such models…

Computer Vision and Pattern Recognition · Computer Science · 2023-09-25 · Colorado J. Reed, Ritwik Gupta, Shufan Li, Sarah Brockman, Christopher Funk, Brian Clipp, Kurt Keutzer, Salvatore Candido, Matt Uyttendaele, Trevor Darrell

As the deep learning revolution marches on, self-supervised learning has garnered increasing attention in recent years thanks to its remarkable representation learning ability and the low dependence on labeled data. Among these varied…

Computer Vision and Pattern Recognition · Computer Science · 2024-01-10 · Siyuan Li, Luyuan Zhang, Zedong Wang, Di Wu, Lirong Wu, Zicheng Liu, Jun Xia, Cheng Tan, Yang Liu, Baigui Sun, Stan Z. Li

This paper proposes a novel pretext task to address the self-supervised video representation learning problem. Specifically, given an unlabeled video clip, we compute a series of spatio-temporal statistical summaries, such as the spatial…

Computer Vision and Pattern Recognition · Computer Science · 2021-02-01 · Jiangliu Wang, Jianbo Jiao, Linchao Bao, Shengfeng He, Wei Liu, Yun-hui Liu

Recent self-supervised learning models simulate the development of semantic object representations by training on visual experience similar to that of toddlers. However, these models ignore the foveated nature of human vision with high/low…

Computer Vision and Pattern Recognition · Computer Science · 2025-09-22 · Zhengyang Yu, Arthur Aubret, Chen Yu, Jochen Triesch

We present an extension to masked autoencoders (MAE) that improves the representations learnt by the model by explicitly encouraging the learning of higher scene-level features. We do this by: (i) the introduction of a perceptual…

Computer Vision and Pattern Recognition · Computer Science · 2023-03-29 · Samyakh Tukra, Frederick Hoffman, Ken Chatfield

Recent advances in deep learning have achieved promising performance for medical image analysis, while in most cases ground-truth annotations from human experts are necessary to train the deep model. In practice, such annotations are…

Computer Vision and Pattern Recognition · Computer Science · 2020-03-03 · Jianbo Jiao, Richard Droste, Lior Drukker, Aris T. Papageorghiou, J. Alison Noble

Inferring biological relationships from cellular phenotypes in high-content microscopy screens provides significant opportunity and challenge in biological research. Prior results have shown that deep vision models can capture biological…

Self-supervised learning has transformed 2D computer vision by enabling models trained on large, unannotated datasets to provide versatile off-the-shelf features that perform similarly to models trained with labels. However, in 3D scene…

Computer Vision and Pattern Recognition · Computer Science · 2025-04-10 · Pedro Hermosilla, Christian Stippel, Leon Sick