Related papers: Efficient Large-Scale Visual Representation Learni…

200 papers

Understanding vision and language representations of product content is vital for search and recommendation applications in e-commerce. As a backbone for online shopping platforms and inspired by the recent success in representation…

Machine Learning · Computer Science · 2022-08-23 · Wonyoung Shin, Jonghun Park, Taekang Woo, Yongwoo Cho, Kwangjin Oh, Hwanjun Song

Vision-transformers (ViTs) and large-scale convolution-neural-networks (CNNs) have reshaped computer vision through pretrained feature representations that enable strong transfer learning for diverse tasks. However, their efficiency as…

Computer Vision and Pattern Recognition · Computer Science · 2025-10-07 · Alon Kaya, Igal Bilik, Inna Stainvas

In this paper, we present a unified end-to-end approach to build a large scale Visual Search and Recommendation system for e-commerce. Previous works have targeted these problems in isolation. We believe a more effective and elegant…

Computer Vision and Pattern Recognition · Computer Science · 2017-03-08 · Devashish Shankar, Sujay Narumanchi, H A Ananya, Pramod Kompalli, Krishnendu Chaudhury

Large-scale pretraining of visual representations has led to state-of-the-art performance on a range of benchmark computer vision tasks, yet the benefits of these techniques at extreme scale in complex production systems has been relatively…

Computer Vision and Pattern Recognition · Computer Science · 2021-08-13 · Josh Beal, Hao-Yu Wu, Dong Huk Park, Andrew Zhai, Dmitry Kislyuk

In this paper, we propose a novel Convolutional Neural Network (CNN) architecture for learning multi-scale feature representations with good tradeoffs between speed and accuracy. This is achieved by using a multi-branch network, which has…

Computer Vision and Pattern Recognition · Computer Science · 2019-08-01 · Chun-Fu Chen, Quanfu Fan, Neil Mallinar, Tom Sercu, Rogerio Feris

Deep learning architectures are showing great promise in various computer vision domains including image classification, object detection, event detection and action recognition. In this study, we investigate various aspects of…

Computer Vision and Pattern Recognition · Computer Science · 2016-08-08 · Hilal Ergun, Mustafa Sert

Vision Transformers (ViT) have recently demonstrated the significant potential of transformer architectures for computer vision. To what extent can image-based deep reinforcement learning also benefit from ViT architectures, as compared to…

Machine Learning · Computer Science · 2022-05-17 · Tianxin Tao, Daniele Reda, Michiel van de Panne

Convolutional neural networks (CNNs) have so far been the de-facto model for visual data. Recent work has shown that (Vision) Transformer models (ViT) can achieve comparable or even superior performance on image classification tasks. This…

Computer Vision and Pattern Recognition · Computer Science · 2022-03-07 · Maithra Raghu, Thomas Unterthiner, Simon Kornblith, Chiyuan Zhang, Alexey Dosovitskiy

This paper investigates two techniques for developing efficient self-supervised vision transformers (EsViT) for visual representation learning. First, we show through a comprehensive empirical study that multi-stage architectures with…

Computer Vision and Pattern Recognition · Computer Science · 2022-07-08 · Chunyuan Li, Jianwei Yang, Pengchuan Zhang, Mei Gao, Bin Xiao, Xiyang Dai, Lu Yuan, Jianfeng Gao

Learning efficient and expressive visual representation has long been the pursuit of computer vision research. While Vision Transformers (ViTs) gradually replace traditional Convolutional Neural Networks (CNNs) as more scalable vision…

Computer Vision and Pattern Recognition · Computer Science · 2026-03-23 · Quan Kong, Yanru Xiao, Yuhao Shen, Cong Wang

We apply pre-trained architectures, originally developed for the ImageNet Large Scale Visual Recognition Challenge, for periocular recognition. These architectures have demonstrated significant success in various computer vision tasks…

Computer Vision and Pattern Recognition · Computer Science · 2024-10-08 · Fernando Alonso-Fernandez, Kevin Hernandez-Diaz, Prayag Tiwari, Josef Bigun

Vision Transformer (ViT) demonstrates that Transformer for natural language processing can be applied to computer vision tasks and result in comparable performance to convolutional neural networks (CNN), which have been studied and adopted…

Computer Vision and Pattern Recognition · Computer Science · 2021-09-03 · Yi-Lun Liao, Sertac Karaman, Vivienne Sze

Vision transformers have attracted much attention from computer vision researchers as they are not restricted to the spatial inductive bias of ConvNets. However, although Transformer-based backbones have achieved much progress on ImageNet…

Computer Vision and Pattern Recognition · Computer Science · 2021-08-18 · Hong-Yu Zhou, Chixiang Lu, Sibei Yang, Yizhou Yu

E-commerce product understanding demands by nature, strong multimodal comprehension from text, images, and structured attributes. General-purpose Vision-Language Models (VLMs) enable generalizable multimodal latent modelling, yet there is…

The Transformer architecture has achieved significant success in natural language processing, motivating its adaptation to computer vision tasks. Unlike convolutional neural networks, vision transformers inherently capture long-range…

Computer Vision and Pattern Recognition · Computer Science · 2025-09-29 · Zherui Zhang, Rongtao Xu, Jie Zhou, Changwei Wang, Xingtian Pei, Wenhao Xu, Jiguang Zhang, Li Guo, Longxiang Gao, Wenbo Xu, Shibiao Xu

Visual Transformers (VTs) are emerging as an architectural paradigm alternative to Convolutional networks (CNNs). Differently from CNNs, VTs can capture global relations between image elements and they potentially have a larger…

Computer Vision and Pattern Recognition · Computer Science · 2021-11-16 · Yahui Liu, Enver Sangineto, Wei Bi, Nicu Sebe, Bruno Lepri, Marco De Nadai

This paper addresses the challenges in representation learning of 3D shape features by investigating state-of-the-art backbones paired with both contrastive supervised and self-supervised learning objectives. Computer vision methods…

Computer Vision and Pattern Recognition · Computer Science · 2025-10-24 · Márcus Vinícius Lobo Costa, Sherlon Almeida da Silva, Bárbara Caroline Benato, Leo Sampaio Ferraz Ribeiro, Moacir Antonelli Ponti

While visual imitation learning offers one of the most effective ways of learning from visual demonstrations, generalizing from them requires either hundreds of diverse demonstrations, task specific priors, or large, hard-to-train…

Robotics · Computer Science · 2021-12-07 · Jyothish Pari, Nur Muhammad Shafiullah, Sridhar Pandian Arunachalam, Lerrel Pinto

Remote sensing imagery plays a crucial role in many applications and requires accurate computerized classification techniques. Reliable classification is essential for transforming raw imagery into structured and usable information. While…

Computer Vision and Pattern Recognition · Computer Science · 2026-03-09 · Niful Islam, Md. Rayhan Ahmed, Nur Mohammad Fahad, Salekul Islam, A. K. M. Muzahidul Islam, Saddam Mukta, Swakkhar Shatabda

Convolutional Neural Networks (CNNs) have revolutionized the understanding of visual content. This is mainly due to their ability to break down an image into smaller pieces, extract multi-scale localized features and compose them to…

Computer Vision and Pattern Recognition · Computer Science · 2021-10-26 · Zachary Wharton, Ardhendu Behera, Asish Bera