Related papers: Visual Probing: Cognitive Framework for Explaining…

Discovery of Visual Semantics by Unsupervised and Self-Supervised Representation Learning

The success of deep learning in computer vision is rooted in the ability of deep networks to scale up model complexity as demanded by challenging visual tasks. As complexity is increased, so is the need for large amounts of labeled data to…

Computer Vision and Pattern Recognition · Computer Science 2017-08-22 Gustav Larsson

Visually Guided Self Supervised Learning of Speech Representations

Self-supervised representation learning has recently attracted a lot of research interest for both the audio and visual modalities. However, most works typically focus on a particular modality or feature alone and there has been very…

Audio and Speech Processing · Electrical Eng. & Systems 2020-02-21 Abhinav Shukla, Konstantinos Vougioukas, Pingchuan Ma, Stavros Petridis, Maja Pantic

Supervised Fine Tuning for Word Embedding with Integrated Knowledge

Learning vector representation for words is an important research field which may benefit many natural language processing tasks. Two limitations exist in nearly all available models, which are the bias caused by the context definition and…

Computation and Language · Computer Science 2015-06-01 Xuefeng Yang, Kezhi Mao

Probing Multimodal Embeddings for Linguistic Properties: the Visual-Semantic Case

Semantic embeddings have advanced the state of the art for countless natural language processing tasks, and various extensions to multimodal domains, such as visual-semantic embeddings, have been proposed. While the power of visual-semantic…

Machine Learning · Computer Science 2021-02-23 Adam Dahlgren Lindström, Suna Bensch, Johanna Björklund, Frank Drewes

Recent Advancements in Self-Supervised Paradigms for Visual Feature Representation

We witnessed a massive growth in the supervised learning paradigm in the past decade. Supervised learning requires a large amount of labeled data to reach state-of-the-art performance. However, labeling the samples requires a lot of human…

Computer Vision and Pattern Recognition · Computer Science 2021-11-04 Mrinal Anand, Aditya Garg

Learning Visual Representations via Language-Guided Sampling

Although an object may appear in numerous contexts, we often describe it in a limited number of ways. Language allows us to abstract away visual variation to represent and communicate concepts. Building on this intuition, we propose an…

Computer Vision and Pattern Recognition · Computer Science 2023-03-30 Mohamed El Banani, Karan Desai, Justin Johnson

A Survey on Self-Supervised Representation Learning

Learning meaningful representations is at the heart of many tasks in the field of modern machine learning. Recently, a lot of methods were introduced that allow learning of image representations without supervision. These representations…

Machine Learning · Computer Science 2023-08-23 Tobias Uelwer, Jan Robine, Stefan Sylvius Wagner, Marc Höftmann, Eric Upschulte, Sebastian Konietzny, Maike Behrendt, Stefan Harmeling

Masked Modeling for Self-supervised Representation Learning on Vision and Beyond

As the deep learning revolution marches on, self-supervised learning has garnered increasing attention in recent years thanks to its remarkable representation learning ability and the low dependence on labeled data. Among these varied…

Computer Vision and Pattern Recognition · Computer Science 2024-01-10 Siyuan Li, Luyuan Zhang, Zedong Wang, Di Wu, Lirong Wu, Zicheng Liu, Jun Xia, Cheng Tan, Yang Liu, Baigui Sun, Stan Z. Li

Learning Representations by Predicting Bags of Visual Words

Self-supervised representation learning targets to learn convnet-based image representations from unlabeled data. Inspired by the success of NLP methods in this area, in this work we propose a self-supervised approach based on spatially…

Computer Vision and Pattern Recognition · Computer Science 2020-02-28 Spyros Gidaris, Andrei Bursuc, Nikos Komodakis, Patrick Pérez, Matthieu Cord

Semi-supervised Visual Feature Integration for Pre-trained Language Models

Integrating visual features has been proved useful for natural language understanding tasks. Nevertheless, in most existing multimodal language models, the alignment of visual and textual data is expensive. In this paper, we propose a novel…

Computation and Language · Computer Science 2020-08-14 Lisai Zhang, Qingcai Chen, Dongfang Li, Buzhou Tang

Learning Unsupervised Visual Grounding Through Semantic Self-Supervision

Localizing natural language phrases in images is a challenging problem that requires joint understanding of both the textual and visual modalities. In the unsupervised setting, lack of supervisory signals exacerbates this difficulty. In this…

Computer Vision and Pattern Recognition · Computer Science 2018-11-20 Syed Ashar Javed, Shreyas Saxena, Vineet Gandhi

Does Visual Self-Supervision Improve Learning of Speech Representations for Emotion Recognition?

Self-supervised learning has attracted plenty of recent research interest. However, most works for self-supervision in speech are typically unimodal and there has been limited work that studies the interaction between audio and visual…

Audio and Speech Processing · Electrical Eng. & Systems 2021-03-19 Abhinav Shukla, Stavros Petridis, Maja Pantic

Universal Multimodal Representation for Language Understanding

Representation learning is the foundation of natural language processing (NLP). This work presents new methods to employ visual information as assistant signals to general NLP tasks. For each sentence, we first retrieve a flexible number of…

Computation and Language · Computer Science 2023-01-10 Zhuosheng Zhang, Kehai Chen, Rui Wang, Masao Utiyama, Eiichiro Sumita, Zuchao Li, Hai Zhao

Deep Learning Approaches on Image Captioning: A Review

Image captioning is a research area of immense importance, aiming to generate natural language descriptions for visual content in the form of still images. The advent of deep learning and more recently vision-language pre-training…

Computer Vision and Pattern Recognition · Computer Science 2023-08-29 Taraneh Ghandi, Hamidreza Pourreza, Hamidreza Mahyar

Visual Reasoning with Natural Language

Natural language provides a widely accessible and expressive interface for robotic agents. To understand language in complex environments, agents must reason about the full range of language inputs and their correspondence to the world.…

Computation and Language · Computer Science 2017-10-03 Stephanie Zhou, Alane Suhr, Yoav Artzi

Towards Efficient and Effective Self-Supervised Learning of Visual Representations

Self-supervision has emerged as a propitious method for visual representation learning after the recent paradigm shift from handcrafted pretext tasks to instance-similarity based approaches. Most state-of-the-art methods enforce similarity…

Computer Vision and Pattern Recognition · Computer Science 2022-10-19 Sravanti Addepalli, Kaushal Bhogale, Priyam Dey, R. Venkatesh Babu

Self-Supervised Image Captioning with CLIP

Image captioning, a fundamental task in vision-language understanding, seeks to generate accurate natural language descriptions for provided images. Current image captioning approaches heavily rely on high-quality image-caption pairs, which…

Computer Vision and Pattern Recognition · Computer Science 2023-11-03 Chuanyang Jin

Scaling and Benchmarking Self-Supervised Visual Representation Learning

Self-supervised learning aims to learn representations from the data itself without explicit manual supervision. Existing efforts ignore a crucial aspect of self-supervised learning - the ability to scale to large amounts of data because…

Computer Vision and Pattern Recognition · Computer Science 2019-06-07 Priya Goyal, Dhruv Mahajan, Abhinav Gupta, Ishan Misra

Self-Supervised Image Representation Learning: Transcending Masking with Paired Image Overlay

Self-supervised learning has become a popular approach in recent years for its ability to learn meaningful representations without the need for data annotation. This paper proposes a novel image augmentation technique, overlaying images,…

Computer Vision and Pattern Recognition · Computer Science 2023-01-25 Yinheng Li, Han Ding, Shaofei Wang

Incidental Supervision: Moving beyond Supervised Learning

Machine Learning and Inference methods have become ubiquitous in our attempt to induce more abstract representations of natural language text, visual scenes, and other messy, naturally occurring data, and support decisions that depend on…

Machine Learning · Computer Science 2020-05-27 Dan Roth