Related papers: Self-Supervised Learning from Web Data for Multimo…

200 papers

In this paper we propose to learn a multimodal image and text embedding from Web and Social Media data, aiming to leverage the semantic knowledge learnt in the text domain and transfer it to a visual model for semantic image retrieval. We…

Computer Vision and Pattern Recognition · Computer Science · 2018-08-21 · Raul Gomez, Lluis Gomez, Jaume Gibert, Dimosthenis Karatzas

Cross-modal retrieval between visual data and natural language description remains a long-standing challenge in multimedia. While recent image-text retrieval methods offer great promise by learning deep representations aligned across…

End-to-end training from scratch of current deep architectures for new computer vision problems would require Imagenet-scale datasets, and this is not always possible. In this paper we present a method that is able to take advantage of…

Computer Vision and Pattern Recognition · Computer Science · 2017-05-25 · Lluis Gomez, Yash Patel, Marçal Rusiñol, Dimosthenis Karatzas, C. V. Jawahar

Multimodal self-supervised learning is getting more and more attention as it allows not only to train large networks without human supervision but also to search and retrieve data across various modalities. In this context, this paper…

The immense success of deep learning based methods in computer vision heavily relies on large scale training datasets. These richly annotated datasets help the network learn discriminative visual features. Collecting and annotating such…

Computer Vision and Pattern Recognition · Computer Science · 2018-07-09 · Yash Patel, Lluis Gomez, Raul Gomez, Marçal Rusiñol, Dimosthenis Karatzas, C. V. Jawahar

A comprehensive understanding of vision and language and their interrelation are crucial to realize the underlying similarities and differences between these modalities and to learn more generalized, meaningful representations. In recent…

Computer Vision and Pattern Recognition · Computer Science · 2021-12-10 · Anindya Sundar Das, Sriparna Saha

There has been significant interest recently in learning multilingual word embeddings -- in which semantically similar words across languages have similar embeddings. State-of-the-art approaches have relied on expensive labeled data, which…

Computation and Language · Computer Science · 2020-07-02 · Karan Singhal, Karthik Raman, Balder ten Cate

This paper considers the task of matching images and sentences by learning a visual-textual embedding space for cross-modal retrieval. Finding such a space is a challenging task since the features and representations of text and image are…

Information Retrieval · Computer Science · 2020-02-28 · Hadi Abdi Khojasteh, Ebrahim Ansari, Parvin Razzaghi, Akbar Karimi

Learning social media data embedding by deep models has attracted extensive research interest as well as boomed a lot of applications, such as link prediction, classification, and cross-modal search. However, for social images which contain…

Multimedia · Computer Science · 2017-10-19 · Feiran Huang, Xiaoming Zhang, Zhoujun Li, Tao Mei, Yueying He, Zhonghua Zhao

Current approaches to learning semantic representations of sentences often use prior word-level knowledge. The current study aims to leverage visual information in order to capture sentence level semantics without the need for word…

Computation and Language · Computer Science · 2019-09-25 · Danny Merkx, Stefan Frank

Many of the existing methods for learning joint embedding of images and text use only supervised information from paired images and its textual attributes. Taking advantage of the recent success of unsupervised learning in deep neural…

Computer Vision and Pattern Recognition · Computer Science · 2017-03-21 · Yao-Hung Hubert Tsai, Liang-Kang Huang, Ruslan Salakhutdinov

There has been an explosion of multimodal content generated on social media networks in the last few years, which has necessitated a deeper understanding of social media content and user behavior. We present a novel content-independent…

Information Retrieval · Computer Science · 2019-06-12 · Karan Sikka, Lucas Van Bramer, Ajay Divakaran

Image-based single-modality compression learning approaches have demonstrated exceptionally powerful encoding and decoding capabilities in the past few years, but suffer from blur and severe semantics loss at extremely low bitrates. To…

Image and Video Processing · Electrical Eng. & Systems · 2023-04-27 · Xuhao Jiang, Weimin Tan, Tian Tan, Bo Yan, Liquan Shen

Two modalities are often used to convey information in a complementary and beneficial manner, e.g., in online news, videos, educational resources, or scientific publications. The automatic understanding of semantic correlations between text…

Multimedia · Computer Science · 2019-06-21 · Christian Otto, Matthias Springstein, Avishek Anand, Ralph Ewerth

In this paper, we explore the learning of neural network embeddings for natural images and speech waveforms describing the content of those images. These embeddings are learned directly from the waveforms without the use of linguistic…

Computation and Language · Computer Science · 2018-04-10 · David Harwath, Galen Chuang, James Glass

In text recognition, self-supervised pre-training emerges as a good solution to reduce dependence on expansive annotated real data. Previous studies primarily focus on local visual representation by leveraging mask image modeling or…

Computer Vision and Pattern Recognition · Computer Science · 2024-05-14 · Zuan Gao, Yuxin Wang, Yadong Qu, Boqiang Zhang, Zixiao Wang, Jianjun Xu, Hongtao Xie

Until recently, the number of public real-world text images was insufficient for training scene text recognizers. Therefore, most modern training methods rely on synthetic data and operate in a fully supervised manner. Nevertheless, the…

Computer Vision and Pattern Recognition · Computer Science · 2022-05-10 · Aviad Aberdam, Roy Ganz, Shai Mazor, Ron Litman

While recent research in image understanding has often focused on recognizing more types of objects, understanding more about the objects is just as important. Recognizing object parts and attributes has been extensively studied before, yet…

Computer Vision and Pattern Recognition · Computer Science · 2021-12-03 · David Novotny, Diane Larlus, Andrea Vedaldi

Cross-modal retrieval methods have been significantly improved in last years with the use of deep neural networks and large-scale annotated datasets such as ImageNet and Places. However, collecting and annotating such datasets requires a…

Computer Vision and Pattern Recognition · Computer Science · 2019-02-04 · Yash Patel, Lluis Gomez, Marçal Rusiñol, Dimosthenis Karatzas, C. V. Jawahar

Recent work has managed to learn cross-lingual word embeddings without parallel data by mapping monolingual embeddings to a shared space through adversarial training. However, their evaluation has focused on favorable conditions, using…

Computation and Language · Computer Science · 2021-12-28 · Mikel Artetxe, Gorka Labaka, Eneko Agirre