Related papers: Visually Analyzing Contextualized Embeddings
Semantic embeddings have advanced the state of the art for countless natural language processing tasks, and various extensions to multimodal domains, such as visual-semantic embeddings, have been proposed. While the power of visual-semantic…
We present a clustering-based language model using word embeddings for text readability prediction. Presumably, an Euclidean semantic space hypothesis holds true for word embeddings whose training is done by observing word co-occurrences.…
In this paper, we propose a novel approach for text classification based on clustering word embeddings, inspired by the bag of visual words model, which is widely used in computer vision. After each word in a collection of documents is…
The neural architectures of language models are becoming increasingly complex, especially that of Transformers, based on the attention mechanism. Although their application to numerous natural language processing tasks has proven to be very…
Numerous embedding models have been recently explored to incorporate semantic knowledge into visual recognition. Existing methods typically focus on minimizing the distance between the corresponding images and texts in the embedding space…
Contextualized embeddings use unsupervised language model pretraining to compute word representations depending on their context. This is intuitively useful for generalization, especially in Named-Entity Recognition where it is crucial to…
We present a visually-grounded language understanding model based on a study of how people verbally describe objects in scenes. The emphasis of the model is on the combination of individual word meanings to produce meanings for complex…
Vector representations obtained from word embedding are the source of many groundbreaking advances in natural language processing. They yield word representations that are capable of capturing semantics and analogies of words within a text…
Static word embeddings that represent words by a single vector cannot capture the variability of word meaning in different linguistic and extralinguistic contexts. Building on prior work on contextualized and dynamic word embeddings, we…
Neural network based models are a very powerful tool for creating word embeddings, the objective of these models is to group similar words together. These embeddings have been used as features to improve results in various applications such…
Distributional semantic models capture word-level meaning that is useful in many natural language processing tasks and have even been shown to capture cognitive aspects of word meaning. The majority of these models are purely text based,…
We present a qualitative analysis of the (potentially erroneous) outputs of contextualized embedding-based methods for detecting diachronic semantic change. First, we introduce an ensemble method outperforming previously described…
Clustering artworks based on style can have many potential real-world applications like art recommendations, style-based search and retrieval, and the study of artistic style evolution of an artist or in an artwork corpus. We introduce and…
We present a universal framework to model contextualized sentence representations with visual awareness that is motivated to overcome the shortcomings of the multimodal parallel data with manual annotations. For each sentence, we first…
Multi-modal word semantics aims to enhance embeddings with perceptual input, assuming that human meaning representation is grounded in sensory experience. Most research focuses on evaluation involving direct visual input, however, visual…
Human-annotated attributes serve as powerful semantic embeddings in zero-shot learning. However, their annotation process is labor-intensive and needs expert supervision. Current unsupervised semantic embeddings, i.e., word embeddings,…
Recently introduced self-supervised methods for image representation learning provide on par or superior results to their fully supervised competitors, yet the corresponding efforts to explain the self-supervised approaches lag behind.…
Contextualised word vectors obtained via pre-trained language models encode a variety of knowledge that has already been exploited in applications. Complementary to these language models are probabilistic topic models that learn thematic…
In this paper we present a deep learning architecture for extracting word embeddings for visual speech recognition. The embeddings summarize the information of the mouth region that is relevant to the problem of word recognition, while…
We propose a novel approach to improve a visual-semantic embedding model by incorporating concept representations captured from an external structured knowledge base. We investigate its performance on image classification under both…