Related papers: Learning Language Structures through Grounding

Learning Zero-Shot Multifaceted Visually Grounded Word Embeddings via Multi-Task Training

Language grounding aims at linking the symbolic representation of language (e.g., words) into the rich perceptual knowledge of the outside world. The general approach is to embed both textual and visual information into a common space -the…

Computation and Language · Computer Science 2021-09-15 Hassan Shahmohammadi , Hendrik P. A. Lensch , R. Harald Baayen

Language with Vision: a Study on Grounded Word and Sentence Embeddings

Grounding language in vision is an active field of research seeking to construct cognitively plausible word and sentence representations by incorporating perceptual knowledge from vision into text-based representations. Despite many…

Computation and Language · Computer Science 2023-11-01 Hassan Shahmohammadi , Maria Heitmeier , Elnaz Shafaei-Bajestan , Hendrik P. A. Lensch , Harald Baayen

Neural Variational Learning for Grounded Language Acquisition

We propose a learning system in which language is grounded in visual percepts without specific pre-defined categories of terms. We present a unified generative method to acquire a shared semantic/visual embedding that enables the learning…

Computation and Language · Computer Science 2021-08-02 Nisha Pillai , Cynthia Matuszek , Francis Ferraro

Weakly-supervised Visual Grounding of Phrases with Linguistic Structures

We propose a weakly-supervised approach that takes image-sentence pairs as input and learns to visually ground (i.e., localize) arbitrary linguistic phrases, in the form of spatial attention masks. Specifically, the model is trained with…

Computer Vision and Pattern Recognition · Computer Science 2017-05-04 Fanyi Xiao , Leonid Sigal , Yong Jae Lee

Towards Understanding Language through Perception in Situated Human-Robot Interaction: From Word Grounding to Grammar Induction

Robots are widely collaborating with human users in diferent tasks that require high-level cognitive functions to make them able to discover the surrounding environment. A difcult challenge that we briefy highlight in this short paper is…

Computation and Language · Computer Science 2020-03-16 Amir Aly , Tadahiro Taniguchi

Visually grounded models of spoken language: A survey of datasets, architectures and evaluation techniques

This survey provides an overview of the evolution of visually grounded models of spoken language over the last 20 years. Such models are inspired by the observation that when children pick up a language, they rely on a wide range of…

Artificial Intelligence · Computer Science 2022-02-22 Grzegorz Chrupała

Incorporating Visual Semantics into Sentence Representations within a Grounded Space

Language grounding is an active field aiming at enriching textual representations with visual information. Generally, textual and visual elements are embedded in the same representation space, which implicitly assumes a one-to-one…

Computation and Language · Computer Science 2020-02-10 Patrick Bordes , Eloi Zablocki , Laure Soulier , Benjamin Piwowarski , Patrick Gallinari

Visual Grounding Helps Learn Word Meanings in Low-Data Regimes

Modern neural language models (LMs) are powerful tools for modeling human sentence production and comprehension, and their internal representations are remarkably well-aligned with representations of language in the human brain. But to…

Computation and Language · Computer Science 2024-03-27 Chengxu Zhuang , Evelina Fedorenko , Jacob Andreas

Grounded Semantic Composition for Visual Scenes

We present a visually-grounded language understanding model based on a study of how people verbally describe objects in scenes. The emphasis of the model is on the combination of individual word meanings to produce meanings for complex…

Artificial Intelligence · Computer Science 2011-07-04 P. Gorniak , D. Roy

A Joint Study of Phrase Grounding and Task Performance in Vision and Language Models

Key to tasks that require reasoning about natural language in visual contexts is grounding words and phrases to image regions. However, observing this grounding in contemporary models is complex, even if it is generally expected to take…

Computation and Language · Computer Science 2024-06-03 Noriyuki Kojima , Hadar Averbuch-Elor , Yoav Artzi

Language learning using Speech to Image retrieval

Humans learn language by interaction with their environment and listening to other humans. It should also be possible for computational models to learn language directly from speech but so far most approaches require text. We improve on…

Computation and Language · Computer Science 2019-09-25 Danny Merkx , Stefan L. Frank , Mirjam Ernestus

Finding Structure in Language Models

When we speak, write or listen, we continuously make predictions based on our knowledge of a language's grammar. Remarkably, children acquire this grammatical knowledge within just a few years, enabling them to understand and generalise to…

Computation and Language · Computer Science 2024-11-26 Jaap Jumelet

Visual Grounding of Inter-lingual Word-Embeddings

Visual grounding of Language aims at enriching textual representations of language with multiple sources of visual knowledge such as images and videos. Although visual grounding is an area of intense research, inter-lingual aspects of…

Computation and Language · Computer Science 2022-11-22 Wafaa Mohammed , Hassan Shahmohammadi , Hendrik P. A. Lensch , R. Harald Baayen

Grounded Language Learning in a Simulated 3D World

We are increasingly surrounded by artificially intelligent technology that takes decisions and executes actions on our behalf. This creates a pressing need for general means to communicate with, instruct and guide artificial agents, with…

Computation and Language · Computer Science 2017-06-27 Karl Moritz Hermann , Felix Hill , Simon Green , Fumin Wang , Ryan Faulkner , Hubert Soyer , David Szepesvari , Wojciech Marian Czarnecki , Max Jaderberg , Denis Teplyashin , Marcus Wainwright , Chris Apps , Demis Hassabis , Phil Blunsom

Towards Visual Grounding: A Survey

Visual Grounding, also known as Referring Expression Comprehension and Phrase Grounding, aims to ground the specific region(s) within the image(s) based on the given expression text. This task simulates the common referential relationships…

Computer Vision and Pattern Recognition · Computer Science 2025-11-12 Linhui Xiao , Xiaoshan Yang , Xiangyuan Lan , Yaowei Wang , Changsheng Xu

Pragmatics in Language Grounding: Phenomena, Tasks, and Modeling Approaches

People rely heavily on context to enrich meaning beyond what is literally said, enabling concise but effective communication. To interact successfully and naturally with people, user-facing artificial intelligence systems will require…

Computation and Language · Computer Science 2023-11-23 Daniel Fried , Nicholas Tomlin , Jennifer Hu , Roma Patel , Aida Nematzadeh

VLGrammar: Grounded Grammar Induction of Vision and Language

Cognitive grammar suggests that the acquisition of language grammar is grounded within visual structures. While grammar is an essential representation of natural language, it also exists ubiquitously in vision to represent the hierarchical…

Computer Vision and Pattern Recognition · Computer Science 2021-03-25 Yining Hong , Qing Li , Song-Chun Zhu , Siyuan Huang

Explainable Semantic Space by Grounding Language to Vision with Cross-Modal Contrastive Learning

In natural language processing, most models try to learn semantic representations merely from texts. The learned representations encode the distributional semantics but fail to connect to any knowledge about the physical world. In contrast,…

Computation and Language · Computer Science 2021-11-16 Yizhen Zhang , Minkyu Choi , Kuan Han , Zhongming Liu

Learning Semantic Parsers from Denotations with Latent Structured Alignments and Abstract Programs

Semantic parsing aims to map natural language utterances onto machine interpretable meaning representations, aka programs whose execution against a real-world environment produces a denotation. Weakly-supervised semantic parsers are trained…

Computation and Language · Computer Science 2019-09-11 Bailin Wang , Ivan Titov , Mirella Lapata

Temporal Grounding Graphs for Language Understanding with Accrued Visual-Linguistic Context

A robot's ability to understand or ground natural language instructions is fundamentally tied to its knowledge about the surrounding world. We present an approach to grounding natural language utterances in the context of factual…

Robotics · Computer Science 2018-11-19 Rohan Paul , Andrei Barbu , Sue Felshin , Boris Katz , Nicholas Roy