Related papers: VQA-based Robotic State Recognition Optimized with…

Binary State Recognition by Robots using Visual Question Answering of Pre-Trained Vision-Language Model

Recognition of the current state is indispensable for the operation of a robot. There are various states to be recognized, such as whether an elevator door is open or closed, whether an object has been grasped correctly, and whether the TV…

Robotics · Computer Science 2023-10-26 Kento Kawaharazuka, Yoshiki Obinata, Naoaki Kanazawa, Kei Okada, Masayuki Inaba

Robotic Environmental State Recognition with Pre-Trained Vision-Language Models and Black-Box Optimization

In order for robots to autonomously navigate and operate in diverse environments, it is essential for them to recognize the state of their environment. On the other hand, the environmental state recognition has traditionally involved…

Robotics · Computer Science 2024-09-27 Kento Kawaharazuka, Yoshiki Obinata, Naoaki Kanazawa, Kei Okada, Masayuki Inaba

Robotic State Recognition with Image-to-Text Retrieval Task of Pre-Trained Vision-Language Model and Black-Box Optimization

State recognition of the environment and objects, such as the open/closed state of doors and the on/off of lights, is indispensable for robots that perform daily life support and security tasks. Until now, state recognition methods have…

Robotics · Computer Science 2024-10-31 Kento Kawaharazuka, Yoshiki Obinata, Naoaki Kanazawa, Kei Okada, Masayuki Inaba

Rethinking Cooking State Recognition with Vision Transformers

To ensure proper knowledge representation of the kitchen environment, it is vital for kitchen robots to recognize the states of the food items that are being cooked. Although the domain of object detection and recognition has been…

Computer Vision and Pattern Recognition · Computer Science 2023-03-07 Akib Mohammed Khan, Alif Ashrafee, Reeshoon Sayera, Shahriar Ivan, Sabbir Ahmed

Recognition of Heat-Induced Food State Changes by Time-Series Use of Vision-Language Model for Cooking Robot

Cooking tasks are characterized by large changes in the state of the food, which is one of the major challenges in robot execution of cooking tasks. In particular, cooking using a stove to apply heat to the foodstuff causes many special…

Robotics · Computer Science 2023-09-07 Naoaki Kanazawa, Kento Kawaharazuka, Yoshiki Obinata, Kei Okada, Masayuki Inaba

Continuous Object State Recognition for Cooking Robots Using Pre-Trained Vision-Language Models and Black-box Optimization

The state recognition of the environment and objects by robots is generally based on the judgement of the current state as a classification problem. On the other hand, state changes of food in cooking happen continuously and need to be…

Robotics · Computer Science 2024-03-19 Kento Kawaharazuka, Naoaki Kanazawa, Yoshiki Obinata, Kei Okada, Masayuki Inaba

State Classification of Cooking Objects Using a VGG CNN

In machine learning, it is very important for a robot to know the state of an object and recognize particular desired states. This is an image classification problem that can be solved using a convolutional neural network. In this paper, we…

Computer Vision and Pattern Recognition · Computer Science 2019-04-30 Kyle Mott

Analysis of Visual Question Answering Algorithms with attention model

Visual question answering (VQA) uses image processing algorithms to process the image and natural language processing methods to understand and answer the question. VQA is helpful to a visually impaired person, can be used for the security…

Computer Vision and Pattern Recognition · Computer Science 2023-05-31 Param Ahir, Hiteishi M. Diwanji

The VQA-Machine: Learning How to Use Existing Vision Algorithms to Answer New Questions

One of the most intriguing features of the Visual Question Answering (VQA) challenge is the unpredictability of the questions. Extracting the information required to answer them demands a variety of image operations from detection and…

Computer Vision and Pattern Recognition · Computer Science 2016-12-19 Peng Wang, Qi Wu, Chunhua Shen, Anton van den Hengel

SKT: Integrating State-Aware Keypoint Trajectories with Vision-Language Models for Robotic Garment Manipulation

Automating garment manipulation poses a significant challenge for assistive robotics due to the diverse and deformable nature of garments. Traditional approaches typically require separate models for each garment type, which limits…

Robotics · Computer Science 2024-10-08 Xin Li, Siyuan Huang, Qiaojun Yu, Zhengkai Jiang, Ce Hao, Yimeng Zhu, Hongsheng Li, Peng Gao, Cewu Lu

Robotic Applications of Pre-Trained Vision-Language Models to Various Recognition Behaviors

In recent years, a number of models that learn the relations between vision and language from large datasets have been released. These models perform a variety of tasks, such as answering questions about images, retrieving sentences that…

Robotics · Computer Science 2024-03-19 Kento Kawaharazuka, Yoshiki Obinata, Naoaki Kanazawa, Kei Okada, Masayuki Inaba

Survey of Recent Advances in Visual Question Answering

Visual Question Answering (VQA) presents a unique challenge as it requires the ability to understand and encode the multi-modal inputs - in terms of image processing and natural language processing. The algorithm further needs to learn how…

Computer Vision and Pattern Recognition · Computer Science 2017-09-26 Supriya Pandhre, Shagun Sodhani

Learning by Abstraction: The Neural State Machine

We introduce the Neural State Machine, seeking to bridge the gap between the neural and symbolic views of AI and integrate their complementary strengths for the task of visual reasoning. Given an image, we first predict a probabilistic…

Artificial Intelligence · Computer Science 2019-11-26 Drew A. Hudson, Christopher D. Manning

REVIVE: Regional Visual Representation Matters in Knowledge-Based Visual Question Answering

This paper revisits visual representation in knowledge-based visual question answering (VQA) and demonstrates that using regional information in a better way can significantly improve the performance. While visual representation is…

Computer Vision and Pattern Recognition · Computer Science 2022-10-11 Yuanze Lin, Yujia Xie, Dongdong Chen, Yichong Xu, Chenguang Zhu, Lu Yuan

Visual Question Answering: A Survey of Methods and Datasets

Visual Question Answering (VQA) is a challenging task that has received increasing attention from both the computer vision and the natural language processing communities. Given an image and a question in natural language, it requires…

Computer Vision and Pattern Recognition · Computer Science 2016-07-21 Qi Wu, Damien Teney, Peng Wang, Chunhua Shen, Anthony Dick, Anton van den Hengel

Visual question answering: from early developments to recent advances -- a survey

Visual Question Answering (VQA) is an evolving research field aimed at enabling machines to answer questions about visual content by integrating image and language processing techniques such as feature extraction, object detection, text…

Computer Vision and Pattern Recognition · Computer Science 2025-01-14 Ngoc Dung Huynh, Mohamed Reda Bouadjenek, Sunil Aryal, Imran Razzak, Hakim Hacid

Research on Vision-Language Question Answering Models for Industrial Robots

A hierarchical cross-modal fusion model is proposed for vision-language question answering (VLQA) in industrial robotics, targeting the challenges of semantic ambiguity, complex environmental layouts, and domain-specific terminology common…

Computer Vision and Pattern Recognition · Computer Science 2026-05-05 Ping Li, Bartlomiej Brzozka

Visually Grounded VQA by Lattice-based Retrieval

Visual Grounding (VG) in Visual Question Answering (VQA) systems describes how well a system manages to tie a question and its answer to relevant image regions. Systems with strong VG are considered intuitively interpretable and suggest an…

Computer Vision and Pattern Recognition · Computer Science 2022-11-16 Daniel Reich, Felix Putze, Tanja Schultz

Visual Question Answering as Reading Comprehension

Visual question answering (VQA) demands simultaneous comprehension of both the image visual content and natural language questions. In some cases, the reasoning needs the help of common sense or general knowledge which usually appear in the…

Computer Vision and Pattern Recognition · Computer Science 2018-11-30 Hui Li, Peng Wang, Chunhua Shen, Anton van den Hengel

The Quest for Visual Understanding: A Journey Through the Evolution of Visual Question Answering

Visual Question Answering (VQA) is an interdisciplinary field that bridges the gap between computer vision (CV) and natural language processing (NLP), enabling Artificial Intelligence (AI) systems to answer questions about images. Since its…

Computer Vision and Pattern Recognition · Computer Science 2025-01-14 Anupam Pandey, Deepjyoti Bodo, Arpan Phukan, Asif Ekbal