Related papers: VQA-based Robotic State Recognition Optimized with…

200 papers

Recognition of the current state is indispensable for the operation of a robot. There are various states to be recognized, such as whether an elevator door is open or closed, whether an object has been grasped correctly, and whether the TV…

Robotics · Computer Science 2023-10-26 Kento Kawaharazuka , Yoshiki Obinata , Naoaki Kanazawa , Kei Okada , Masayuki Inaba

In order for robots to autonomously navigate and operate in diverse environments, it is essential for them to recognize the state of their environment. On the other hand, the environmental state recognition has traditionally involved…

Robotics · Computer Science 2024-09-27 Kento Kawaharazuka , Yoshiki Obinata , Naoaki Kanazawa , Kei Okada , Masayuki Inaba

State recognition of the environment and objects, such as the open/closed state of doors and the on/off of lights, is indispensable for robots that perform daily life support and security tasks. Until now, state recognition methods have…

Robotics · Computer Science 2024-10-31 Kento Kawaharazuka , Yoshiki Obinata , Naoaki Kanazawa , Kei Okada , Masayuki Inaba

To ensure proper knowledge representation of the kitchen environment, it is vital for kitchen robots to recognize the states of the food items that are being cooked. Although the domain of object detection and recognition has been…

Computer Vision and Pattern Recognition · Computer Science 2023-03-07 Akib Mohammed Khan , Alif Ashrafee , Reeshoon Sayera , Shahriar Ivan , Sabbir Ahmed

Cooking tasks are characterized by large changes in the state of the food, which is one of the major challenges in robot execution of cooking tasks. In particular, cooking using a stove to apply heat to the foodstuff causes many special…

Robotics · Computer Science 2023-09-07 Naoaki Kanazawa , Kento Kawaharazuka , Yoshiki Obinata , Kei Okada , Masayuki Inaba

The state recognition of the environment and objects by robots is generally based on the judgement of the current state as a classification problem. On the other hand, state changes of food in cooking happen continuously and need to be…

Robotics · Computer Science 2024-03-19 Kento Kawaharazuka , Naoaki Kanazawa , Yoshiki Obinata , Kei Okada , Masayuki Inaba

In machine learning, it is very important for a robot to know the state of an object and recognize particular desired states. This is an image classification problem that can be solved using a convolutional neural network. In this paper, we…

Computer Vision and Pattern Recognition · Computer Science 2019-04-30 Kyle Mott

Visual question answering (VQA) uses image processing algorithms to process the image and natural language processing methods to understand and answer the question. VQA is helpful to a visually impaired person, can be used for the security…

Computer Vision and Pattern Recognition · Computer Science 2023-05-31 Param Ahir , Hiteishi M. Diwanji

One of the most intriguing features of the Visual Question Answering (VQA) challenge is the unpredictability of the questions. Extracting the information required to answer them demands a variety of image operations from detection and…

Computer Vision and Pattern Recognition · Computer Science 2016-12-19 Peng Wang , Qi Wu , Chunhua Shen , Anton van den Hengel

Automating garment manipulation poses a significant challenge for assistive robotics due to the diverse and deformable nature of garments. Traditional approaches typically require separate models for each garment type, which limits…

Robotics · Computer Science 2024-10-08 Xin Li , Siyuan Huang , Qiaojun Yu , Zhengkai Jiang , Ce Hao , Yimeng Zhu , Hongsheng Li , Peng Gao , Cewu Lu

In recent years, a number of models that learn the relations between vision and language from large datasets have been released. These models perform a variety of tasks, such as answering questions about images, retrieving sentences that…

Robotics · Computer Science 2024-03-19 Kento Kawaharazuka , Yoshiki Obinata , Naoaki Kanazawa , Kei Okada , Masayuki Inaba

Visual Question Answering (VQA) presents a unique challenge as it requires the ability to understand and encode the multi-modal inputs - in terms of image processing and natural language processing. The algorithm further needs to learn how…

Computer Vision and Pattern Recognition · Computer Science 2017-09-26 Supriya Pandhre , Shagun Sodhani

We introduce the Neural State Machine, seeking to bridge the gap between the neural and symbolic views of AI and integrate their complementary strengths for the task of visual reasoning. Given an image, we first predict a probabilistic…

Artificial Intelligence · Computer Science 2019-11-26 Drew A. Hudson , Christopher D. Manning

This paper revisits visual representation in knowledge-based visual question answering (VQA) and demonstrates that using regional information in a better way can significantly improve the performance. While visual representation is…

Computer Vision and Pattern Recognition · Computer Science 2022-10-11 Yuanze Lin , Yujia Xie , Dongdong Chen , Yichong Xu , Chenguang Zhu , Lu Yuan

Visual Question Answering (VQA) is a challenging task that has received increasing attention from both the computer vision and the natural language processing communities. Given an image and a question in natural language, it requires…

Computer Vision and Pattern Recognition · Computer Science 2016-07-21 Qi Wu , Damien Teney , Peng Wang , Chunhua Shen , Anthony Dick , Anton van den Hengel

Visual Question Answering (VQA) is an evolving research field aimed at enabling machines to answer questions about visual content by integrating image and language processing techniques such as feature extraction, object detection, text…

Computer Vision and Pattern Recognition · Computer Science 2025-01-14 Ngoc Dung Huynh , Mohamed Reda Bouadjenek , Sunil Aryal , Imran Razzak , Hakim Hacid

A hierarchical cross-modal fusion model is proposed for vision-language question answering (VLQA) in industrial robotics, targeting the challenges of semantic ambiguity, complex environmental layouts, and domain-specific terminology common…

Computer Vision and Pattern Recognition · Computer Science 2026-05-05 Ping Li , Bartlomiej Brzozka

Visual Grounding (VG) in Visual Question Answering (VQA) systems describes how well a system manages to tie a question and its answer to relevant image regions. Systems with strong VG are considered intuitively interpretable and suggest an…

Computer Vision and Pattern Recognition · Computer Science 2022-11-16 Daniel Reich , Felix Putze , Tanja Schultz

Visual question answering (VQA) demands simultaneous comprehension of both the image visual content and natural language questions. In some cases, the reasoning needs the help of common sense or general knowledge which usually appear in the…

Computer Vision and Pattern Recognition · Computer Science 2018-11-30 Hui Li , Peng Wang , Chunhua Shen , Anton van den Hengel

Visual Question Answering (VQA) is an interdisciplinary field that bridges the gap between computer vision (CV) and natural language processing (NLP), enabling Artificial Intelligence (AI) systems to answer questions about images. Since its…

Computer Vision and Pattern Recognition · Computer Science 2025-01-14 Anupam Pandey , Deepjyoti Bodo , Arpan Phukan , Asif Ekbal