Related papers: Visual Perspective Taking for Opponent Behavior Mo…
Visual perspective-taking (VPT), the ability to understand the viewpoint of another person, enables individuals to anticipate the actions of other people. For instance, a driver can avoid accidents by assessing what pedestrians see. Humans…
Perspective-taking is the ability to perceive or understand a situation or concept from another individual's point of view, and is crucial in daily human interactions. Enabling robots to perform perspective-taking remains an unsolved…
Prospection, the act of predicting the consequences of many possible futures, is intrinsic to human planning and action, and may even be at the root of consciousness. Surprisingly, this idea has been explored comparatively little in…
Learning predictive models from interaction with the world allows an agent, such as a robot, to learn about how the world works, and then use this learned model to plan coordinated sequences of actions to bring about desired outcomes.…
We propose to learn tasks directly from visual demonstrations by learning to predict the outcome of human and robot actions on an environment. We enable a robot to physically perform a human demonstrated task without knowledge of the…
In order to autonomously learn wide repertoires of complex skills, robots must be able to learn from their own autonomously collected data, without human supervision. One learning signal that is always available for autonomously collected…
Understanding human perceptions of robot performance is crucial for designing socially intelligent robots that can adapt to human expectations. Current approaches often rely on surveys, which can disrupt ongoing human-robot interactions. As…
Perspective taking is the ability to take the point of view of another agent. This skill is not unique to humans as it is also displayed by other animals like chimpanzees. It is an essential ability for social interactions, including…
The object perception capabilities of humans are impressive, and this becomes even more evident when trying to develop solutions with a similar proficiency in autonomous robots. While there have been notable advancements in the technologies…
Perspective taking, which allows people to imagine another's thinking and goals, is known to be an effective method for promoting prosocial behaviors in human-computer interactions. However, most of the previous studies have focused on…
Humans learn from observations and experiences to adjust their behaviours towards better performance. Interacting with such dynamic humans is challenging, as the robot needs to predict the humans accurately for safe and efficient…
Visual representations play a crucial role in developing generalist robotic policies. Previous vision encoders, typically pre-trained with single-image reconstruction or two-image contrastive learning, tend to capture static information,…
Humans make extensive use of vision and touch as complementary senses, with vision providing global information about the scene and touch measuring local information during manipulation without suffering from occlusions. While prior work…
As multimodal language models (MLMs) are increasingly used in social and collaborative settings, it is crucial to evaluate their perspective-taking abilities. Existing benchmarks largely rely on text-based vignettes or static scene…
The emergence of vision catalysed a pivotal evolutionary advancement, enabling organisms not only to perceive but also to interact intelligently with their environment. This transformation is mirrored by the evolution of robotic systems,…
Visual pre-training with large-scale real-world data has made great progress in recent years, showing great potential in robot learning with pixel observations. However, the recipes of visual pre-training for robot manipulation tasks are…
Accurately predicting human behaviors is crucial for mobile robots operating in human-populated environments. While prior research primarily focuses on predicting actions in single-human scenarios from an egocentric view, several robotic…
Visual perspective taking (VPT) is the ability to perceive and reason about the perspectives of others. It is an essential feature of human intelligence, which develops over the first decade of life and requires an ability to process the 3D…
Learning visual representations from observing actions to benefit robot visuo-motor policy generation is a promising direction that closely resembles human cognitive function and perception. Motivated by this, and further inspired by…
In complex environments, where the human sensory system reaches its limits, our behaviour is strongly driven by our beliefs about the state of the world around us. Accessing others' beliefs, intentions, or mental states in general, could…