Related papers: MIND: Multi-Scale Intent Diffusion for Text-Driven…

200 papers

Humanoid agents are expected to emulate the complex coordination inherent in human social behaviors. However, existing methods are largely confined to single-agent scenarios, overlooking the physically plausible interplay essential for…

Computer Vision and Pattern Recognition · Computer Science 2025-12-15 Bin Li, Ruichi Zhang, Han Liang, Jingyan Zhang, Juze Zhang, Xin Chen, Lan Xu, Jingyi Yu, Jingya Wang

Controlling physics-based humanoids from natural-language instructions is a critical step toward general-purpose embodied agents. However, existing methods remain constrained by a tension between semantic expressiveness and physical…

Graphics · Computer Science 2026-05-26 Jingyan Zhang, Han Liang, Ruichi Zhang, Bin Li, Juze Zhang, Xin Chen, Jingya Wang, Lan Xu, Jingyi Yu

Text-driven multi-human motion generation with complex interactions remains a challenging problem. Despite progress in performance, existing offline methods that generate fixed-length motions with a fixed number of agents are inherently…

Computer Vision and Pattern Recognition · Computer Science 2026-01-29 Mengge Liu, Yan Di, Gu Wang, Yun Qu, Dekai Zhu, Yanyan Li, Xiangyang Ji

Text-driven human motion generation is a multimodal task that synthesizes human motion sequences conditioned on natural language. It requires the model to satisfy textual descriptions under varying conditional inputs, while generating…

Computer Vision and Pattern Recognition · Computer Science 2024-10-01 Xingyu Chen

Multimodal fusion leverages information across modalities to learn better feature representations with the goal of improving performance in fusion-based tasks. However, multimodal datasets, especially in medical settings, are typically…

Machine Learning · Computer Science 2025-02-05 Alejandro Guerra-Manzanares, Farah E. Shamout

The human-like form of humanoid robots positions them uniquely to achieve the agility and versatility in motor skills that humans possess. Learning from human demonstrations offers a scalable approach to acquiring these capabilities.…

Robotics · Computer Science 2025-11-14 Qiayuan Liao, Takara E. Truong, Xiaoyu Huang, Yuman Gao, Guy Tevet, Koushil Sreenath, C. Karen Liu

Our goal is to generate realistic human motion from natural language. Modern methods often face a trade-off between model expressiveness and text-to-motion alignment. Some align text and motion latent spaces but sacrifice expressiveness;…

Computer Vision and Pattern Recognition · Computer Science 2024-10-21 Nefeli Andreou, Xi Wang, Victoria Fernández Abrevaya, Marie-Paule Cani, Yiorgos Chrysanthou, Vicky Kalogeiton

Traditional control and planning for robotic manipulation heavily rely on precise physical models and predefined action sequences. While effective in structured environments, such approaches often fail in real-world scenarios due to…

Robotics · Computer Science 2025-08-08 Jin Wang, Weijie Wang, Boyuan Deng, Heng Zhang, Rui Dai, Nikos Tsagarakis

Effective human-robot interaction requires robots to identify human intentions and generate expressive, socially appropriate motions in real-time. Existing approaches often rely on fixed motion libraries or computationally expensive…

Robotics · Computer Science 2025-09-30 Lingfan Bao, Yan Pan, Tianhu Peng, Dimitrios Kanoulas, Chengxu Zhou

Existing humanoid control systems often rely on teleoperation or modular generation pipelines that separate language understanding from physical execution. However, the former is entirely human-driven, and the latter lacks tight alignment…

Robotics · Computer Science 2025-11-25 Yuxuan Wang, Haobin Jiang, Shiqing Yao, Ziluo Ding, Zongqing Lu

We study a challenging task, conditional human motion generation, which produces plausible human motion sequences according to various conditional inputs, such as action classes or textual descriptors. Since human motions are highly diverse…

Computer Vision and Pattern Recognition · Computer Science 2023-05-22 Xin Chen, Biao Jiang, Wen Liu, Zilong Huang, Bin Fu, Tao Chen, Jingyi Yu, Gang Yu

While imitation learning (IL) has achieved impressive success in dexterous manipulation through generative modeling and pretraining, state-of-the-art approaches like Vision-Language-Action (VLA) models still struggle with adaptation to…

Robotics · Computer Science 2026-03-31 Renming Huang, Chendong Zeng, Wenjing Tang, Jintian Cai, Cewu Lu, Panpan Cai

General-purpose humanoid robots are expected to interact intuitively with humans, enabling seamless integration into daily life. Natural language provides the most accessible medium for this purpose. However, translating language into…

Recent advances in AI-generated content (AIGC) have significantly accelerated image editing techniques, driving increasing demand for diverse and fine-grained edits. Despite these advances, existing image editing methods still face…

Computer Vision and Pattern Recognition · Computer Science 2025-05-27 Shuyu Wang, Weiqi Li, Qian Wang, Shijie Zhao, Jian Zhang

Synthesizing natural human motion that adapts to complex environments while allowing creative control remains a fundamental challenge in motion synthesis. Existing models often fall short, either by assuming flat terrain or lacking the…

Computer Vision and Pattern Recognition · Computer Science 2024-12-23 Xiaohan Zhang, Sebastian Starke, Vladimir Guzov, Zhensong Zhang, Eduardo Pérez Pellitero, Gerard Pons-Moll

Scalable embodied intelligence is constrained by the scarcity of diverse, long-horizon robotic manipulation data. Existing video world models in this domain are limited to synthesizing short clips of simple actions and often rely on…

Multimodal Stance Detection (MSD) is a crucial task for understanding public opinion on social media. Existing methods predominantly operate by learning to fuse modalities. They lack an explicit reasoning process to discern how inter-modal…

Computation and Language · Computer Science 2026-01-06 Bingbing Wang, Zhengda Jin, Bin Liang, Wenjie Li, Jing Li, Ruifeng Xu, Min Zhang

Understanding human intent in complex multi-turn interactions remains a fundamental challenge in human-computer interaction and behavioral analysis. While existing intent recognition datasets focus mainly on single utterances or simple…

Artificial Intelligence · Computer Science 2026-04-15 Shufang Lin, Muyang Chen, Xiabing Zhou, Rongrong Zhang, Dayou Zhang, Fangxin Wang

Text-driven person image generation is an emerging and challenging task in cross-modality image generation. Controllable person image generation promotes a wide range of applications such as digital human interaction and virtual try-on.…

Computer Vision and Pattern Recognition · Computer Science 2022-11-14 Kaiduo Zhang, Muyi Sun, Jianxin Sun, Binghao Zhao, Kunbo Zhang, Zhenan Sun, Tieniu Tan

The ability of human beings to precisely recognize others' intents is a significant mental activity in reasoning about actions, such as what other people are doing and what they will do next. Recent research has revealed that human…

Human-Computer Interaction · Computer Science 2018-03-13 Xiang Zhang