Related papers: CoMo: Controllable Motion Generation through Langu…

CoMo: Compositional Motion Customization for Text-to-Video Generation

While recent text-to-video models excel at generating diverse scenes, they struggle with precise motion control, particularly for complex, multi-subject motions. Although methods for single-motion customization have been developed to…

Computer Vision and Pattern Recognition · Computer Science 2025-10-28 Youcan Xu, Zhen Wang, Jiaxin Shi, Kexin Li, Feifei Shao, Jun Xiao, Yi Yang, Jun Yu, Long Chen

CoMA: Compositional Human Motion Generation with Multi-modal Agents

3D human motion generation has seen substantial advancement in recent years. While state-of-the-art approaches have improved performance significantly, they still struggle with complex and detailed motions unseen in training data, largely…

Computer Vision and Pattern Recognition · Computer Science 2025-01-09 Shanlin Sun, Gabriel De Araujo, Jiaqi Xu, Shenghan Zhou, Hanwen Zhang, Ziheng Huang, Chenyu You, Xiaohui Xie

Pose-Guided Residual Refinement for Interpretable Text-to-Motion Generation and Editing

Text-based 3D motion generation aims to automatically synthesize diverse motions from natural-language descriptions to extend user creativity, whereas motion editing modifies an existing motion sequence in response to text while preserving…

Computer Vision and Pattern Recognition · Computer Science 2025-12-30 Sukhyun Jeong, Yong-Hoon Choi

Motion-example-controlled Co-speech Gesture Generation Leveraging Large Language Models

The automatic generation of controllable co-speech gestures has recently gained growing attention. While existing systems typically achieve gesture control through predefined categorical labels or implicit pseudo-labels derived from motion…

Computer Vision and Pattern Recognition · Computer Science 2025-07-29 Bohong Chen, Yumeng Li, Youyi Zheng, Yao-Xiang Ding, Kun Zhou

SMooGPT: Stylized Motion Generation using Large Language Models

Stylized motion generation is actively studied in computer graphics, especially benefiting from the rapid advances in diffusion models. The goal of this task is to produce a novel motion respecting both the motion content and the desired…

Graphics · Computer Science 2026-01-27 Lei Zhong, Yi Yang, Changjian Li

Making Pose Representations More Expressive and Disentangled via Residual Vector Quantization

Recent progress in text-to-motion has advanced both 3D human motion generation and text-based motion control. Controllable motion generation (CoMo), which enables intuitive control, typically relies on pose code representations, but…

Computer Vision and Pattern Recognition · Computer Science 2025-08-21 Sukhyun Jeong, Hong-Gi Shin, Yong-Hoon Choi

ACMo: Attribute Controllable Motion Generation

Attributes such as style, fine-grained text, and trajectory are specific conditions for describing motion. However, existing methods often lack precise user control over motion attributes and suffer from limited generalizability to unseen…

Computer Vision and Pattern Recognition · Computer Science 2025-03-17 Mingjie Wei, Xuemei Xie, Guangming Shi

LLaMo: Scaling Pretrained Language Models for Unified Motion Understanding and Generation with Continuous Autoregressive Tokens

Recent progress in large models has led to significant advances in unified multimodal generation and understanding. However, the development of models that unify motion-language generation and understanding remains largely underexplored.…

Computer Vision and Pattern Recognition · Computer Science 2026-04-20 Zekun Li, Sizhe An, Chengcheng Tang, Chuan Guo, Ivan Shugurov, Linguang Zhang, Amy Zhao, Srinath Sridhar, Lingling Tao, Abhay Mittal

UniMo: Unified Motion Generation and Understanding with Chain of Thought

Existing 3D human motion generation and understanding methods often exhibit limited interpretability, restricting effective mutual enhancement between these inherently related tasks. While current unified frameworks based on large language…

Artificial Intelligence · Computer Science 2026-01-21 Guocun Wang, Kenkun Liu, Jing Lin, Guorui Song, Jian Li, Xiaoguang Han

CoT-Pose: Chain-of-Thought Reasoning for 3D Pose Generation from Abstract Prompts

Recent advances in multi-modal large language models (MLLMs) and chain-of-thought (CoT) reasoning have led to significant progress in image and text generation tasks. However, the field of 3D human pose generation still faces critical…

Computer Vision and Pattern Recognition · Computer Science 2025-08-12 Junuk Cha, Jihyeon Kim

Progressive Human Motion Generation Based on Text and Few Motion Frames

Although existing text-to-motion (T2M) methods can produce realistic human motion from text description, it is still difficult to align the generated motion with the desired postures since using text alone is insufficient for precisely…

Computer Vision and Pattern Recognition · Computer Science 2025-04-03 Ling-An Zeng, Gaojie Wu, Ancong Wu, Jian-Fang Hu, Wei-Shi Zheng

MotionGPT-2: A General-Purpose Motion-Language Model for Motion Generation and Understanding

Generating lifelike human motions from descriptive texts has experienced remarkable research focus in the recent years, propelled by the emerging requirements of digital humans. Despite impressive advances, existing approaches are often…

Computer Vision and Pattern Recognition · Computer Science 2024-10-30 Yuan Wang, Di Huang, Yaqi Zhang, Wanli Ouyang, Jile Jiao, Xuetao Feng, Yan Zhou, Pengfei Wan, Shixiang Tang, Dan Xu

Fleximo: Towards Flexible Text-to-Human Motion Video Generation

Current methods for generating human motion videos rely on extracting pose sequences from reference videos, which restricts flexibility and control. Additionally, due to the limitations of pose detection techniques, the extracted pose…

Computer Vision and Pattern Recognition · Computer Science 2024-12-02 Yuhang Zhang, Yuan Zhou, Zeyu Liu, Yuxuan Cai, Qiuyue Wang, Aidong Men, Huan Yang

Plan, Posture and Go: Towards Open-World Text-to-Motion Generation

Conventional text-to-motion generation methods are usually trained on limited text-motion pairs, making them hard to generalize to open-world scenarios. Some works use the CLIP model to align the motion space and the text space, aiming to…

Computer Vision and Pattern Recognition · Computer Science 2023-12-25 Jinpeng Liu, Wenxun Dai, Chunyu Wang, Yiji Cheng, Yansong Tang, Xin Tong

MoLA: Motion Generation and Editing with Latent Diffusion Enhanced by Adversarial Training

In text-to-motion generation, controllability as well as generation quality and speed has become increasingly critical. The controllability challenges include generating a motion of a length that matches the given textual description and…

Computer Vision and Pattern Recognition · Computer Science 2025-04-15 Kengo Uchida, Takashi Shibuya, Yuhta Takida, Naoki Murata, Julian Tanke, Shusuke Takahashi, Yuki Mitsufuji

Controllable Text-to-Motion Generation via Modular Body-Part Phase Control

Text-to-motion (T2M) generation is becoming a practical tool for animation and interactive avatars. However, modifying specific body parts while maintaining overall motion coherence remains challenging. Existing methods typically rely on…

Computer Vision and Pattern Recognition · Computer Science 2026-03-23 Minyue Dai, Ke Fan, Anyi Rao, Jingbo Wang, Bo Dai

RoMo: A Large-Scale, Richly Organized Dataset and Semantic Taxonomy for Human Motion Generation

Success in generative modeling across language, image, and video demonstrates that large, well-curated datasets are the key driver for building capable models. 3D Human motion, however, has lagged behind, constrained by an unsatisfying…

Computer Vision and Pattern Recognition · Computer Science 2026-05-27 Jiahao Zhang, Joseph Liu, Young-Yoon Lee, Seonghyeon Moon, Victor Zordan, Guy Tevet, Karen Liu, Stephen Gould, Oren Jacob, Haomiao Jiang, Mubbasir Kapadia, Yizhak Ben-Shabat

Enhancing Motion in Text-to-Video Generation with Decomposed Encoding and Conditioning

Despite advancements in Text-to-Video (T2V) generation, producing videos with realistic motion remains challenging. Current models often yield static or minimally dynamic outputs, failing to capture complex motions described by text. This…

Computer Vision and Pattern Recognition · Computer Science 2024-11-01 Penghui Ruan, Pichao Wang, Divya Saxena, Jiannong Cao, Yuhui Shi

SegMo: Segment-aligned Text to 3D Human Motion Generation

Generating 3D human motions from textual descriptions is an important research problem with broad applications in video games, virtual reality, and augmented reality. Recent methods align the textual description with human motion at the…

Computer Vision and Pattern Recognition · Computer Science 2025-12-25 Bowen Dang, Lin Wu, Xiaohang Yang, Zheng Yuan, Zhixiang Chen

A Self-supervised Motion Representation for Portrait Video Generation

Recent advancements in portrait video generation have been noteworthy. However, existing methods rely heavily on human priors and pre-trained generative models. Motion representations based on human priors may introduce unrealistic motion,…

Computer Vision and Pattern Recognition · Computer Science 2025-06-16 Qiyuan Zhang, Chenyu Wu, Wenzhang Sun, Huaize Liu, Donglin Di, Wei Chen, Changqing Zou