Jiaming Liu — Scifaro

CollectionLoRA: Collecting 50 Effects in 1 LoRA via Multi-Teacher On-Policy Distillation

Customized image editing aims to equip pre-trained diffusion models with specific visual effects using limited paired data, typically via Low-Rank Adaptation (LoRA). As the number of desired effects grows, storing and dynamically loading…

Computer Vision and Pattern Recognition · Computer Science · 2026-05-28 · Fangtai Wu, Hailong Guo, Shijie Huang, Jiayi Song, Yubo Huang, Mushui Liu, Zhao Wang, Yunlong Yu, Jiaming Liu, Ruihua Huang

TwinRL: Digital Twin-Driven Reinforcement Learning for Real-World Robotic Manipulation

Despite strong generalization capabilities, Vision-Language-Action (VLA) models remain constrained by the high cost of expert demonstrations and limited real-world interaction. While online reinforcement learning (RL) has shown promise, its…

Robotics · Computer Science · 2026-05-20 · Qinwen Xu, Jiaming Liu, Rui Zhou, Shaojun Shi, Nuowei Han, Zhuoyang Liu, Chenyang Gu, Shuo Gu, Yang Yue, Gao Huang, Wenzhao Zheng, Sirui Han, Peng Jia, Shanghang Zhang

MoASE++: Mixture of Activation Sparsity Experts with Domain-Adaptive On-policy Distillation for Continual Test Time Adaptation

Continual test-time adaptation adapts a source-pretrained model to non-stationary, unlabeled target streams while retaining past competence, yet texture-biased backbones risk error accumulation and catastrophic forgetting. Drawing…

Computer Vision and Pattern Recognition · Computer Science · 2026-05-19 · Ronyu Zhang, Aosong Cheng, Gaole Dai, Yulin Luo, Jiaming Liu, Li Du, Huanrui Yang, Dan Wang, Leyuan Fang, Yuan Du, Shanghang Zhang

HarmoWAM: Harmonizing Generalizable and Precise Manipulation via Adaptive World Action Models

World Action Models (WAMs) have emerged as a promising paradigm for robot control by modeling physical dynamics. Current WAMs generally follow two paradigms: the "Imagine-then-Execute" approach, which uses video prediction to infer actions…

Robotics · Computer Science · 2026-05-12 · Qiuxuan Feng, Jiale Yu, Jiaming Liu, Yueru Jia, Zhuangzhe Wu, Hao Chen, Zezhong Qian, Shuo Gu, Peng Jia, Siwei Ma, Shanghang Zhang

LaST-R1: Reinforcing Robotic Manipulation via Adaptive Physical Latent Reasoning

Robotic foundation models require reasoning over complex visual scenes to execute adaptive actions in dynamic environments. While recent studies on latent-reasoning Vision-Language-Action (VLA) models have demonstrated the capability to…

Robotics · Computer Science · 2026-05-08 · Hao Chen, Jiaming Liu, Zhonghao Yan, Nuowei Han, Renrui Zhang, Chenyang Gu, Jialin Gao, Ziyu Guo, Siyuan Qian, Yinxi Wang, Peng Jia, Shanghang Zhang, Pheng-Ann Heng

Hi-WM: Human-in-the-World-Model for Scalable Robot Post-Training

Post-training is essential for turning pretrained generalist robot policies into reliable task-specific controllers, but existing human-in-the-loop pipelines remain tied to physical execution: each correction requires robot time, scene…

Robotics · Computer Science · 2026-05-06 · Yaxuan Li, Zhongyi Zhou, Yefei Chen, Yanjiang Guo, Jiaming Liu, Shanghang Zhang, Jianyu Chen, Yichen Zhu

Live Avatar: Streaming Real-time Audio-Driven Avatar Generation with Infinite Length

Audio-driven avatar interaction demands real-time, streaming, and infinite-length generation -- capabilities fundamentally at odds with the sequential denoising and long-horizon drift of current diffusion models. We present Live Avatar, an…

Computer Vision and Pattern Recognition · Computer Science · 2026-04-21 · Yubo Huang, Hailong Guo, Fangtai Wu, Weiqiang Wang, Shifeng Zhang, Shijie Huang, Qijun Gan, Lin Liu, Sirui Zhao, Enhong Chen, Jiaming Liu, Steven Hoi

RationalRewards: Reasoning Rewards Scale Visual Generation Both Training and Test Time

Most reward models for visual generation reduce rich human judgments to a single unexplained score, discarding the reasoning that underlies preference. We show that teaching reward models to produce explicit, multi-dimensional critiques…

Artificial Intelligence · Computer Science · 2026-04-15 · Haozhe Wang, Cong Wei, Weiming Ren, Jiaming Liu, Fangzhen Lin, Wenhu Chen

A Reasoning-Enabled Vision-Language Foundation Model for Chest X-ray Interpretation

Chest X-rays (CXRs) are among the most frequently performed imaging examinations worldwide, yet rising imaging volumes increase radiologist workload and the risk of diagnostic errors. Although artificial intelligence (AI) systems have shown…

Computer Vision and Pattern Recognition · Computer Science · 2026-04-02 · Yabin Zhang, Chong Wang, Yunhe Gao, Jiaming Liu, Maya Varma, Justin Xu, Sophie Ostmeier, Jin Long, Sergios Gatidis, Seena Dehkharghani, Arne Michalson, Eun Kyoung Hong, Christian Bluethgen, Haiwei Henry Guo, Alexander Victor Ortiz, Stephan Altmayer, Sandhya Bodapati, Joseph David Janizek, Ken Chang, Jean-Benoit Delbrouck, Akshay S. Chaudhari, Curtis P. Langlotz

The Geometry of Compromise: Unlocking Generative Capabilities via Controllable Modality Alignment

Vision-Language Models (VLMs) such as CLIP learn a shared embedding space for images and text, yet their representations remain geometrically separated, a phenomenon known as the modality gap. This gap limits tasks requiring cross-modal…

Computer Vision and Pattern Recognition · Computer Science · 2026-04-02 · Hongyuan Liu, Qinli Yang, Wen Li, Zhong Zhang, Jiaming Liu, Wei Han, Zhili Qin, Jinxia Guo, Junming Shao

LaST$_{0}$: Latent Spatio-Temporal Chain-of-Thought for Robotic Vision-Language-Action Model

Vision-Language-Action (VLA) models have recently shown strong generalization, with some approaches seeking to explicitly generate linguistic reasoning traces or predict future observations prior to execution. However, explicit reasoning…

Robotics · Computer Science · 2026-03-31 · Zhuoyang Liu, Jiaming Liu, Hao Chen, Jiale Yu, Ziyu Guo, Chengkai Hou, Chenyang Gu, Xiangju Mi, Renrui Zhang, Kun Wu, Zhengping Che, Jian Tang, Pheng-Ann Heng, Shanghang Zhang

Activation Matters: Test-time Activated Negative Labels for OOD Detection with Vision-Language Models

Out-of-distribution (OOD) detection aims to identify samples that deviate from in-distribution (ID). One popular pipeline addresses this by introducing negative labels distant from ID classes and detecting OOD based on their distance to…

Computer Vision and Pattern Recognition · Computer Science · 2026-03-27 · Yabin Zhang, Maya Varma, Yunhe Gao, Jean-Benoit Delbrouck, Jiaming Liu, Chong Wang, Curtis Langlotz

Video2Act: A Dual-System Video Diffusion Policy with Robotic Spatio-Motional Modeling

Robust perception and dynamics modeling are fundamental to real-world robotic policy learning. Recent methods employ video diffusion models (VDMs) to enhance robotic policies, improving their understanding and modeling of the physical…

Robotics · Computer Science · 2026-03-25 · Yueru Jia, Jiaming Liu, Shengbang Liu, Rui Zhou, Wanhe Yu, Yuyang Yan, Xiaowei Chi, Yandong Guo, Boxin Shi, Shanghang Zhang

MLA: A Multisensory Language-Action Model for Multimodal Understanding and Forecasting in Robotic Manipulation

Vision-language-action models (VLAs) have shown generalization capabilities in robotic manipulation tasks by inheriting from vision-language models (VLMs) and learning action generation. Most VLA models focus on interpreting vision and…

Robotics · Computer Science · 2026-03-20 · Zhuoyang Liu, Jiaming Liu, Jiadong Xu, Nuowei Han, Chenyang Gu, Hao Chen, Kaichen Zhou, Renrui Zhang, Kai Chin Hsieh, Kun Wu, Zhengping Che, Jian Tang, Shanghang Zhang

Look Before Acting: Enhancing Vision Foundation Representations for Vision-Language-Action Models

Vision-Language-Action (VLA) models have recently emerged as a promising paradigm for robotic manipulation, in which reliable action prediction critically depends on accurately interpreting and integrating visual observations conditioned on…

Computer Vision and Pattern Recognition · Computer Science · 2026-03-18 · Yulin Luo, Hao Chen, Zhuangzhe Wu, Bowen Sui, Jiaming Liu, Chenyang Gu, Zhuoyang Liu, Qiuxuan Feng, Jiale Yu, Shuo Gu, Peng Jia, Pheng-Ann Heng, Shanghang Zhang

Unlocking the Latent Canvas: Eliciting and Benchmarking Symbolic Visual Expression in LLMs

Current multimodal approaches predominantly treat visual generation as an external process, relying on pixel rendering or code execution, thereby overlooking the native visual representation capabilities latent within Large Language Models…

Computer Vision and Pattern Recognition · Computer Science · 2026-03-17 · Yiren Zheng, Shibo Li, Jiaming Liu, Haofan Wang, Yiren Song

Learning Generalizable 3D Medical Image Representations from Mask-Guided Self-Supervision

Foundation models have transformed vision and language by learning general-purpose representations from large-scale unlabeled data, yet 3D medical imaging lacks analogous approaches. Existing self-supervised methods rely on low-level…

Computer Vision and Pattern Recognition · Computer Science · 2026-03-17 · Yunhe Gao, Yabin Zhang, Chong Wang, Jiaming Liu, Maya Varma, Jean-Benoit Delbrouck, Akshay Chaudhari, Curtis Langlotz

EasyText: Controllable Diffusion Transformer for Multilingual Text Rendering

Generating accurate multilingual text with diffusion models has long been desired but remains challenging. Recent methods have made progress in rendering text in a single language, but rendering arbitrary languages is still an unexplored…

Computer Vision and Pattern Recognition · Computer Science · 2026-03-11 · Runnan Lu, Yuxuan Zhang, Jiaming Liu, Haofan Wang, Yiren Song

Model-Free DRL Control for Power Inverters: From Policy Learning to Real-Time Implementation via Knowledge Distillation

In response to the trade-off between control performance and computational burden hindering the deployment of Deep Reinforcement Learning (DRL) in power inverters, this paper presents a novel model-free control framework leveraging policy…

Systems and Control · Electrical Eng. & Systems · 2026-03-10 · Yang Yang, Chenggang Cui, Xitong Niu, Jiaming Liu, Chuanlin Zhang

Merlin: A Computed Tomography Vision-Language Foundation Model and Dataset

The large volume of abdominal computed tomography (CT) scans coupled with the shortage of radiologists have intensified the need for automated medical image analysis tools. Previous state-of-the-art approaches for automated analysis…

Computer Vision and Pattern Recognition · Computer Science · 2026-03-05 · Louis Blankemeier, Ashwin Kumar, Joseph Paul Cohen, Jiaming Liu, Longchao Liu, Dave Van Veen, Syed Jamal Safdar Gardezi, Hongkun Yu, Magdalini Paschali, Zhihong Chen, Jean-Benoit Delbrouck, Eduardo Reis, Robbie Holland, Cesar Truyts, Christian Bluethgen, Yufu Wu, Long Lian, Malte Engmann Kjeldskov Jensen, Sophie Ostmeier, Maya Varma, Jeya Maria Jose Valanarasu, Zhongnan Fang, Zepeng Huo, Zaid Nabulsi, Diego Ardila, Wei-Hung Weng, Edson Amaro Junior, Neera Ahuja, Jason Fries, Nigam H. Shah, Greg Zaharchuk, Marc Willis, Adam Yala, Andrew Johnston, Robert D. Boutin, Andrew Wentland, Curtis P. Langlotz, Jason Hom, Sergios Gatidis, Akshay S. Chaudhari