Jiaming Tang — Scifaro

${\pi}_{0.7}$: a Steerable Generalist Robotic Foundation Model with Emergent Capabilities

We present a new robotic foundation model, called ${\pi}_{0.7}$, that can enable strong out-of-the-box performance in a wide range of scenarios. ${\pi}_{0.7}$ can follow diverse language instructions in unseen environments, including…

Machine Learning · Computer Science 2026-04-28 Physical Intelligence, Bo Ai, Ali Amin, Raichelle Aniceto, Ashwin Balakrishna, Greg Balke, Kevin Black, George Bokinsky, Shihao Cao, Thomas Charbonnier, Vedant Choudhary, Foster Collins, Ken Conley, Grace Connors, James Darpinian, Karan Dhabalia, Maitrayee Dhaka, Jared DiCarlo, Danny Driess, Michael Equi, Adnan Esmail, Yunhao Fang, Chelsea Finn, Catherine Glossop, Thomas Godden, Ivan Goryachev, Lachlan Groom, Haroun Habeeb, Hunter Hancock, Karol Hausman, Gashon Hussein, Victor Hwang, Brian Ichter, Connor Jacobsen, Szymon Jakubczak, Rowan Jen, Tim Jones, Gregg Kammerer, Ben Katz, Liyiming Ke, Mairbek Khadikov, Chandra Kuchi, Marinda Lamb, Devin LeBlanc, Brendon LeCount, Sergey Levine, Xinyu Li, Adrian Li-Bell, Vladislav Lialin, Zhonglin Liang, Wallace Lim, Yao Lu, Enyu Luo, Vishnu Mano, Nandan Marwaha, Aikys Mongush, Liam Murphy, Suraj Nair, Tyler Patterson, Karl Pertsch, Allen Z. Ren, Gavin Schelske, Charvi Sharma, Baifeng Shi, Lucy Xiaoyang Shi, Laura Smith, Jost Tobias Springenberg, Kyle Stachowicz, Will Stoeckle, Jiaming Tang, Jimmy Tanner, Shalom Tekeste, Marcel Torne, Kyle Vedder, Quan Vuong, Anna Walling, Haohuan Wang, Jason Wang, XuDong Wang, Chris Whalen, Samuel Whitmore, Blake Williams, Charles Xu, Sukwon Yoo, Lili Yu, Wuming Zhang, Zhuoyang Zhang, Ury Zhilinsky

AWQ: Activation-aware Weight Quantization for LLM Compression and Acceleration

Large language models (LLMs) have transformed numerous AI applications. On-device LLM is becoming increasingly important: running LLMs locally on edge devices can reduce the cloud computing cost and protect users' privacy. However, the…

Computation and Language · Computer Science 2026-04-28 Ji Lin, Jiaming Tang, Haotian Tang, Shang Yang, Wei-Ming Chen, Wei-Chen Wang, Guangxuan Xiao, Xingyu Dang, Chuang Gan, Song Han

MEM: Multi-Scale Embodied Memory for Vision Language Action Models

Conventionally, memory in end-to-end robotic learning involves inputting a sequence of past observations into the learned policy. However, in complex multi-stage real-world tasks, the robot's memory must represent past events at multiple…

Robotics · Computer Science 2026-03-10 Marcel Torne, Karl Pertsch, Homer Walke, Kyle Vedder, Suraj Nair, Brian Ichter, Allen Z. Ren, Haohuan Wang, Jiaming Tang, Kyle Stachowicz, Karan Dhabalia, Michael Equi, Quan Vuong, Jost Tobias Springenberg, Sergey Levine, Chelsea Finn, Danny Driess

AgentBay: A Hybrid Interaction Sandbox for Seamless Human-AI Intervention in Agentic Systems

The rapid advancement of Large Language Models (LLMs) is catalyzing a shift towards autonomous AI Agents capable of executing complex, multi-step tasks. However, these agents remain brittle when faced with real-world exceptions, making…

Artificial Intelligence · Computer Science 2025-12-05 Yun Piao, Hongbo Min, Hang Su, Leilei Zhang, Lei Wang, Yue Yin, Xiao Wu, Zhejing Xu, Liwei Qu, Hang Li, Xinxin Zeng, Wei Tian, Fei Yu, Xiaowei Li, Jiayi Jiang, Tongxu Liu, Hao Tian, Yufei Que, Xiaobing Tu, Bing Suo, Yuebing Li, Xiangting Chen, Zeen Zhao, Jiaming Tang, Wei Huang, Xuguang Li, Jing Zhao, Jin Li, Jie Shen, Jinkui Ren, Xiantao Zhang

Accelerating Large-Scale Reasoning Model Inference with Sparse Self-Speculative Decoding

Reasoning language models have demonstrated remarkable capabilities on challenging tasks by generating elaborate chain-of-thought (CoT) solutions. However, such lengthy generation shifts the inference bottleneck from compute-bound to…

Machine Learning · Computer Science 2025-12-02 Yilong Zhao, Jiaming Tang, Kan Zhu, Zihao Ye, Chi-Chih Chang, Chaofan Lin, Jongseok Park, Guangxuan Xiao, Mohamed S. Abdelfattah, Mingyu Gao, Baris Kasikci, Song Han, Ion Stoica

VLASH: Real-Time VLAs via Future-State-Aware Asynchronous Inference

Vision-Language-Action models (VLAs) are becoming increasingly capable across diverse robotic tasks. However, their real-world deployment remains slow and inefficient: demonstration videos are often sped up by 5-10x to appear smooth, with…

Robotics · Computer Science 2025-12-02 Jiaming Tang, Yufei Sun, Yilong Zhao, Shang Yang, Yujun Lin, Zhuoyang Zhang, James Hou, Yao Lu, Zhijian Liu, Song Han

Twilight: Adaptive Attention Sparsity with Hierarchical Top-$p$ Pruning

Leveraging attention sparsity to accelerate long-context large language models (LLMs) has been a hot research topic. However, current algorithms such as sparse attention or key-value (KV) cache compression tend to use a fixed budget, which…

Machine Learning · Computer Science 2025-11-05 Chaofan Lin, Jiaming Tang, Shuo Yang, Hanshuo Wang, Tian Tang, Boyu Tian, Ion Stoica, Song Han, Mingyu Gao

SparseVILA: Decoupling Visual Sparsity for Efficient VLM Inference

Vision Language Models (VLMs) have rapidly advanced in integrating visual and textual reasoning, powering applications across high-resolution image understanding, long-video analysis, and multi-turn conversation. However, their scalability…

Computer Vision and Pattern Recognition · Computer Science 2025-10-21 Samir Khaki, Junxian Guo, Jiaming Tang, Shang Yang, Yukang Chen, Konstantinos N. Plataniotis, Yao Lu, Song Han, Zhijian Liu

Transitive Array: An Efficient GEMM Accelerator with Result Reuse

Deep Neural Networks (DNNs) and Large Language Models (LLMs) have revolutionized artificial intelligence, yet their deployment faces significant memory and computational challenges, especially in resource-constrained environments.…

Hardware Architecture · Computer Science 2025-04-24 Cong Guo, Chiyue Wei, Jiaming Tang, Bowen Duan, Song Han, Hai Li, Yiran Chen

LServe: Efficient Long-sequence LLM Serving with Unified Sparse Attention

Large language models (LLMs) have shown remarkable potential in processing long sequences and complex reasoning tasks, yet efficiently serving these models remains challenging due to the quadratic computational complexity of attention in…

Computation and Language · Computer Science 2025-04-22 Shang Yang, Junxian Guo, Haotian Tang, Qinghao Hu, Guangxuan Xiao, Jiaming Tang, Yujun Lin, Zhijian Liu, Yao Lu, Song Han

DuoAttention: Efficient Long-Context LLM Inference with Retrieval and Streaming Heads

Deploying long-context large language models (LLMs) is essential but poses significant computational and memory challenges. Caching all Key and Value (KV) states across all attention heads consumes substantial memory. Existing KV cache…

Computation and Language · Computer Science 2024-10-15 Guangxuan Xiao, Jiaming Tang, Jingwei Zuo, Junxian Guo, Shang Yang, Haotian Tang, Yao Fu, Song Han

Quest: Query-Aware Sparsity for Efficient Long-Context LLM Inference

As the demand for long-context large language models (LLMs) increases, models with context windows of up to 128K or 1M tokens are becoming increasingly prevalent. However, long-context LLM inference is challenging since the inference speed…

Computation and Language · Computer Science 2024-08-28 Jiaming Tang, Yilong Zhao, Kan Zhu, Guangxuan Xiao, Baris Kasikci, Song Han

DCRMTA: Unbiased Causal Representation for Multi-touch Attribution

Multi-touch attribution (MTA) currently plays a pivotal role in achieving a fair estimation of the contributions of each advertising touchpoint towards conversion behavior, deeply influencing budget allocation and advertising…

Machine Learning · Computer Science 2024-02-06 Jiaming Tang

OliVe: Accelerating Large Language Models via Hardware-friendly Outlier-Victim Pair Quantization

Transformer-based large language models (LLMs) have achieved great success with the growing model size. LLMs' size grows by $240\times$ every two years, which outpaces the hardware progress and makes model inference increasingly costly.…

Hardware Architecture · Computer Science 2023-04-18 Cong Guo, Jiaming Tang, Weiming Hu, Jingwen Leng, Chen Zhang, Fan Yang, Yunxin Liu, Minyi Guo, Yuhao Zhu