Related papers: StreamingClaw Technical Report

200 papers

Real-time, continuous understanding of visual signals is essential for real-world interactive AI applications, and poses a fundamental system-level challenge. Existing research on streaming video understanding, however, typically focuses on…

Computer Vision and Pattern Recognition · Computer Science 2026-03-24 Guowei Tang , Tianwen Qian , Huanran Zheng , Yifei Wang , Xiaoling Wang

Recent advances in Large Language Models (LLMs) have enabled the development of Video-LLMs, advancing multimodal learning by bridging video data with language tasks. However, current video understanding models struggle with processing long…

Computer Vision and Pattern Recognition · Computer Science 2025-01-24 Haomiao Xiong , Zongxin Yang , Jiazuo Yu , Yunzhi Zhuge , Lu Zhang , Jiawen Zhu , Huchuan Lu

As embodied intelligence advances toward real-world deployment, the ability to continuously perceive and reason over streaming visual inputs becomes essential. In such settings, an agent must maintain situational awareness of its…

Computer Vision and Pattern Recognition · Computer Science 2025-12-05 Yifei Wang , Zhenkai Li , Tianwen Qian , Huanran Zheng , Zheng Wang , Yuqian Fu , Xiaoling Wang

Understanding continuous video streams plays a fundamental role in real-time applications including embodied AI and autonomous driving. Unlike offline video understanding, streaming video understanding requires the ability to process video…

Computer Vision and Pattern Recognition · Computer Science 2025-07-23 Yibin Yan , Jilan Xu , Shangzhe Di , Yikun Liu , Yudi Shi , Qirui Chen , Zeqian Li , Yifei Huang , Weidi Xie

While streaming omni-video understanding demands continuous perception and proactive, real-time interaction, this crucial area remains largely under-explored. Current omni-modal methods are inherently designed for offline settings, limiting…

Computer Vision and Pattern Recognition · Computer Science 2026-05-26 Ming Xie , Zizheng Huang , Xudong Tan , Chao Wang , Xiangyu Zeng , Wenxiao Wu , Tao Chen , Limin Wang , Yanwei Fu

Current embodied intelligent systems still face a substantial gap between high-level reasoning and low-level physical execution in open-world environments. Although Vision-Language-Action (VLA) models provide strong perception and intuitive…

Computer Vision and Pattern Recognition · Computer Science 2026-04-20 Dongjie Huo , Haoyun Liu , Guoqing Liu , Dekang Qi , Zhiming Sun , Maoguo Gao , Jianxin He , Yandan Yang , Xinyuan Chang , Feng Xiong , Xing Wei , Zhiheng Ma , Mu Xu

Extracting real-time insights from multi-modal data streams from various domains such as healthcare, intelligent transportation, and satellite remote sensing remains a challenge. High computational demands and limited knowledge scope…

Computer Vision and Pattern Recognition · Computer Science 2025-01-27 Murugan Sankaradas , Ravi K. Rajendran , Srimat T. Chakradhar

This paper presents StreamChat, a novel approach that enhances the interaction capabilities of Large Multimodal Models (LMMs) with streaming video content. In streaming interaction scenarios, existing methods rely solely on visual…

Computer Vision and Pattern Recognition · Computer Science 2025-04-01 Jihao Liu , Zhiding Yu , Shiyi Lan , Shihao Wang , Rongyao Fang , Jan Kautz , Hongsheng Li , Jose M. Alvare

Real-time streaming video understanding in domains such as autonomous driving and intelligent surveillance poses challenges beyond conventional offline video processing, requiring continuous perception, proactive decision making, and…

Computer Vision and Pattern Recognition · Computer Science 2026-04-30 Haolin Yang , Feilong Tang , Lingxiao Zhao , Xinlin Zhuang , Yifan Lu , Xiang An , Ming Hu , Xiaofeng Zhang , Abdalla Swikir , Junjun He , Zongyuan Ge , Muhammad Haris Khan , Imran Razzak

Vision-language-action (VLA) models have demonstrated exceptional performance in natural language-driven perception and control. However, the high computational cost of VLA models poses significant efficiency challenges, particularly for…

MediaClaw is a multimodal agent platform built on the OpenClaw ecosystem. Its core design follows a three-layer architecture of unified abstraction, pluginized extension, and workflow orchestration. The system is intended to address…

Artificial Intelligence · Computer Science 2026-05-15 Shaoan Zhao , Huanlin Gao , Qiang Hui , Ting Lu , Xueqiang Guo , Yantao Li , Xinpei Su , Fuyuan Shi , Chao Tan , Fang Zhao , Kai Wang , Shiguo Lian

Predicting the future occupancy states of the surrounding environment is a vital task for autonomous driving. However, current best-performing single-modality methods or multi-modality fusion perception methods are only able to predict…

Computer Vision and Pattern Recognition · Computer Science 2024-06-12 Yining Shi , Kun Jiang , Ke Wang , Jiusi Li , Yunlong Wang , Mengmeng Yang , Diange Yang

Vision-language models (VLMs) could power real-time assistants and autonomous agents, but they face a critical challenge: understanding near-infinite video streams without escalating latency and memory usage. Processing entire videos with…

Computer Vision and Pattern Recognition · Computer Science 2025-10-13 Ruyi Xu , Guangxuan Xiao , Yukang Chen , Liuning He , Kelly Peng , Yao Lu , Song Han

The rapid growth of streaming video applications demands multimodal models with enhanced capabilities for temporal dynamics understanding and complex reasoning. However, current Video Question Answering (VideoQA) datasets suffer from two…

Computer Vision and Pattern Recognition · Computer Science 2025-10-30 Yuhang Hu , Zhenyu Yang , Shihan Wang , Shengsheng Qian , Bin Wen , Fan Yang , Tingting Gao , Changsheng Xu

Embodied perception refers to the ability of an autonomous agent to perceive its environment so that it can (re)act. The responsiveness of the agent is largely governed by latency of its processing pipeline. While past work has studied the…

Computer Vision and Pattern Recognition · Computer Science 2020-08-26 Mengtian Li , Yu-Xiong Wang , Deva Ramanan

The rapid growth of data in velocity, volume, value, variety, and veracity has enabled exciting new opportunities and presented big challenges for businesses of all types. Recently, there has been considerable interest in developing systems…

Systems and Control · Electrical Eng. & Systems 2019-07-23 Shihao Ge , Haruna Isah , Farhana Zulkernine , Shahzad Khan

Inspired by the development of OpenClaw, there is a growing demand for mobile-based personal agents capable of handling complex and intuitive interactions. In this technical report, we introduce X-OmniClaw, a unified mobile agent designed…

Computer Vision and Pattern Recognition · Computer Science 2026-05-22 Xiaoming Ren , Ru Zhen , Chao Li , Yang Song , Qiuxia Hou , Yanhao Zhang , Peng Liu , Qi Qi , Quanlong Zheng , Qi Wu , Zhenyi Liao , Binqiang Pan , Haobo Ji , Haonan Lu

We present StreamBridge, a simple yet effective framework that seamlessly transforms offline Video-LLMs into streaming-capable models. It addresses two fundamental challenges in adapting existing models into online scenarios: (1) limited…

Computer Vision and Pattern Recognition · Computer Science 2025-09-22 Haibo Wang , Bo Feng , Zhengfeng Lai , Mingze Xu , Shiyu Li , Weifeng Ge , Afshin Dehghan , Meng Cao , Ping Huang

Real-time understanding of continuous video streams is essential for interactive assistants and multimodal agents operating in dynamic environments. However, most existing video reasoning approaches follow a batch paradigm that defers…

Computer Vision and Pattern Recognition · Computer Science 2026-03-16 Zikang Liu , Longteng Guo , Handong Li , Ru Zhen , Xingjian He , Ruyi Ji , Xiaoming Ren , Yanhao Zhang , Haonan Lu , Jing Liu

End-to-end spoken language understanding (SLU) has recently attracted increasing interest. Compared to the conventional tandem-based approach that combines speech recognition and language understanding as separate modules, the new approach…

Computation and Language · Computer Science 2021-07-20 Nihal Potdar , Anderson R. Avila , Chao Xing , Dong Wang , Yiran Cao , Xiao Chen