Related papers: LMStream: When Distributed Micro-Batch Stream Proc…

200 papers

The increasing adoption of large language models (LLMs) necessitates inference serving systems that can deliver both high throughput and low latency. Deploying LLMs with hundreds of billions of parameters on memory-constrained GPUs exposes…

Distributed, Parallel, and Cluster Computing · Computer Science 2025-03-10 Bowen Pang , Kai Li , Feifan Wang

The need for scalable and efficient stream analysis has led to the development of many open-source streaming data processing systems (SDPSs) with highly diverging capabilities and performance characteristics. While first initiatives try to…

Databases · Computer Science 2019-06-27 Jeyhun Karimov , Tilmann Rabl , Asterios Katsifodimos , Roman Samarev , Henri Heiskanen , Volker Markl

While ML model training and inference are both GPU-intensive, CPU-based data processing is often the bottleneck. Distributed data processing systems based on the batch or stream processing models assume homogeneous resource requirements.…

Distributed, Parallel, and Cluster Computing · Computer Science 2025-10-23 Frank Sifei Luan , Ron Yifeng Wang , Yile Gu , Ziming Mao , Charlotte Lin , Amog Kamsetty , Hao Chen , Cheng Su , Balaji Veeramani , Scott Lee , SangBin Cho , Clark Zinzow , Eric Liang , Ion Stoica , Stephanie Wang

Efficient LLM serving must balance throughput and latency across diverse, bursty workloads. We introduce StreamServe, a disaggregated prefill decode serving architecture that combines metric aware routing across compute lanes with adaptive…

Distributed, Parallel, and Cluster Computing · Computer Science 2026-04-14 Satyam Kumar , Arpit Singh Gautam , Kailash Talreja , Saurabh Jha

Many well-known, real-world problems involve dynamic data which describe the relationship among the entities. Hypergraphs are powerful combinatorial structures that are frequently used to model such data. For many of today's data-centric…

Data Structures and Algorithms · Computer Science 2021-03-10 Fatih Taşyaran , Berkay Demireller , Kamer Kaya , Bora Uçar

Real-time LLM interactions demand streamed token generations, where text tokens are progressively generated and delivered to users while balancing two objectives: responsiveness (i.e., low time-to-first-token) and steady generation…

Machine Learning · Computer Science 2025-10-06 Junyi Chen , Chuheng Du , Renyuan Liu , Shuochao Yao , Dingtian Yan , Jiang Liao , Shengzhong Liu , Fan Wu , Guihai Chen

In this paper, we present a vision for a new generation of multimodal streaming systems that embed MLLMs as first-class operators, enabling real-time query processing across multiple modalities. Achieving this is non-trivial: while recent…

This paper presents StreamChat, a novel approach that enhances the interaction capabilities of Large Multimodal Models (LMMs) with streaming video content. In streaming interaction scenarios, existing methods rely solely on visual…

Computer Vision and Pattern Recognition · Computer Science 2025-04-01 Jihao Liu , Zhiding Yu , Shiyi Lan , Shihao Wang , Rongyao Fang , Jan Kautz , Hongsheng Li , Jose M. Alvarez

Stream processing is usually done either on a tuple-by-tuple basis or in micro-batches. There are many applications where tuples over a predefined duration/window must be processed within certain deadlines. Processing such queries using…

Databases · Computer Science 2024-09-23 Saranya Chandrasekaran , S. Sudarshan

Recent work has initiated the study of dense graph processing using graph sketching methods, which drastically reduce space costs by lossily compressing information about the input graph. In this paper, we explore the strange and surprising…

Distributed, Parallel, and Cluster Computing · Computer Science 2024-11-18 David Tench , Evan T. West , Kenny Zhang , Michael Bender , Daniel DeLayo , Martin Farach-Colton , Gilvir Gill , Tyler Seip , Victor Zhang

Approximate computing aims for efficient execution of workflows where an approximate output is sufficient instead of the exact output. The idea behind approximate computing is to compute over a representative sample instead of the entire…

Distributed, Parallel, and Cluster Computing · Computer Science 2017-09-12 Do Le Quoc , Ruichuan Chen , Pramod Bhatotia , Christof Fetzer , Volker Hilt , Thorsten Strufe

Many modern applications require real-time processing of large volumes of high-speed data. Such data processing needs can be modeled as a streaming computation. A streaming computation is specified as a dataflow graph that exposes multiple…

Databases · Computer Science 2018-04-02 Guna Prasaad , G. Ramalingam , Kaushik Rajan

In up-to-date machine learning (ML) applications on cloud or edge computing platforms, batching is an important technique for providing efficient and economical services at scale. In particular, parallel computing resources on the…

Machine Learning · Computer Science 2023-09-04 Yaodan Xu , Jingzhou Sun , Sheng Zhou , Zhisheng Niu

Recent data stream processing systems (DSPSs) can achieve excellent performance when processing large volumes of data under tight latency constraints. However, they sacrifice support for concurrent state access that eases the burden of…

Databases · Computer Science 2023-06-21 Shuhao Zhang , Yingjun Wu , Feng Zhang , Bingsheng He

We initiate the study of graph algorithms in the streaming setting on massive distributed and parallel systems inspired by practical data processing systems. The objective is to design algorithms that can efficiently process evolving graphs…

Data Structures and Algorithms · Computer Science 2025-01-20 Artur Czumaj , Gopinath Mishra , Anish Mukherjee

Efficient execution of deep learning workloads on dataflow architectures is crucial for overcoming memory bottlenecks and maximizing performance. While streaming intermediate results between computation kernels can significantly improve…

Hardware Architecture · Computer Science 2025-09-24 Hanchen Ye , Deming Chen

In the burgeoning realm of Internet of Things (IoT) applications on edge devices, data stream compression has become increasingly pertinent. The integration of added compression overhead and limited hardware resources on these devices calls…

Databases · Computer Science 2024-06-18 Xianzhi Zeng , Shuhao Zhang

Distributed stream processing frameworks help build scalable and reliable applications that perform transformations and aggregations on continuous data streams. This paper introduces ShuffleBench, a novel benchmark to evaluate the…

Software Engineering · Computer Science 2024-03-08 Sören Henning , Adriano Vogel , Michael Leichtfried , Otmar Ertl , Rick Rabiser

Whilst computational resources at the cloud edge can be leveraged to improve latency and reduce the costs of cloud services for a wide variety of mobile, web, and IoT applications, such resources are naturally constrained. For distributed…

Distributed, Parallel, and Cluster Computing · Computer Science 2019-12-20 Ben Blamey , Ida-Maria Sintorn , Andreas Hellander , Salman Toor

We introduce StreamDiffusion, a real-time diffusion pipeline designed for interactive image generation. Existing diffusion models are adept at creating images from text or image prompts, yet they often fall short in real-time interaction.…

Computer Vision and Pattern Recognition · Computer Science 2025-07-09 Akio Kodaira , Chenfeng Xu , Toshiki Hazama , Takanori Yoshimoto , Kohei Ohno , Shogo Mitsuhori , Soichi Sugano , Hanying Cho , Zhijian Liu , Masayoshi Tomizuka , Kurt Keutzer