Operating Systems - Scifaro

Sharpen the Spec, Cut the Code: A Case for Generative File System with SYSSPEC

File systems are critical OS components that require constant evolution to support new hardware and emerging application needs. However, the traditional paradigm of developing features, fixing bugs, and maintaining the system incurs…

Operating Systems · Computer Science 2026-02-11 Qingyuan Liu, Mo Zou, Hengbin Zhang, Dong Du, Yubin Xia, Haibo Chen

HALO: A Fine-Grained Resource Sharing Quantum Operating System

As quantum computing enters the cloud era, thousands of users must share access to a small number of quantum processors. Users need to wait minutes to days to start jobs that take only a few seconds to execute. Current quantum…

Operating Systems · Computer Science 2026-02-10 John Zhuoyang Ye, Jiyuan Wang, Yifan Qiao, Jens Palsberg

Towards High-Goodput LLM Serving with Prefill-decode Multiplexing

Large Language Model (LLM) serving must meet stringent Service Level Objectives (SLOs) for both the prefill and decode phases. Some existing solutions disaggregate the two phases, causing potential resource idleness or compute redundancy.…

Operating Systems · Computer Science 2026-02-10 Yukang Chen, Weihao Cui, Han Zhao, Ziyi Xu, Xiaoze Fan, Xusheng Chen, Yangjie Zhou, Shixuan Sun, Bingsheng He, Quan Chen

Flare: Anomaly Diagnostics for Divergent LLM Training in GPU Clusters of Thousand-Plus Scale

The rapid proliferation of large language models has driven the need for efficient GPU training clusters. However, it is challenging due to the frequent occurrence of training anomalies. Since existing diagnostic tools are narrowly tailored…

Operating Systems · Computer Science 2026-02-10 Weihao Cui, Ji Zhang, Han Zhao, Chao Liu, Jian Sha, Bingsheng He, Minyi Guo, Quan Chen

ProphetKV: User-Query-Driven Selective Recomputation for Efficient KV Cache Reuse in Retrieval-Augmented Generation

The prefill stage of long-context Retrieval-Augmented Generation (RAG) is severely bottlenecked by computational overhead. To mitigate this, recent methods assemble pre-calculated KV caches of the RAG documents retrieved for a user query and…

Operating Systems · Computer Science 2026-02-06 Shihao Wang, Jiahao Chen, Yanqi Pan, Hao Huang, Yichen Hao, Xiangyu Zou, Wen Xia, Wentao Zhang, Chongyang Qiu, Pengfei Wang

Performance Isolation for Inference Processes in Edge GPU Systems

This work analyzes the main isolation mechanisms available in modern NVIDIA GPUs: MPS, MIG, and the recent Green Contexts, to ensure predictable inference time in safety-critical applications using deep learning models. The experimental…

Operating Systems · Computer Science 2026-01-28 Juan José Martín, José Flich, Carles Hernández

DAVOS: An Autonomous Vehicle Operating System in the Vehicle Computing Era

Vehicle computing represents a fundamental shift in how autonomous vehicles are designed and deployed, transforming them from isolated transportation systems into mobile computing platforms that support both safety-critical, real-time…

Operating Systems · Computer Science 2026-01-26 Yuxin Wang, Yuankai He, Boyang Tian, Lichen Xian, Weisong Shi

"Range as a Key" is the Key! Fast and Compact Cloud Block Store Index with RASK

In cloud block stores, indexing is on the critical path of I/O operations and typically resides in memory. With the scaling of users and the emergence of denser storage media, the index has become a primary memory consumer, causing memory…

Operating Systems · Computer Science 2026-01-21 Haoru Zhao, Mingkai Dong, Erci Xu, Zhongyu Wang, Haibo Chen

ContiguousKV: Accelerating LLM Prefill with Granularity-Aligned KV Cache Management

Efficiently serving Large Language Models (LLMs) with persistent Prefix Key-Value (KV) Cache is critical for applications like conversational search and multi-turn dialogue. Serving a request requires loading the pre-computed prefix KV…

Operating Systems · Computer Science 2026-01-21 Jing Zou, Shangyu Wu, Hancong Duan, Qiao Li, Chun Jason Xue

Nixie: Efficient, Transparent Temporal Multiplexing for Consumer GPUs

Consumer machines are increasingly running large ML workloads such as large language models (LLMs), text-to-image generation, and interactive image editing. Unlike datacenter GPUs, consumer GPUs serve single-user, rapidly changing…

Operating Systems · Computer Science 2026-01-21 Yechen Xu, Yifei Wang, Nathanael Ren, Yiran Chen, Danyang Zhuo

A Survey of Fuzzing Open-Source Operating Systems

Vulnerabilities in open-source operating systems (OSs) pose substantial security risks to software systems, making their detection crucial. While fuzzing has been an effective vulnerability detection technique in various domains, OS fuzzing…

Operating Systems · Computer Science 2026-01-21 Kun Hu, Qicai Chen, Wenzhuo Zhang, Zilong Lu, Bihuan Chen, You Lu, Haowen Jiang, Bingkun Sun, Xin Peng, Wenyun Zhao

AdaptCache: KV Cache Native Storage Hierarchy for Low-Delay and High-Quality Language Model Serving

Large language model (LLM) applications often reuse previously processed context, such as chat history and documents, which introduces significant redundant computation. Existing LLM serving systems address such redundant computation by…

Operating Systems · Computer Science 2026-01-19 Shaoting Feng, Hanchen Li, Kuntai Du, Zhuohan Gu, Yuhan Liu, Jiayi Yao, Siddhant Ray, Samuel Shen, Yihua Cheng, Ganesh Ananthanarayanan, Junchen Jiang

Rethinking Inter-Process Communication with Memory Operation Offloading

As multimodal and AI-driven services exchange hundreds of megabytes per request, existing IPC runtimes spend a growing share of CPU cycles on memory copies. Although both hardware and software mechanisms are exploring memory offloading,…

Operating Systems · Computer Science 2026-01-13 Misun Park, Richi Dubey, Yifan Yuan, Nam Sung Kim, Ada Gavrilovska

Towards Fully-fledged GPU Multitasking via Proactive Memory Scheduling

The limited HBM capacity has become the primary bottleneck for hosting an increasing number of larger-scale GPU tasks. While demand paging extends capacity via host DRAM, it incurs up to 78x slowdown due to the massive working sets and poor…

Operating Systems · Computer Science 2026-01-05 Weihang Shen, Yinqiu Chen, Rong Chen, Haibo Chen

Vulcan: Instance-Optimal Systems Heuristics Through LLM-Driven Search

Resource-management tasks in modern operating and distributed systems continue to rely primarily on hand-designed heuristics for tasks such as scheduling, caching, or active queue management. Designing performant heuristics is an expensive,…

Operating Systems · Computer Science 2026-01-01 Rohit Dwivedula, Divyanshu Saxena, Sujay Yadalam, Daehyeok Kim, Aditya Akella

LEFT-RS: A Lock-Free Fault-Tolerant Resource Sharing Protocol for Multicore Real-Time Systems

Emerging real-time applications have driven the transition to multicore embedded systems, where tasks must share resources due to functional demands and limited availability. These resources, whether local or global, are protected within…

Operating Systems · Computer Science 2025-12-29 Nan Chen, Xiaotian Dai, Tong Cheng, Alan Burns, Iain Bate, Shuai Zhao

gpu_ext: Extensible OS Policies for GPUs via eBPF

Performance in modern GPU-centric systems increasingly depends on resource management policies, including memory placement, scheduling, and observability. However, uniform policies typically yield suboptimal performance across diverse…

Operating Systems · Computer Science 2025-12-23 Yusheng Zheng, Tong Yu, Yiwei Yang, Minghui Jiang, Xiangyu Gao, Jianchang Su, Yanpeng Hu, Wenan Mao, Wei Zhang, Dan Williams, Andi Quinn

Trustworthy and Controllable Professional Knowledge Utilization in Large Language Models with TEE-GPU Execution

Future improvements in large language model (LLM) services increasingly hinge on access to high-value professional knowledge rather than more generic web data. However, the data providers of this knowledge face a skewed tradeoff between…

Operating Systems · Computer Science 2025-12-22 Yifeng Cai, Zhida An, Yuhan Meng, Houqian Liu, Pengli Wang, Hanwen Lei, Yao Guo, Ding Li

EVICPRESS: Joint KV-Cache Compression and Eviction for Efficient LLM Serving

Reusing KV cache is essential for high efficiency of Large Language Model (LLM) inference systems. With more LLM users, the KV cache footprint can easily exceed GPU memory capacity, so prior work has proposed to either evict KV cache to…

Operating Systems · Computer Science 2025-12-18 Shaoting Feng, Yuhan Liu, Hanchen Li, Xiaokun Chen, Samuel Shen, Kuntai Du, Zhuohan Gu, Rui Zhang, Yuyang Huang, Yihua Cheng, Jiayi Yao, Qizheng Zhang, Ganesh Ananthanarayanan, Junchen Jiang

Principled Performance Tunability in Operating System Kernels

The Linux kernel source code contains numerous constant values that critically influence system performance. Many of these constants, which we term perf-consts, are magic numbers that encode brittle assumptions about hardware and workloads.…

Operating Systems · Computer Science 2025-12-16 Zhongjie Chen, Wentao Zhang, Yulong Tang, Ran Shu, Fengyuan Ren, Tianyin Xu, Jing Liu