Operating Systems - Scifaro

Tutti: Making SSD-Backed KV Cache Practical for Long-Context LLM Serving

LLM serving relies on prefix caching to improve inference performance. As growing contexts push key-value (KV) cache footprint far beyond GPU HBM and CPU DRAM capacity, KV cache is increasingly offloaded to NVMe SSDs. Unfortunately,…

Operating Systems · Computer Science 2026-05-06 Shi Qiu, Yifan Hu, Xintao Wang, Wenhao Zhu, Jianqin Yan, Hao Chen, Kaiqiang Xu, Kai Chen, Yiming Zhang

CityOS: Privacy Architecture for Urban Sensing

Cities are rapidly deploying sensing infrastructure -- cameras, environmental sensors, and connected kiosks -- that continuously observe public spaces, yet they lack a system architecture governing how applications access, aggregate, and…

Operating Systems · Computer Science 2026-05-05 Giorgio Cavicchioli, Mark Chen, Navid Salami Pargoo, Shuren Xia, Xiaotian Zhou, Roxana Geambasu, Jason Nieh, Jorge Ortiz

VUDA: Breaking CUDA-Vulkan Isolation for Spatial Sharing of Compute and Graphics on the Same GPU

GPU-based simulation environments for embodied AI interleave physics simulation (CUDA) and photorealistic rendering (Vulkan) on a single device. We observe that two foundational scenarios -- simulation data generation and RL training -- can…

Operating Systems · Computer Science 2026-05-05 Bin Xu, Pengfei Hu, Wenxin Zheng, Jinyu Gu, Haibo Chen

Crab: A Semantics-Aware Checkpoint/Restore Runtime for Agent Sandboxes

Autonomous agents act through sandboxed containers and microVMs whose state spans filesystems, processes, and runtime artifacts. Checkpoint and restore (C/R) of this state is needed for fault tolerance, spot execution, RL rollout branching,…

Operating Systems · Computer Science 2026-05-01 Tianyuan Wu, Chaokun Chang, Lunxi Cao, Wei Gao, Wei Wang

Affinity Tailor: Dynamic Locality-Aware Scheduling at Scale

Modern large multicore systems often run multiple workloads that share CPUs under schedulers such as Linux CFS. To keep CPUs busy, these schedulers load-balance runnable work, causing each workload to execute on many cores. This weakens…

Operating Systems · Computer Science 2026-05-01 Jin Xin Ng, Ori Livneh, Richard O'Grady, Josh Don, Peng Ding, Samuel Grossman, Luis Otero, Chris Kennelly, David Lo, Carlos Villavieja

treVM: Tiny Rust Embedded Virtual Machines with WASM on Variable Resource-Constrained Hardware

Software stacks embedded on microcontroller-based hardware typically provide rudimentary APIs programmed in C/C++, basic connectivity and, sometimes, a firmware update mechanism. Such coarse mechanisms contrast with widely used APIs and…

Operating Systems · Computer Science 2026-05-01 Antoine Lavandier, Bastien Buil, Chrystel Gaber, Emmanuel Baccelli

MARS: Efficient, Adaptive Co-Scheduling for Heterogeneous Agentic Systems

Large language models (LLMs) are increasingly deployed as the execution core of autonomous agents rather than as standalone text generators. Agentic workloads induce a temporal shift from single-turn inference to multi-turn LLM-tool loops,…

Operating Systems · Computer Science 2026-05-01 Yifei Wang, Hancheng Ye, Yechen Xu, Cong Guo, Chiyue Wei, Qinsi Wang, Dongting Li, Tingjun Chen, Hai "Helen" Li, Danyang Zhuo, Yiran Chen

Proxics: an efficient programming model for far memory accelerators

The use of disaggregated or far memory systems such as CXL memory pools has renewed interest in Near-Data Processing (NDP): situating cores close to memory to reduce bandwidth requirements to and from the CPU. Hardware designs for such…

Operating Systems · Computer Science 2026-04-21 Zikai Liu, Niels Pressel, Jasmin Schult, Roman Meier, Pengcheng Xu, Timothy Roscoe

ProbeLogits: Kernel-Level LLM Inference Primitives for AI-Native Operating Systems

An OS kernel that runs LLM inference internally can read logit distributions before any text is generated and act on them as a governance primitive. This paper presents ProbeLogits, a kernel-level operation that performs a single forward…

Operating Systems · Computer Science 2026-04-21 Daeyeon Son

Equilibria: Fair Multi-Tenant CXL Memory Tiering At Scale

Memory dominates datacenter system cost and power. Memory expansion via Compute Express Link (CXL) is an effective way to provide additional memory at lower cost and power, but its effective use requires software-level tiering for…

Operating Systems · Computer Science 2026-04-21 Kaiyang Zhao, Neha Gholkar, Hasan Maruf, Abhishek Dhanotia, Johannes Weiner, Gregory Price, Ning Sun, Bhavya Dwivedi, Stuart Clark, Dimitrios Skarlatos

NetCAS: Dynamic Cache and Backend Device Management in Networked Environments

Modern storage systems often combine fast cache with slower backend devices to accelerate I/O. As performance gaps narrow, concurrently accessing both devices, rather than relying solely on cache hits, can improve throughput. However, in…

Operating Systems · Computer Science 2026-04-21 Joon Yong Hwang, Chanseo Park, Younghoon Kim

Don't Let AI Agents YOLO Your Files: Shifting Information and Control to Filesystems for Agent Safety and Autonomy

AI coding agents operate directly on users' filesystems, where they regularly corrupt data, delete files, and leak secrets. Current approaches force a tradeoff between safety and autonomy: unrestricted access risks harm, while frequent…

Operating Systems · Computer Science 2026-04-17 Shawn Wanxiang Zhong, Junxuan Liao, Jing Liu, Mai Zheng, Andrea C. Arpaci-Dusseau, Remzi H. Arpaci-Dusseau

VeruSAGE: A Study of Agent-Based Verification for Rust Systems

Large language models (LLMs) have shown impressive capability to understand and develop code. However, their capability to rigorously reason about and prove code correctness remains in question. This paper offers a comprehensive study of…

Operating Systems · Computer Science 2026-04-16 Chenyuan Yang, Natalie Neamtu, Chris Hawblitzel, Jacob R. Lorch, Shan Lu

TierBPF: Page Migration Admission Control for Tiered Memory via eBPF

Existing software-based memory tiering systems decide which pages to place on the slower or faster tier. However, they do not take into account two important factors that greatly influence application performance: the size of the migrated…

Operating Systems · Computer Science 2026-04-15 Xi Wang, Tal Zussman, Yuang Xu, Bin Ma, Asaf Cidon, Dong Li

Hybrid Adaptive Tuning for Tiered Memory Systems

Memory tiering provides a cost-effective solution to increase memory capacity, utilization, and even bandwidth. Memory tiering relies on system software for memory profiling, detection of frequently accessed pages, and page migration. Such…

Operating Systems · Computer Science 2026-04-15 Xi Wang, Jie Liu, Shuangyan Yang, Jongryool Kim, Pengfei Su, Dong Li

Nanvix: A Multikernel OS Design for High-Density Serverless Deployments

Serverless providers strive for high resource utilization by optimizing deployment density: how many applications can be deployed per host server. However, achieving high deployment density without compromising application performance or…

Operating Systems · Computer Science 2026-04-14 Carlos Segarra, Pedro Henrique Penna, Enrique Saurez, Íñigo Goiri, Peter Pietzuch, Shan Lu, Rodrigo Fonseca

EdgeFlow: Fast Cold Starts for LLMs on Mobile Devices

Deploying large language models (LLMs) on mobile devices is an emerging trend to enable data privacy and offline accessibility of LLM applications. Modern mobile neural processing units (NPUs) make such deployment increasingly feasible.…

Operating Systems · Computer Science 2026-04-13 Yongsheng Yan, Jiacheng Shen, Xuchuan Luo, Yangfan Zhou

Valve: Production Online-Offline Inference Colocation with Jointly-Bounded Preemption Latency and Rate

LLM inference powers latency-critical production services nowadays. The bursty nature of inference traffic results in over-provisioning, which in turn leads to resource underutilization. While online-offline colocation promises to utilize…

Operating Systems · Computer Science 2026-04-10 Fangyue Liu, Hua Liu, Xinyuan Lyu, Shuo Ai, Hao Liang, Lingpeng Chen, Ziqian Hu, Chong Zha, Xin Jin, Hanmei Luo, Peng Chen

Quine: Realizing LLM Agents as Native POSIX Processes

Current LLM agent frameworks often implement isolation, scheduling, and communication at the application layer, even though these mechanisms are already provided by mature operating systems. Instead of introducing another application-layer…

Operating Systems · Computer Science 2026-04-10 Hao Ke

Horizon-LM: A RAM-Centric Architecture for LLM Training

The rapid growth of large language models (LLMs) has outpaced the evolution of single-GPU hardware, making model scale increasingly constrained by memory capacity rather than computation. While modern training systems extend GPU memory…

Operating Systems · Computer Science 2026-04-08 Zhengqing Yuan, Lichao Sun, Yanfang Ye