操作系统
Traditional memory management suffers from metadata overhead, architectural complexity, and stability degradation, problems intensified in cloud environments. Existing software/hardware optimizations are insufficient for cloud computing's…
The Directed Acyclic Graph (DAG) task model for real-time scheduling finds its primary practical target in Robot Operating System 2 (ROS 2). However, ROS 2's publish/subscribe API leaves DAG precedence constraints unenforced: a callback may…
Heterogeneous hardware and dynamic workloads worsen long-standing OS bottlenecks in scalability, adaptability, and manageability. At the same time, advances in machine learning (ML), large language models (LLMs), and agent-based methods…
The accuracy of large language models (LLMs) improves with increasing model size, but increasing model complexity also poses significant challenges to training stability. Periodic checkpointing is a key mechanism for fault recovery and is…
The \emph{Partial Cache-Coherence (PCC)} model maintains hardware cache coherence only within subsets of cores, enabling large-scale memory sharing with emerging memory interconnect technologies like Compute Express Link (CXL). However,…
We describe a system for serverless computing where users, programs, and the underlying platform share a common representation of a computation: a deterministic procedure, run in an environment of well-specified data or the outputs of other…
KV cache accelerates LLM inference by avoiding redundant computation, at the expense of memory. To support larger KV caches, prior work extends GPU memory with CPU memory via CPU-offloading. This involves swapping KV cache between GPU and…
Memory tiering in datacenters does not achieve its full potential due to hotness fragmentation -- the intermingling of hot and cold objects within memory pages. This fragmentation prevents page-based reclamation systems from distinguishing…
Cloud platforms host thousands of tenants that demand POSIX semantics, high throughput, and rapid evolution from their storage layer. Kernel-native distributed file systems supply raw speed, but their privileged code base couples every…
Modern software infrastructure increasingly relies on LLM agents for development and maintenance, such as Claude Code and Gemini-cli. However, these AI agents differ fundamentally from traditional deterministic software, posing a…
Proto is a new instructional OS that runs on commodity, portable hardware. It showcases modern features, including per-app address spaces, threading, commodity filesystems, USB, DMA, multicore support, self-hosted debugging, and a window…
Maratona Linux is the development environment used since 2016 on the ``Maratona de Programa\c{c}\~ao'', ICPC's South American regional contest. It consists of Debian packages that modify a standard Ubuntu installation in order to make it…
Policy design for various systems controllers has conventionally been a manual process, with domain experts carefully tailoring heuristics for the specific instance in which the policy will be deployed. In this paper, we re-imagine policy…
Real-time responsiveness in Linux is often constrained by interrupt contention and timer handling overhead, making it challenging to achieve sub-microsecond latency. This work introduces an interrupt isolation approach that centralizes and…
Far-memory systems, where applications store less-active data in more energy-efficient memory media, are increasingly adopted by data centers. However, applications are bottlenecked by on-demand data fetching from far- to local-memory. We…
Large swaths of low-level system software building blocks originally implemented in C/C++ are currently being swapped for equivalent rewrites in Rust, a relatively more secure and dependable programming language. So far, however, no…
Contemporary distributed computing workloads, including scientific computation, data mining, and machine learning, increasingly demand OS networking with minimal latency as well as high throughput, security, and reliability. However,…
This paper presents Nova, a real-time scheduling framework for serving agentic vision-language models (VLMs) on a single GPU with balanced per-request latency and overall request process throughput. Our design begins by enabling effective…
We present a novel mechanism to construct a covert channel based on page faults. A page fault is an event that occurs when a process or a thread tries to access a page of memory that is not currently mapped to its address space. The kernel…
The rise of AI agents powered by Large Language Models (LLMs) presents critical challenges: how to securely execute and migrate these agents across heterogeneous environments while protecting sensitive user data, maintaining availability…