操作系统
Bauplan is a FaaS-based lakehouse specifically built for data pipelines: its execution engine uses Apache Arrow for data passing between the nodes in the DAG. While Arrow is known as the "zero copy format", in practice, limited Linux kernel…
Autoware is an autonomous driving system implemented on Robot Operation System (ROS) 2, where an end-to-end timing guarantee is crucial to ensure safety. However, existing ROS 2 cause-effect chain models for analyzing end-to-end latency…
The Diminishing Bandwidth Problem is a long standing, previously unidentified, extensibility problem of current real-time operating systems characterized by a superficial dependency between the number of tasks in a system and the maximum…
Real-time scheduling in commoditized component-oriented real-time systems, such as ROS 2 systems on Linux, has been studied under nested scheduling: OS thread scheduling and middleware layer scheduling (e.g., ROS 2 Executor). However, by…
Concurrency is vital for our critical software to meet modern performance requirements, yet concurrency bugs are notoriously difficult to detect and reproduce. Controlled Concurrency Testing (CCT) can make bugs easier to expose by enabling…
Persistence is the first principle of big memory systems. We comprehensively analyze the vertical and horizontal extensions of existing memory hierarchy. Networks are flattening traditional storage hierarchies. We present the…
Memory tiering systems achieve memory scaling by adding multiple tiers of memory wherein different tiers have different access latencies and bandwidth. For maximum performance, frequently accessed (hot) data must be placed close to the host…
Safe kernel extensions have gained significant traction, evolving from simple packet filters to large, complex programs that customize storage, networking, and scheduling. Existing kernel extension mechanisms like eBPF rely on in-kernel…
The network interface adapter (NIC) is a critical component of a cloud server occupying a unique position. Not only is network performance vital to efficient operation of the machine, but unlike compute accelerators like GPUs, the network…
The surging demand for GPUs in datacenters for machine learning (ML) has made efficient GPU utilization crucial. However, meeting the diverse needs of ML models while optimizing resource usage is challenging. To enable transparent,…
Pooling PCIe devices across multiple hosts offers a promising solution to mitigate stranded I/O resources, enhance device utilization, address device failures, and reduce total cost of ownership. The only viable option today are PCIe…
The NP-complete combinatorial optimization task of assigning offsets to a set of buffers with known sizes and lifetimes so as to minimize total memory usage is called dynamic storage allocation (DSA). Existing DSA implementations bypass the…
Cloud-native systems increasingly rely on infrastructure services (e.g., service meshes, monitoring agents), which compete for resources with user applications, degrading performance and scalability. We propose HeteroPod, a new abstraction…
Despite its technical superiority and flexibility, Linux remains a niche OS in the consumer markets. Because fragmentation stems from diverse distributions, it lacks the standardized experience, which discourages mainstream adoption. This…
As conventional storage density reaches its physical limits, the cost of a gigabyte of storage is no longer plummeting, but rather has remained mostly flat for the past decade. Meanwhile, file sizes continue to grow, leading to ever fuller…
Wasm is gaining popularity outside the Web as a well-specified low-level binary format with ISA portability, low memory footprint and polyglot targetability, enabling efficient in-process sandboxing of untrusted code. Despite these…
Cloud platforms remain underutilized despite multiple proposals to improve their utilization (e.g., disaggregation, harvesting, and oversubscription). Our characterization of the resource utilization of virtual machines (VMs) in Azure…
Large Language Model (LLM) applications have emerged as a prominent use case for Function-as-a-Service (FaaS) due to their high computational demands and sporadic invocation patterns. However, serving LLM functions within FaaS frameworks…
Large Language Models (LLMs) face challenges for on-device inference due to high memory demands. Traditional methods to reduce memory usage often compromise performance and lack adaptability. We propose FlexInfer, an optimized offloading…
Autotuning plays a pivotal role in optimizing the performance of systems, particularly in large-scale cloud deployments. One of the main challenges in performing autotuning in the cloud arises from performance variability. We first…