操作系统
Large Language Models (LLMs) face challenges for on-device inference due to high memory demands. Traditional methods to reduce memory usage often compromise performance and lack adaptability. We propose FlexInfer, an optimized offloading…
Autotuning plays a pivotal role in optimizing the performance of systems, particularly in large-scale cloud deployments. One of the main challenges in performing autotuning in the cloud arises from performance variability. We first…
The management of timing constraints in a real-time operating system (RTOS) is usually realized through a global tick counter. This counter acts as the foundational time unit for all tasks in the systems. In order to establish a connection…
Crash consistency is essential for applications that must persist data. Crash-consistency testing has been commonly applied to find crash-consistency bugs in applications. The crash-state space grows exponentially as the number of…
In this paper, we demonstrate that a server running a single latency-sensitive application can be treated as a black box to reduce energy consumption while meeting an SLA target. We find that when the mean offered load is stable, one can…
Growing application memory demands and concurrent usage are making mobile device memory scarce. When memory pressure is high, current mobile systems use a RAM-based compressed swap scheme (called ZRAM) to compress unused execution-related…
The emergence of symmetric multi-processing (SMP) systems with non-uniform memory access (NUMA) has prompted extensive research on process and data placement to mitigate the performance impact of NUMA on applications. However, existing…
Achieving low remote memory access latency remains the primary challenge in realizing memory disaggregation over Ethernet within the datacenters. We present EDM that attempts to overcome this challenge using two key ideas. First, while…
File systems play an essential role in modern society for managing precious data. To meet diverse needs, they often support many configuration parameters. Such flexibility comes at the price of additional complexity which can lead to subtle…
Instruction set architectures are complex, with hundreds of registers and instructions that can modify dozens of them during execution, variably on each instance. Prose-style ISA specifications struggle to capture these intricacies of the…
The page cache is a central part of an OS. It reduces repeated accesses to storage by deciding which pages to retain in memory. As a result, the page cache has a significant impact on the performance of many applications. However, its…
As multicore hardware is becoming increasingly common in real-time systems, traditional scheduling techniques that assume a single worst-case execution time for a task are no longer adequate, since they ignore the impact of shared resources…
Dynamic linking is the standard mechanism for using external dependencies since it enables code reuse, streamlines software updates, and reduces disk/network use. Dynamic linking waits until runtime to calculate an application's relocation…
Unlike non-volatile memory that resides on the processor memory bus, memory-semantic solid-state drives (SSDs) support both byte and block access granularity via PCIe or CXL interconnects. They provide scalable memory capacity using NAND…
With the advent of hundreds of cores on a chip to accelerate applications, the operating system (OS) needs to exploit the existing parallelism provided by the underlying hardware resources to determine the right amount of processes to be…
This paper reports our experience of providing lightweight correctness guarantees to an open-source Rust OS, Theseus. First, we report new developments in intralingual design that leverage Rust's type system to enforce additional invariants…
Caching is widely used in industry to improve application performance by reducing data-access latency and taking the load off the backend infrastructure. TTLs have become the de-facto mechanism used to keep cached data reasonably fresh…
Real-time operating systems employ spatial and temporal isolation to guarantee predictability and schedulability of real-time systems on multi-core processors. Any unbounded and uncontrolled cross-core performance interference poses a…
High-Performance Computing (HPC) and Artificial Intelligence (AI) workloads typically demand substantial memory bandwidth and, to a degree, memory capacity. CXL memory expansion modules, also known as CXL "type-3" devices, enable…
In the realm of computer systems, efficient utilisation of the CPU (Central Processing Unit) has always been a paramount concern. Researchers and engineers have long sought ways to optimise process execution on the CPU, leading to the…