Operating Systems
We propose the Ratio1 AI meta-operating system (meta-OS), a decentralized MLOps protocol that unifies AI model development, deployment, and inference across heterogeneous edge devices. Its key innovation is an integrated blockchain-based…
The rapid advancements in autonomous driving have introduced increasingly complex, real-time GPU-bound tasks critical for reliable vehicle operation. However, the proprietary nature of these autonomous systems and closed-source GPU drivers…
Single-address-space operating systems have well-known lightweightness benefits that result from their central design idea: the kernel and applications share a unique address space. This model makes these operating systems (OSes)…
Enterprise SSDs integrate numerous computing resources (e.g., ARM processor and onboard DRAM) to satisfy the ever-increasing performance requirements of I/O bursts. While these resources substantially elevate the monetary costs of SSDs, the…
Despite the promise of alleviating the main memory bottleneck, and the existence of commercial hardware implementations, techniques for Near-Data Processing have seen relatively little real-world deployment. The idea has received renewed…
Protected user-level libraries have been proposed as a way to allow mutually distrusting applications to safely share kernel-bypass services. In this paper, we identify and solve several previously unaddressed obstacles to realizing this…
The proliferation of data-intensive applications, ranging from large language models to key-value stores, increasingly stresses memory systems with mixed read-write access patterns. Traditional half-duplex architectures such as DDR5 are…
The exponential growth of data-intensive applications has placed unprecedented demands on modern storage systems, necessitating dynamic and efficient optimization strategies. Traditional heuristics employed for storage performance…
Cluster orchestrators such as Kubernetes depend on accurate estimates of node capacity and job requirements. Inaccuracies in either lead to poor placement decisions and degraded cluster performance. In this paper, we show that in densely…
End-to-end imitation learning frameworks (e.g., VLA) are increasingly prominent in robotics, as they enable rapid task transfer by learning directly from perception to control, eliminating the need for complex hand-crafted features.…
Modern autonomous applications are increasingly utilizing multiple heterogeneous processors (XPUs) to accelerate different stages of algorithm modules. However, existing runtime systems for these applications, such as ROS, can only perform…
Increasing workload demands and emerging technologies necessitate the use of various memory and storage tiers in computing systems. This paper presents results from a CXL-based Experimental Memory Request Logger that reveals precise memory…
GPU singletasking is becoming increasingly inefficient and unsustainable as hardware capabilities grow and workloads diversify. We are now at an inflection point where GPUs must embrace multitasking, much like CPUs did decades ago, to meet…
LLM-based intelligent agents face significant deployment challenges, particularly related to resource management. Allowing unrestricted access to LLM or tool resources can lead to inefficient or even potentially harmful resource allocation…
Memory tiering systems seek cost-effective memory scaling by adding multiple tiers of memory. For maximum performance, frequently accessed (hot) data must be placed close to the host in faster tiers and infrequently accessed (cold) data can…
A large body of research has employed Machine Learning (ML) models to develop learned operating systems (OSes) and kernels. The latter dynamically adapt to the job load and adjust resources (CPU, IO, memory, network bandwidth)…
As intelligent systems permeate edge devices, cloud infrastructure, and embedded real-time environments, this research proposes a new OS kernel architecture for intelligent systems, transforming kernels from static resource managers to…
In kernel-centric operations, the uprobe component of eBPF frequently encounters performance bottlenecks, largely attributable to the overhead of context switches. Transitioning eBPF operations to user space bypasses these hindrances,…
Detecting and resolving violations of temporal constraints in real-time systems is both time-consuming and resource-intensive, particularly in complex software environments. Measurement-based approaches are widely used during development,…
Isolation is a critical property for shared infrastructure to limit exposure and interference among simultaneously running workloads. Cloud providers use different isolation mechanisms such as full Virtual Machines, microVMs, Linux…