Related papers: Milliscale: Fast Commit on Low-Latency Object Stor…
We present BLITZCRANK, a high-speed semantic compressor designed for OLTP databases. Previous solutions are inadequate for compressing row-stores: they suffer from either low compression factor due to a coarse compression granularity or…
Previous state-of-the-art real-time object detectors have been reported on GPUs, which are extremely expensive for processing massive data and in resource-restricted scenarios. Therefore, high-efficiency object detectors on CPU-only devices…
The architectural shift to prefill/decode (PD) disaggregation in LLM serving improves resource utilization but struggles with the bursty nature of modern workloads. Existing autoscaling policies, often retrofitted from monolithic systems…
Arguably, data is the new natural resource in the enterprise world, with an unprecedented degree of proliferation. But to derive real-time actionable insights from the data, it is important to bridge the gap between managing the data that is…
Users of distributed datastores that employ quorum-based replication are burdened with the choice of a suitable client-centric consistency setting for each storage operation. The above matching choice is difficult to reason about as it…
The pursuit of power-efficiency is popularizing asymmetric multicore processors (AMP) such as ARM big.LITTLE, Apple M1 and recent Intel Alder Lake with big and little cores. However, we find that existing scalable locks fail to scale on AMP…
Existing sparse attention methods primarily target inference-time acceleration by selecting critical tokens under predefined sparsity patterns. However, they often fail to bridge the training-inference gap and lack the capacity for…
Multimodal large language models (MLLMs) have demonstrated strong general-purpose capabilities in open-world visual comprehension. However, most existing MLLMs primarily focus on holistic, scene-level understanding, often overlooking the…
This paper introduces OPTIMUM-DERAM, a highly consistent, scalable, secure, and decentralized shared memory solution. Traditional distributed shared memory implementations offer multi-object support by multi-threading a single object memory…
To meet the high-speed, low-latency, and low-complexity demand for optical interconnects, simplified maximum likelihood sequence estimation (MLSE) is proposed in this paper. Simplified MLSE combines computational simplification and reduced…
Object proposal is essential for current state-of-the-art object detection pipelines. However, existing proposal methods generally fail to produce results with satisfactory localization accuracy. The case is even worse for small objects…
In this paper, we present a novel cache design based on Multi-Level Cell Spin-Transfer Torque RAM (MLC STTRAM) that can dynamically adapt the set capacity and associativity to efficiently use the full potential of MLC STTRAM. We exploit the…
To satisfy increasing storage demands in both capacity and performance, industry has turned to multiple storage technologies, including Flash SSDs and SMR disks. These devices employ a translation layer that conceals the idiosyncrasies of…
Real-world fine manipulation, particularly in bimanual manipulation, typically requires low-latency control and stable visual localization, while collecting large-scale data is costly and limited demonstrations may lead to localization…
The computational challenges of Large Language Model (LLM) inference remain a significant barrier to their widespread deployment, especially as prompt lengths continue to increase. Due to the quadratic complexity of the attention…
High-performance object detection relies on expensive convolutional networks to compute features, often leading to significant challenges in applications, e.g. those that require detecting objects from video streams in real time. The key to…
The growth in variety and volume of OLTP (Online Transaction Processing) applications poses a challenge to OLTP systems to meet performance and cost demands in the existing hardware landscape. These applications are highly interactive…
Compiler phase ordering has a strong effect on program performance. Finding an effective sequence of passes is still a difficult task because the search space is large and execution time, code size and energy consumption often conflict.…
Free-moving object reconstruction from monocular video remains challenging, particularly without reliable pose or depth cues and under arbitrary object motion. We introduce OnlineSplatter, a novel online feed-forward framework generating…
We present a distributed proactive caching approach that exploits user mobility information to decide where to proactively cache data to support seamless mobility, while efficiently utilizing cache storage using a congestion pricing scheme.…