Related papers: Making Array-Based Translation Practical for Moder…

Distributed Caching for Complex Querying of Raw Arrays

As applications continue to generate multi-dimensional data at exponentially increasing rates, fast analytics to extract meaningful results is becoming extremely important. The database community has developed array databases that alleviate…

Databases · Computer Science 2018-03-19 Weijie Zhao, Florin Rusu, Bin Dong, Kesheng Wu, Anna Y. Q. Ho, Peter Nugent

MV-PBT: Multi-Version Index for Large Datasets and HTAP Workloads

Modern mixed (HTAP) workloads execute fast update-transactions and long-running analytical queries on the same dataset and system. In multi-version (MVCC) systems, such workloads result in many short-lived versions and long version-chains…

Databases · Computer Science 2019-10-18 Christian Riegger, Tobias Vincon, Robert Gottstein, Ilia Petrov

Kairos: Efficient Temporal Graph Analytics on a Single Machine

Many important societal problems are naturally modeled as algorithms over temporal graphs. To date, however, most graph processing systems remain inefficient as they rely on distributed processing even for graphs that fit well within a…

Databases · Computer Science 2024-01-08 Joana M. F. da Trindade, Julian Shun, Samuel Madden, Nesime Tatbul

ArrayBridge: Interweaving declarative array processing with high-performance computing

Scientists are increasingly turning to datacenter-scale computers to produce and analyze massive arrays. Despite decades of database research that extols the virtues of declarative query processing, scientists still write, debug and…

Databases · Computer Science 2017-02-28 Haoyuan Xing, Sofoklis Floratos, Spyros Blanas, Suren Byna, Prabhat, Kesheng Wu, Paul Brown

CALICO: Conversational Agent Localization via Synthetic Data Generation

We present CALICO, a method to fine-tune Large Language Models (LLMs) to localize conversational agent training data from one language to another. For slots (named entities), CALICO supports three operations: verbatim copy, literal…

Computation and Language · Computer Science 2024-12-10 Andy Rosenbaum, Pegah Kharazmi, Ershad Banijamali, Lu Zeng, Christopher DiPersio, Pan Wei, Gokmen Oz, Clement Chung, Karolina Owczarzak, Fabian Triefenbach, Wael Hamza

Mainlining Databases: Supporting Fast Transactional Workloads on Universal Columnar Data File Formats

The proliferation of modern data processing tools has given rise to open-source columnar data formats. The advantage of these formats is that they help organizations avoid repeatedly converting data to a new format for each application.…

Databases · Computer Science 2020-05-01 Tianyu Li, Matthew Butrovich, Amadou Ngom, Wan Shen Lim, Wes McKinney, Andrew Pavlo

Array-Based Monte Carlo Tree Search

Monte Carlo Tree Search is a popular method for solving decision making problems. Faster implementations allow for more simulations within the same wall clock time, directly improving search performance. To this end, we present an…

Artificial Intelligence · Computer Science 2025-08-29 James Ragan, Fred Y. Hadaegh, Soon-Jo Chung

Hippo: A Fast, yet Scalable, Database Indexing Approach

Even though existing database indexes (e.g., B+-Tree) speed up the query execution, they suffer from two main drawbacks: (1) A database index usually yields 5% to 15% additional storage overhead which results in non-ignorable dollar cost in…

Databases · Computer Science 2016-04-13 Jia Yu, Mohamed Sarwat

TLB and Pagewalk Performance in Multicore Architectures with Large Die-Stacked DRAM Cache

In this work we study the overheads of virtual-to-physical address translation in processor architectures, like x86-64, that implement paged virtual memory using a radix tree that is walked in hardware. Translation Lookaside Buffers are…

Hardware Architecture · Computer Science 2020-02-05 Adarsh Patil

Garbage Collection Techniques for Flash-Resident Page-Mapping FTLs

Storage devices based on flash memory have replaced hard disk drives (HDDs) due to their superior performance, increasing density, and lower power consumption. Unfortunately, flash memory is subject to challenging idiosyncrasies like…

Databases · Computer Science 2015-04-08 Niv Dayan, Philippe Bonnet

Starling: An I/O-Efficient Disk-Resident Graph Index Framework for High-Dimensional Vector Similarity Search on Data Segment

High-dimensional vector similarity search (HVSS) is gaining prominence as a powerful tool for various data science and AI applications. As vector data scales up, in-memory indexes pose a significant challenge due to the substantial increase…

Databases · Computer Science 2024-03-05 Mengzhao Wang, Weizhi Xu, Xiaomeng Yi, Songlin Wu, Zhangyang Peng, Xiangyu Ke, Yunjun Gao, Xiaoliang Xu, Rentong Guo, Charles Xie

OceanBase Bacchus: a High-Performance and Scalable Cloud-Native Shared Storage Architecture for Multi-Cloud

Although an increasing number of databases now embrace shared-storage architectures, current storage-disaggregated systems have yet to strike an optimal balance between cost and performance. In high-concurrency read/write scenarios,…

Databases · Computer Science 2026-03-02 Quanqing Xu, Mingqiang Zhuang, Chuanhui Yang, Quanwei Wan, Fusheng Han, Fanyu Kong, Hao Liu, Hu Xu, Junyu Ye

Control Flow Duplication for Columnar Arrays in a Dynamic Compiler

Columnar databases are an established way to speed up online analytical processing (OLAP) queries. Nowadays, data processing (e.g., storage, visualization, and analytics) is often performed at the programming language level, hence it is…

Programming Languages · Computer Science 2023-02-21 Sebastian Kloibhofer, Lukas Makor, David Leopoldseder, Daniele Bonetta, Lukas Stadler, Hanspeter Mössenböck

High Throughput Push Based Storage Manager

The storage manager, as a key component of the database system, is responsible for organizing, reading, and delivering data to the execution engine for processing. According to the data serving mechanism, existing storage managers are…

Databases · Computer Science 2019-05-20 Ye Zhu

CARGO : Context Augmented Critical Region Offload for Network-bound datacenter Workloads

Network-bound applications, like a database server executing OLTP queries or a caching server storing objects for dynamic web applications, are essential services that consumers and businesses use daily. These services run on a large…

Hardware Architecture · Computer Science 2020-08-18 Siddharth Rai, Trevor E. Carlson

Cuckoo++ Hash Tables: High-Performance Hash Tables for Networking Applications

Hash tables are an essential data-structure for numerous networking applications (e.g., connection tracking, firewalls, network address translators). Among these, cuckoo hash tables provide excellent performance by allowing lookups to be…

Networking and Internet Architecture · Computer Science 2017-12-29 Nicolas Le Scouarnec

Spinning Fast Iterative Data Flows

Parallel dataflow systems are a central part of most analytic pipelines for big data. The iterative nature of many analysis and machine learning algorithms, however, is still a challenge for current systems. While certain types of bulk…

Databases · Computer Science 2012-08-02 Stephan Ewen, Kostas Tzoumas, Moritz Kaufmann, Volker Markl

DCO: Dynamic Cache Orchestration for LLM Accelerators through Predictive Management

The rapid adoption of large language models (LLMs) is pushing AI accelerators toward increasingly powerful and specialized designs. Instead of further complicating software development with deeply hierarchical scratchpad memories (SPMs) and…

Hardware Architecture · Computer Science 2025-12-09 Zhongchun Zhou, Chengtao Lai, Yuhang Gu, Wei Zhang

Towards Adaptive Storage Views in Virtual Memory

Traditionally, DBMSs separate their storage layer from their indexing layer. While the storage layer physically materializes the database and provides low-level access methods to it, the indexing layer on top enables a faster locating of…

Databases · Computer Science 2022-12-07 Felix Schuhknecht, Justus Henneberg

VELOC: VEry Low Overhead Checkpointing in the Age of Exascale

Checkpointing large amounts of related data concurrently to stable storage is a common I/O pattern of many HPC applications. However, such a pattern frequently leads to I/O bottlenecks that lead to poor scalability and performance. As…

Distributed, Parallel, and Cluster Computing · Computer Science 2021-03-04 Bogdan Nicolae, Adam Moody, Gregory Kosinovsky, Kathryn Mohror, Franck Cappello