Related papers: Sequential Checking: Reallocation-Free Data-Distri…

ASURA: Scalable and Uniform Data Distribution Algorithm for Storage Clusters

Large-scale storage cluster systems need to manage a vast number of data locations. A naive approach to data location management keeps tables of pairs, each pairing a data ID with the nodes storing that data. However, it is not practical when the number of pairs is…

Distributed, Parallel, and Cluster Computing · Computer Science · 2017-07-05 · Ken-ichiro Ishikawa

Round-Hashing for Data Storage: Distributed Servers and External-Memory Tables

This paper proposes round-hashing, which is suitable for data storage on distributed servers and for implementing external-memory tables in which each lookup retrieves at most a single block of external memory, using a stash. For data…

Data Structures and Algorithms · Computer Science · 2018-05-09 · Roberto Grossi, Luca Versari

The End of a Myth: Distributed Transactions Can Scale

The common wisdom is that distributed transactions do not scale. But what if distributed transactions could be made scalable using the next generation of networks and a redesign of distributed databases? There would be no need for…

Databases · Computer Science · 2016-11-22 · Erfan Zamanian, Carsten Binnig, Tim Kraska, Tim Harris

On Approximate Sequencing Policies for Linear Storage Devices

This paper investigates sequencing policies for file reading requests in linear storage devices, such as magnetic tapes. Tapes are the technology of choice for long-term storage in data centers due to their low cost and reliability.…

Data Structures and Algorithms · Computer Science · 2022-05-11 · Carlos H. Cardonha, Andre A. Cire, Lucas C. Villa Real

Scalable Distributed String Sorting

String sorting is an important part of tasks such as building index data structures. Unfortunately, current string sorting algorithms do not scale to massively parallel distributed-memory machines since they either have latency (at least)…

Data Structures and Algorithms · Computer Science · 2024-04-26 · Florian Kurpicz, Pascal Mehnert, Peter Sanders, Matthias Schimek

Distributed storage algorithms with optimal tradeoffs

One of the primary objectives of a distributed storage system is to reliably store large amounts of source data for long durations using a large number $N$ of unreliable storage nodes, each with $c$ bits of storage capacity. Storage nodes…

Information Theory · Computer Science · 2021-01-14 · Michael Luby, Thomas Richardson

Cache Serializability: Reducing Inconsistency in Edge Transactions

Read-only caches are widely used in cloud infrastructures to reduce access latency and load on backend databases. Operators view coherent caches as impractical at genuinely large scale and many client-facing caches are updated in an…

Distributed, Parallel, and Cluster Computing · Computer Science · 2016-11-15 · Ittay Eyal, Ken Birman, Robbert van Renesse

Quorum Sensing for Regenerating Codes in Distributed Storage

Distributed storage systems with replication are well known for storing large amounts of data. Extensive replication is used to provide reliability, which makes the system expensive. Various methods have been proposed over…

Distributed, Parallel, and Cluster Computing · Computer Science · 2013-10-01 · Mit Sheth, Krishna Gopal Benerjee, Manish K. Gupta

DistCache: Provable Load Balancing for Large-Scale Storage Systems with Distributed Caching

Load balancing is critical for distributed storage to meet strict service-level objectives (SLOs). It has been shown that a fast cache can guarantee load balancing for a clustered storage system. However, when the system scales out to…

Distributed, Parallel, and Cluster Computing · Computer Science · 2019-02-18 · Zaoxing Liu, Zhihao Bai, Zhenming Liu, Xiaozhou Li, Changhoon Kim, Vladimir Braverman, Xin Jin, Ion Stoica

Accelerating Big-Data Sorting Through Programmable Switches

Sorting is a fundamental problem that has been studied extensively. Sorting plays an important role in the area of databases, as many queries can be served much faster if the relations are first sorted. One of the most…

Databases · Computer Science · 2021-03-29 · Yamit Barshatz-Schneor, Roy Friedman

Distributed Storage Allocations

We examine the problem of allocating a given total storage budget in a distributed storage system for maximum reliability. A source has a single data object that is to be coded and stored over a set of storage nodes; it is allowed to store…

Information Theory · Computer Science · 2016-11-15 · Derek Leong, Alexandros G. Dimakis, Tracey Ho

Automating Distributed Tiered Storage Management in Cluster Computing

Data-intensive platforms such as Hadoop and Spark are routinely used to process massive amounts of data residing on distributed file systems like HDFS. Increasing memory sizes and new hardware technologies (e.g., NVRAM, SSDs) have recently…

Distributed, Parallel, and Cluster Computing · Computer Science · 2020-06-22 · Herodotos Herodotou, Elena Kakoulli

Scale-Out Processors & Energy Efficiency

Scale-out workloads like media streaming or Web search serve millions of users and operate on a massive amount of data, and hence, require enormous computational power. As the number of users is increasing and the size of data is expanding,…

Hardware Architecture · Computer Science · 2018-08-16 · Pouya Esmaili-Dokht, Mohammad Bakhshalipour, Behnam Khodabandeloo, Pejman Lotfi-Kamran, Hamid Sarbazi-Azad

Reliable Data Storage in Distributed Hash Tables

Distributed Hash Tables offer a resilient lookup service for unstable distributed environments. Resilient data storage, however, requires additional data replication and maintenance algorithms. These algorithms can have an impact on both…

Distributed, Parallel, and Cluster Computing · Computer Science · 2007-05-23 · Matthew Leslie

One Ring to Shuffle Them All: Scalable Intra-Process Data Redistribution with Ring-Buffer Shuffle in Redpanda Oxla

As server CPUs scale to dozens and now hundreds of cores per socket, parallel query engines must rethink how they redistribute data between threads. Partitioned operators such as hash joins and aggregations require frequent data…

Databases · Computer Science · 2026-05-29 · Adam Szymański, Tyler Akidau

Unbreakable distributed storage with quantum key distribution network and password-authenticated secret sharing

Distributed storage plays an essential role in realizing robust and secure data storage in a network over long periods of time. A distributed storage system consists of a data owner machine, multiple storage servers and channels to link…

Quantum Physics · Physics · 2016-07-05 · Mikio Fujiwara, Atsushi Waseda, Ryo Nojima, Shiho Moriai, Wakaha Ogata, Masahide Sasaki

On Storage Allocation for Maximum Service Rate in Distributed Storage Systems

Storage allocation affects important performance measures of distributed storage systems. Most previous studies on the storage allocation consider its effect separately either on the success of the data recovery or on the service rate…

Distributed, Parallel, and Cluster Computing · Computer Science · 2016-11-17 · Moslem Noori, Emina Soljanin, Masoud Ardakani

Scalable Density-Based Distributed Clustering

Clustering has become an increasingly important task in analysing huge amounts of data. Traditional applications require that all data be located at the site where it is scrutinized. Nowadays, large amounts of heterogeneous, complex…

Databases · Computer Science · 2014-09-24 · Eshref Januzaj, Hans-Peter Kriegel, Martin Pfeifle

Faster Data-access in Large-scale Systems: Network-scale Latency Analysis under General Service-time Distributions

In cloud storage systems with a large number of servers, files are typically not stored in single servers. Instead, they are split, replicated (to ensure reliability in case of server malfunction) and stored in different servers. We analyze…

Distributed, Parallel, and Cluster Computing · Computer Science · 2018-07-09 · Avishek Ghosh, Kannan Ramchandran

Parallel Weighted Random Sampling

Data structures for efficient sampling from a set of weighted items are an important building block of many applications. However, few parallel solutions are known. We close many of these gaps both for shared-memory and distributed-memory…

Data Structures and Algorithms · Computer Science · 2021-07-20 · Lorenz Hübschle-Schneider, Peter Sanders