Related papers: Parallel Finger Search Structures
In this paper we present two versions of a parallel working-set map on p processors that supports searches, insertions and deletions. In both versions, the total work of all operations when the map has size at least p is bounded by the…
This note recapitulates an algorithmic observation for ordered Depth-First Search (DFS) in directed graphs that immediately leads to a parallel algorithm with linear speed-up for a range of processors for non-sparse graphs. The note extends…
We present a work-efficient parallel level-synchronous Breadth First Search (BFS) algorithm for shared-memory architectures which achieves the theoretical lower bound on parallel running time. The optimality holds regardless of the shape of…
Data-intensive, graph-based computations are pervasive in several scientific applications, and are known to to be quite challenging to implement on distributed memory systems. In this work, we explore the design space of parallel algorithms…
In the particular case we have insertions/deletions at the tail of a given set S of $n$ one-dimensional elements, we present a simpler and more concrete algorithm than that presented in [Anderson, 2007] achieving the same (but also…
Determining the degree of inherent parallelism in classical sequential algorithms and leveraging it for fast parallel execution is a key topic in parallel computing, and detailed analyses are known for a wide range of classical algorithms.…
Hash table is a fundamental data structure for quick search and retrieval of data. It is a key component in complex graph analytics and AI/ML applications. State-of-the-art parallel hash table implementations either make some simplifying…
Graphs and their traversal is becoming significant as it is applicable to various areas of mathematics, science and technology. Various problems in fields as varied as biochemistry (genomics), electrical engineering (communication…
Breadth-first search (BFS) is a fundamental graph algorithm that presents significant challenges for parallel implementation due to irregular memory access patterns, load imbalance and synchronization overhead. In this paper, we introduce a…
To harness modern multicore processors, it is imperative to develop parallel versions of fundamental algorithms. In this paper, we compare different approaches to parallel best-first search in a shared-memory setting. We present a new…
Depth first search (DFS) tree is a fundamental data structure for solving graph problems. The classical algorithm [SiComp74] for building a DFS tree requires $O(m+n)$ time for a given graph $G$ having $n$ vertices and $m$ edges. Recently,…
We study multi-finger binary search trees (BSTs), a far-reaching extension of the classical BST model, with connections to the well-studied $k$-server problem. Finger search is a popular technique for speeding up BST operations when a query…
Constructing a Depth First Search (DFS) tree is a fundamental graph problem, whose parallel complexity is still not settled. Reif showed parallel intractability of lex-first DFS. In contrast, randomized parallel algorithms (and more…
Balanced search trees are widely used in computer science to efficiently maintain dynamic ordered data. To support efficient set operations (e.g., union, intersection, difference) using trees, the join-based framework is widely studied.…
Ordered set (and map) is one of the most used data type. In addition to standard set operations, like insert, delete and contains, it can provide set-set operations such as union, intersection, and difference. Each of these set-set…
Parallel parameterized complexity theory studies how fixed-parameter tractable (fpt) problems can be solved in parallel. Previous theoretical work focused on parallel algorithms that are very fast in principle, but did not take into account…
We investigate the limits of one of the fundamental ideas in data structures: fractional cascading. This is an important data structure technique to speed up repeated searches for the same key in multiple lists and it has numerous…
To design efficient parallel algorithms, some recent papers showed that many sequential iterative algorithms can be directly parallelized but there are still challenges in achieving work-efficiency and high-parallelism. Work-efficiency can…
We consider a parallel computational model that consists of $P$ processors, each with a fast local ephemeral memory of limited size, and sharing a large persistent memory. The model allows for each processor to fault with bounded…
Optimizing the parallel training of large models requires exploring intra-operator parallelism plans for a computation graph that typically contains tens of thousands of primitive operators. While the optimization of parallel data…