Related papers: Parallel Write-Efficient Algorithms and Data Struc…
Emerging non-volatile main memory (NVRAM) technologies provide byte-addressability, low idle power, and improved memory-density, and are likely to be a key component in the future memory hierarchy. However, a critical challenge in achieving…
The future of main memory appears to lie in the direction of new non-volatile memory technologies that provide strong capacity-to-performance ratios, but have write operations that are much more expensive than reads in terms of energy,…
Emerging memory technologies have a significant gap between the cost, both in time and in energy, of writing to memory versus reading from memory. In this paper we present models and algorithms that account for this difference, with a focus…
Delaunay Triangulation(DT) is one of the important geometric problems that is used in various branches of knowledge such as computer vision, terrain modeling, spatial clustering and networking. Kinetic data structures have become very…
In several emerging technologies for computer memory (main memory), the cost of reading is significantly cheaper than the cost of writing. Such asymmetry in memory costs poses a fundamentally different model from the RAM for algorithm…
The future of main memory appears to lie in the direction of new technologies that provide strong capacity-to-performance ratios, but have write operations that are much more expensive than reads in terms of latency, bandwidth, and energy.…
Several classic problems in graph processing and computational geometry are solved via incremental algorithms, which split computation into a series of small tasks acting on shared state, which gets updated progressively. While the…
In this paper we show that many sequential randomized incremental algorithms are in fact parallel. We consider algorithms for several problems including Delaunay triangulation, linear programming, closest pair, smallest enclosing disk,…
The sheer sizes of modern datasets are forcing data-structure designers to consider seriously both parallel construction and compactness. To achieve those goals we need to design a parallel algorithm with good scalability and with low…
We present a shared memory implementation of a parallel algorithm, called delta-stepping, for solving the single source shortest path problem for directed and undirected graphs. In order to reduce synchronization costs we make some…
As deep neural networks (DNNs) become deeper, the training time increases. In this perspective, multi-GPU parallel computing has become a key tool in accelerating the training of DNNs. In this paper, we introduce a novel methodology to…
Computing fixed-radius near-neighbor graphs is an important first step for many data analysis algorithms. Near-neighbor graphs connect points that are close under some metric, endowing point clouds with a combinatorial structure. As…
Spiking neural networks excel at event-driven sensing. Yet, maintaining task-relevant context over long timescales both algorithmically and in hardware, while respecting both tight energy and memory budgets, remains a core challenge in the…
Big graphs (networks) arising in numerous application areas pose significant challenges for graph analysts as these graphs grow to billions of nodes and edges and are prohibitively large to fit in the main memory. Finding the number of…
This paper presents a new scalable parallelization scheme to generate the 3D Delaunay triangulation of a given set of points. Our first contribution is an efficient serial implementation of the incremental Delaunay insertion algorithm. A…
Arrival of multicore systems has enforced a new scenario in computing, the parallel and distributed algorithms are fast replacing the older sequential algorithms, with many challenges of these techniques. The distributed algorithms provide…
Triangle counting is a fundamental graph analytic operation that is used extensively in network science and graph mining. As the size of the graphs that needs to be analyzed continues to grow, there is a requirement in developing scalable…
Counting triangles in a graph and incident to each vertex is a fundamental and frequently considered task of graph analysis. We consider how to efficiently do this for huge graphs using massively parallel distributed-memory machines.…
We present efficient and scalable parallel algorithms for performing mathematical operations for low-rank tensors represented in the tensor train (TT) format. We consider algorithms for addition, elementwise multiplication, computing norms…
Parallel parameterized complexity theory studies how fixed-parameter tractable (fpt) problems can be solved in parallel. Previous theoretical work focused on parallel algorithms that are very fast in principle, but did not take into account…