Related papers: Improved Parallel Cache-Oblivious Algorithms for D…

Parallel Write-Efficient Algorithms and Data Structures for Computational Geometry

In this paper, we design parallel write-efficient geometric algorithms that perform asymptotically fewer writes than standard algorithms for the same problem. This is motivated by emerging non-volatile memory technologies with read…

Data Structures and Algorithms · Computer Science 2018-07-12 Guy E. Blelloch, Yan Gu, Yihan Sun, Julian Shun

Sage: Parallel Semi-Asymmetric Graph Algorithms for NVRAMs

Non-volatile main memory (NVRAM) technologies provide an attractive set of features for large-scale graph analytics, including byte-addressability, low idle power, and improved memory-density. NVRAM systems today have an order of magnitude…

Distributed, Parallel, and Cluster Computing · Computer Science 2020-06-01 Laxman Dhulipala, Charlie McGuffey, Hongbo Kang, Yan Gu, Guy E. Blelloch, Phillip B. Gibbons, Julian Shun

Parallel algorithms in linear algebra

This report provides an introduction to algorithms for fundamental linear algebra problems on various parallel computer architectures, with the emphasis on distributed-memory MIMD machines. To illustrate the basic concepts and key issues,…

Data Structures and Algorithms · Computer Science 2015-03-17 Richard P. Brent

Implicit Decomposition for Write-Efficient Connectivity Algorithms

The future of main memory appears to lie in the direction of new technologies that provide strong capacity-to-performance ratios, but have write operations that are much more expensive than reads in terms of latency, bandwidth, and energy.…

Data Structures and Algorithms · Computer Science 2017-10-10 Naama Ben-David, Guy E. Blelloch, Jeremy T. Fineman, Phillip B. Gibbons, Yan Gu, Charles McGuffey, Julian Shun

Improving the Space-Time Efficiency of Processor-Oblivious Matrix Multiplication Algorithms

Classic cache-oblivious parallel matrix multiplication algorithms achieve optimality either in time or space, but not both, which promotes lots of research on the best possible balance or tradeoff of such algorithms. We study modern…

Distributed, Parallel, and Cluster Computing · Computer Science 2019-11-14 Yuan Tang

Persistent Memory Programming Abstractions in Context of Concurrent Applications

The advent of non-volatile memory (NVM) technologies like PCM, STT, memristors and Fe-RAM is believed to enhance the system performance by getting rid of the traditional memory hierarchy by reducing the gap between memory and storage. This…

Distributed, Parallel, and Cluster Computing · Computer Science 2017-12-15 Ajay Singh, Marc Shapiro, Gael Thomas

Stochastic Modeling of Hybrid Cache Systems

In recent years, there is an increasing demand of big memory systems so to perform large scale data analytics. Since DRAM memories are expensive, some researchers are suggesting to use other memory systems such as non-volatile memory (NVM)…

Performance · Computer Science 2016-10-03 Gaoying Ju, Yongkun Li, Yinlong Xu, Jiqiang Chen, John C. S. Lui

Cache-aware Performance Modeling and Prediction for Dense Linear Algebra

Countless applications cast their computational core in terms of dense linear algebra operations. These operations can usually be implemented by combining the routines offered by standard linear algebra libraries such as BLAS and LAPACK,…

Performance · Computer Science 2014-10-01 Elmar Peise, Paolo Bientinesi

Two-dimensional Sparse Parallelism for Large Scale Deep Learning Recommendation Model Training

The increasing complexity of deep learning recommendation models (DLRM) has led to a growing need for large-scale distributed systems that can efficiently train vast amounts of data. In DLRM, the sparse embedding table is a crucial…

Distributed, Parallel, and Cluster Computing · Computer Science 2025-08-07 Xin Zhang, Quanyu Zhu, Liangbei Xu, Zain Huda, Wang Zhou, Jin Fang, Dennis van der Staay, Yuxi Hu, Jade Nie, Jiyan Yang, Chunzhi Yang

DGAP: Efficient Dynamic Graph Analysis on Persistent Memory

Dynamic graphs, featuring continuously updated vertices and edges, have grown in importance for numerous real-world applications. To accommodate this, graph frameworks, particularly their internal data structures, must support both…

Data Structures and Algorithms · Computer Science 2024-03-06 Abdullah Al Raqibul Islam, Dong Dai

Exploiting Inter- and Intra-Memory Asymmetries for Data Mapping in Hybrid Tiered-Memories

Modern computing systems are embracing hybrid memory comprising of DRAM and non-volatile memory (NVM) to combine the best properties of both memory technologies, achieving low latency, high reliability, and high density. A prominent…

Hardware Architecture · Computer Science 2020-05-12 Shihao Song, Anup Das, Nagarajan Kandasamy

Architecting Non-Volatile Main Memory to Guard Against Persistence-based Attacks

DRAM-based main memory and its associated components increasingly account for a significant portion of application performance bottlenecks and power budget demands inside the computing ecosystem. To alleviate the problems of storage density…

Cryptography and Security · Computer Science 2019-02-12 Fan Yao, Guru Venkataramani

Demystifying the Performance of HPC Scientific Applications on NVM-based Memory Systems

The emergence of high-density byte-addressable non-volatile memory (NVM) is promising to accelerate data- and compute-intensive applications. Current NVM technologies have lower performance than DRAM and, thus, are often paired with DRAM in…

Distributed, Parallel, and Cluster Computing · Computer Science 2020-08-17 Ivy Peng, Kai Wu, Jie Ren, Dong Li, Maya Gokhale

Sorting with Asymmetric Read and Write Costs

Emerging memory technologies have a significant gap between the cost, both in time and in energy, of writing to memory versus reading from memory. In this paper we present models and algorithms that account for this difference, with a focus…

Data Structures and Algorithms · Computer Science 2016-03-15 Guy E. Blelloch, Jeremy T. Fineman, Phillip B. Gibbons, Yan Gu, Julian Shun

Massively Parallel Algorithms for Finding Well-Connected Components in Sparse Graphs

A fundamental question that shrouds the emergence of massively parallel computing (MPC) platforms is how can the additional power of the MPC paradigm be leveraged to achieve faster algorithms compared to classical parallel models such as…

Data Structures and Algorithms · Computer Science 2018-05-09 Sepehr Assadi, Xiaorui Sun, Omri Weinstein

Algorithms in the Ultra-Wide Word Model

The effective use of parallel computing resources to speed up algorithms in current multi-core parallel architectures remains a difficult challenge, with ease of programming playing a key role in the eventual success of various parallel…

Data Structures and Algorithms · Computer Science 2014-12-09 Arash Farzan, Alejandro López-Ortiz, Patrick K. Nicholson, Alejandro Salinger

Algorithmic Building Blocks for Asymmetric Memories

The future of main memory appears to lie in the direction of new non-volatile memory technologies that provide strong capacity-to-performance ratios, but have write operations that are much more expensive than reads in terms of energy,…

Data Structures and Algorithms · Computer Science 2018-06-28 Yan Gu, Yihan Sun, Guy E. Blelloch

Architecture-Aware, High Performance Transaction for Persistent Memory

Byte-addressable non-volatile main memory (NVM) demands transactional mechanisms to access and manipulate data on NVM atomically. Those transaction mechanisms often employ a logging mechanism (undo logging or redo logging). However, the…

Distributed, Parallel, and Cluster Computing · Computer Science 2019-04-19 Kai Wu, Jie Ren, Dong Li

A Memory Controller with Row Buffer Locality Awareness for Hybrid Memory Systems

Non-volatile memory (NVM) is a class of promising scalable memory technologies that can potentially offer higher capacity than DRAM at the same cost point. Unfortunately, the access latency and energy of NVM is often higher than those of…

Hardware Architecture · Computer Science 2018-05-01 HanBin Yoon, Justin Meza, Rachata Ausavarungnirun, Rachael A. Harding, Onur Mutlu

Balanced Partitioning of Several Cache-Oblivious Algorithms

Frigo et al. proposed an ideal cache model and a recursive technique to design sequential cache-efficient algorithms in a cache-oblivious fashion. Ballard et al. pointed out that it is a fundamental open problem to extend the technique to…

Distributed, Parallel, and Cluster Computing · Computer Science 2020-11-04 Yuan Tang, Weiguo Gao