English
Related papers

Related papers: Practical Parallel External Memory Algorithms via …

200 papers

Parallelization and External Memory (PEM) techniques have significantly enhanced the capabilities of search algorithms when solving large-scale problems. Previous research on PEM has primarily centered on unidirectional algorithms, with…

Artificial Intelligence · Computer Science 2025-01-06 Lior Siag , Shahaf S. Shperberg , Ariel Felner , Nathan R. Sturtevant

Since its introduction in 2004, the MapReduce framework has become one of the standard approaches in massive distributed and parallel computation. In contrast to its intensive use in practise, theoretical footing is still limited and only…

Distributed, Parallel, and Cluster Computing · Computer Science 2011-12-19 Gero Greiner , Riko Jacob

The growth in the use of computationally intensive statistical procedures, especially with Big Data, has necessitated the usage of parallel computation on diverse platforms such as multicore, GPU, clusters and clouds. However, slowdown due…

Computation · Statistics 2014-09-23 Norman Matloff

Sparse matrix multiplication is traditionally performed in memory and scales to large matrices using the distributed memory of multiple nodes. In contrast, we scale sparse matrix multiplication beyond memory capacity by implementing sparse…

Distributed, Parallel, and Cluster Computing · Computer Science 2016-11-15 Da Zheng , Disa Mhembere , Vince Lyzinski , Joshua Vogelstein , Carey E. Priebe , Randal Burns

The bulk-synchronous parallel (BSP) model provides a framework for writing parallel programs with predictable performance. In this paper we extend the BSP model to support what we will call pseudo-streaming algorithms for accelerators. We…

Distributed, Parallel, and Cluster Computing · Computer Science 2017-03-24 Jan-Willem Buurlage , Tom Bannink , Abe Wits

Sequential computation is well understood but does not scale well with current technology. Within the next decade, systems will contain large numbers of processors with potentially thousands of processors per chip. Despite this, many…

Hardware Architecture · Computer Science 2015-11-17 James Hanlon

There has been significant recent interest in parallel graph processing due to the need to quickly analyze the large graphs available today. Many graph codes have been designed for distributed memory or external memory. However, today even…

Data Structures and Algorithms · Computer Science 2019-08-22 Laxman Dhulipala , Guy E. Blelloch , Julian Shun

Bulk-bitwise processing-in-memory (PIM), where large bitwise operations are performed in parallel by the memory array itself, is an emerging form of computation with the potential to mitigate the memory wall problem. This paper examines the…

Hardware Architecture · Computer Science 2023-09-29 Ben Perach , Ronny Ronen , Benny Kimelfeld , Shahar Kvatinsky

In this paper, we perform an empirical evaluation of the Parallel External Memory (PEM) model in the context of geometric problems. In particular, we implement the parallel distribution sweeping framework of Ajwani, Sitchinava and Zeh to…

Data Structures and Algorithms · Computer Science 2013-06-20 Deepak Ajwani , Nodari Sitchinava

The development of cost-effective highperformance parallel computing on multi-processor supercomputers makes it attractive to port excessively time consuming simulation software from personal computers (PC) to super computes. The power…

Distributed, Parallel, and Cluster Computing · Computer Science 2007-05-23 Ning Lu , Z. Todd Taylor , David P. Chassin , Ross T. Guttromson , R. Scott Studham

Simulation of quantum systems is challenging due to the exponential size of the state space. Tensor networks provide a systematically improvable approximation for quantum states. 2D tensor networks such as Projected Entangled Pair States…

Distributed, Parallel, and Cluster Computing · Computer Science 2020-09-04 Yuchen Pang , Tianyi Hao , Annika Dugad , Yiqing Zhou , Edgar Solomonik

Applications of Binary Neural Networks (BNNs) are promising for embedded systems with hard constraints on computing power. Contrary to conventional neural networks with the floating-point datatype, BNNs use binarized weights and activations…

Emerging Technologies · Computer Science 2022-11-14 Mahdi Zahedi , Taha Shahroodi , Stephan Wong , Said Hamdioui

Many modern workloads such as neural network inference and graph processing are fundamentally memory-bound. For such workloads, data movement between memory and CPU cores imposes a significant overhead in terms of both latency and energy. A…

Hardware Architecture · Computer Science 2023-04-04 Juan Gómez-Luna , Izzat El Hajj , Ivan Fernandez , Christina Giannoula , Geraldo F. Oliveira , Onur Mutlu

We provide algorithms for efficiently addressing quantum memory in parallel. These imply that the standard circuit model can be simulated with low overhead by the more realistic model of a distributed quantum computer. As a result, the…

The processor accelerators are effective because they are working not (completely) on principles of stored program computers. They use some kind of parallelism, and it is rather hard to program them effectively: a parallel architecture by…

Distributed, Parallel, and Cluster Computing · Computer Science 2016-08-26 János Végh

The Bulk-Synchronous Parallel model of computation has been used for the architecture independent design and analysis of parallel algorithms whose performance is expressed not only in terms of problem size n but also in terms of parallel…

Distributed, Parallel, and Cluster Computing · Computer Science 2014-08-29 Alexandros V. Gerbessiotis , Constantinos J. Siniolakis

The bulk synchronous parallel (BSP) is a celebrated synchronization model for general-purpose parallel computing that has successfully been employed for distributed training of machine learning models. A prevalent shortcoming of the BSP is…

Machine Learning · Computer Science 2020-01-07 Xing Zhao , Manos Papagelis , Aijun An , Bao Xin Chen , Junfeng Liu , Yonggang Hu

More and more large data collections are gathered worldwide in various IT systems. Many of them possess the networked nature and need to be processed and analysed as graph structures. Due to their size they require very often usage of…

Distributed, Parallel, and Cluster Computing · Computer Science 2013-06-04 Tomasz Kajdanowicz , Przemyslaw Kazienko , Wojciech Indyk

The sizes of GPU applications are rapidly growing. They are exhausting the compute and memory resources of a single GPU, and are demanding the move to multiple GPUs. However, the performance of these applications scales sub-linearly with…

Hardware Architecture · Computer Science 2020-08-11 Saiful A. Mojumder , Yifan Sun , Leila Delshadtehrani , Yenai Ma , Trinayan Baruah , José L. Abellán , John Kim , David Kaeli , Ajay Joshi

For the last thirty years, several Dynamic Memory Managers (DMMs) have been proposed. Such DMMs include first fit, best fit, segregated fit and buddy systems. Since the performance, memory usage and energy consumption of each DMM differs,…

Neural and Evolutionary Computing · Computer Science 2024-07-16 José L. Risco-Martín , David Atienza , J. Manuel Colmenar , Oscar Garnica
‹ Prev 1 2 3 10 Next ›