English
Related papers

Related papers: Memory-efficient array redistribution through port…

200 papers

We present a new method for performing global redistributions of multidimensional arrays essential to parallel fast Fourier (or similar) transforms. Traditional methods use standard all-to-all collective communication of contiguous memory…

Distributed, Parallel, and Cluster Computing · Computer Science 2018-04-26 Lisandro Dalcin , Mikael Mortensen , David E Keyes

In this paper, we propose a bootstrap method applied to massive data processed distributedly in a large number of machines. This new method is computationally efficient in that we bootstrap on the master machine without over-resampling,…

Machine Learning · Statistics 2020-02-21 Yang Yu , Shih-Kang Chao , Guang Cheng

Accelerators for sparse matrix multiplication are important components in emerging systems. In this paper, we study the main challenges of accelerating Sparse Matrix Multiplication (SpMM). For the situations that data is not stored in the…

Hardware Architecture · Computer Science 2019-06-04 Pareesa Ameneh Golnari , Sharad Malik

Traditional parallel schedulers running on cluster supercomputers support only static scheduling, where the number of processors allocated to an application remains fixed throughout the execution of the job. This results in…

Distributed, Parallel, and Cluster Computing · Computer Science 2007-06-15 Rajesh Sudarsan , Calvin J. Ribbens

This paper presents an efficient technique for matrix-vector and vector-transpose-matrix multiplication in distributed-memory parallel computing environments, where the matrices are unstructured, sparse, and have a substantially larger…

Mathematical Software · Computer Science 2018-12-04 Jonathan Eckstein , Gyorgy Matyasfalvi

High level programming languages and GPU accelerators are powerful enablers for a wide range of applications. Achieving scalable vertical (within a compute node), horizontal (across compute nodes), and temporal (over different generations…

We present four high performance hybrid sorting methods developed for various parallel platforms: shared memory multiprocessors, distributed multiprocessors, and clusters taking advantage of existence of both shared and distributed memory.…

Distributed, Parallel, and Cluster Computing · Computer Science 2020-03-04 Thoria Alghamdi , Gita Alaghband

Single-Program-Multiple-Data (SPMD) parallelism has recently been adopted to train large deep neural networks (DNNs). Few studies have explored its applicability on heterogeneous clusters, to fully exploit available resources for large…

Distributed, Parallel, and Cluster Computing · Computer Science 2024-01-12 Shiwei Zhang , Lansong Diao , Chuan Wu , Zongyan Cao , Siyu Wang , Wei Lin

As applications continue to generate multi-dimensional data at exponentially increasing rates, fast analytics to extract meaningful results is becoming extremely important. The database community has developed array databases that alleviate…

Databases · Computer Science 2018-03-19 Weijie Zhao , Florin Rusu , Bin Dong , Kesheng Wu , Anna Y. Q. Ho , Peter Nugent

When using stochastic gradient descent to solve large-scale machine learning problems, a common practice of data processing is to shuffle the training data, partition the data across multiple machines if needed, and then perform several…

Machine Learning · Statistics 2017-10-02 Qi Meng , Wei Chen , Yue Wang , Zhi-Ming Ma , Tie-Yan Liu

Due to the significant increase in the size of spatial data, it is essential to use distributed parallel processing systems to efficiently analyze spatial data. In this paper, we first study learned spatial data partitioning, which…

Databases · Computer Science 2023-06-21 Keizo Hori , Yuya Sasaki , Daichi Amagata , Yuki Murosaki , Makoto Onizuka

We propose a novel, efficient approach for distributed sparse learning in high-dimensions, where observations are randomly partitioned across machines. Computationally, at each round our method only requires the master machine to solve a…

Machine Learning · Statistics 2016-05-26 Jialei Wang , Mladen Kolar , Nathan Srebro , Tong Zhang

The recent success of deep learning applications has coincided with those widely available powerful computational resources for training sophisticated machine learning models with huge datasets. Nonetheless, training large models such as…

Machine Learning · Computer Science 2022-01-03 Farley Lai , Asim Kadav , Erik Kruus

Distributed-memory implementations of numerical optimization algorithm, such as stochastic gradient descent (SGD), require interprocessor communication at every iteration of the algorithm. On modern distributed-memory clusters where…

Distributed, Parallel, and Cluster Computing · Computer Science 2025-01-14 Aditya Devarakonda , Ramakrishnan Kannan

Scaling multi-dimensional transformers to long sequences is indispensable across various domains. However, the challenges of large memory requirements and slow speeds of such sequences necessitate sequence parallelism. All existing…

Distributed, Parallel, and Cluster Computing · Computer Science 2025-05-13 Xuanlei Zhao , Shenggan Cheng , Chang Chen , Zangwei Zheng , Ziming Liu , Zheming Yang , Yang You

Distributed computing excels at processing large scale data, but the communication cost for synchronizing the shared parameters may slow down the overall performance. Fortunately, the interactions between parameter and data in many problems…

Distributed, Parallel, and Cluster Computing · Computer Science 2015-05-19 Mu Li , Dave G. Andersen , Alexander J. Smola

Partitioning a graph into balanced blocks such that few edges run between blocks is a key problem for large-scale distributed processing. A current trend for partitioning huge graphs are streaming algorithms, which use low computational…

Data Structures and Algorithms · Computer Science 2022-02-02 Marcelo Fonseca Faraj , Christian Schulz

In many distributed learning problems, the heterogeneous loading of computing machines may harm the overall performance of synchronous strategies. In this paper, we propose an effective asynchronous distributed framework for the…

Machine Learning · Statistics 2017-05-23 Bikash Joshi , Franck Iutzeler , Massih-Reza Amini

Deep Neural Networks (DNNs) and Large Language Models (LLMs) have revolutionized artificial intelligence, yet their deployment faces significant memory and computational challenges, especially in resource-constrained environments.…

Hardware Architecture · Computer Science 2025-04-24 Cong Guo , Chiyue Wei , Jiaming Tang , Bowen Duan , Song Han , Hai Li , Yiran Chen

Graph-based representations underlie a wide range of scientific problems. Graph connectivity is typically represented as a sparse matrix in the Compressed Sparse Row format. Large-scale graphs rely on distributed storage, allocating…

Distributed, Parallel, and Cluster Computing · Computer Science 2020-12-14 Bruno Magalhaes , Felix Schürmann
‹ Prev 1 2 3 10 Next ›