Related papers: Parallelizing Optimal Multiple Sequence Alignment …
As the artificial intelligence community advances into the era of large models with billions of parameters, distributed training and inference have become essential. While various parallelism strategies-data, model, sequence, and…
The Massive Parallel Computing (MPC) model gained popularity during the last decade and it is now seen as the standard model for processing large scale data. One significant shortcoming of the model is that it assumes to work on static…
There are billions of lines of sequential code inside nowadays' software which do not benefit from the parallelism available in modern multicore architectures. Automatically parallelizing sequential code, to promote an efficient use of the…
Multicore has emerged as a typical architecture model since its advent and stands now as a standard. The trend is to increase the number of cores and improve the performance of the memory system. Providing an efficient multicore…
While deep learning excels in natural image and language processing, its application to high-dimensional data faces computational challenges due to the dimensionality curse. Current large-scale data tools focus on business-oriented…
It is a challenging task to train large DNN models on sophisticated GPU platforms with diversified interconnect capabilities. Recently, pipelined training has been proposed as an effective approach for improving device utilization. However,…
Arrival of multicore systems has enforced a new scenario in computing, the parallel and distributed algorithms are fast replacing the older sequential algorithms, with many challenges of these techniques. The distributed algorithms provide…
This paper addresses the problem of parallelizing computations to study non-linear dynamics in large networks of non-locally coupled oscillators using heterogeneous computing resources. The proposed approach can be applied to a variety of…
This paper proposes a general formulation for temporal parallelisation of dynamic programming for optimal control problems. We derive the elements and associative operators to be able to use parallel scans to solve these problems with…
We propose new sequential sorting operations by adapting techniques and methods used for designing parallel sorting algorithms. Although the norm is to parallelize a sequential algorithm to improve performance, we adapt a contrarian…
The Simplex tableau has been broadly used and investigated in the industry and academia. With the advent of the big data era, ever larger problems are posed to be solved in ever larger machines whose architecture type did not exist in the…
Scheduling query execution plans is a particularly complex problem in shared-nothing parallel systems, where each site consists of a collection of local time-shared (e.g., CPU(s) or disk(s)) and space-shared (e.g., memory) resources and…
Dynamic programming is a powerful technique that is, unfortunately, often inherently sequential. That is, there exists no unified method to parallelize algorithms that use dynamic programming. In this paper, we attempt to address this issue…
The proliferation of multi-core and multiprocessor-based computer systems has led to explosive development of parallel applications and hence the need for efficient schedulers. In this paper, we study hierarchical scheduling for malleable…
The approximate minimum degree algorithm is widely used before numerical factorization to reduce fill-in for sparse matrices. While considerable attention has been given to the numerical factorization process, less focus has been placed on…
Many modern applications require real-time processing of large volumes of high-speed data. Such data processing needs can be modeled as a streaming computation. A streaming computation is specified as a dataflow graph that exposes multiple…
Scaling multi-dimensional transformers to long sequences is indispensable across various domains. However, the challenges of large memory requirements and slow speeds of such sequences necessitate sequence parallelism. All existing…
All-pairs similarity problem asks to find all vector pairs in a set of vectors the similarities of which surpass a given similarity threshold, and it is a computational kernel in data mining and information retrieval for several tasks. We…
Motivated by the need for adaptive, secure and responsive scheduling in a great range of computing applications, including human-centered and time-critical applications, this paper proposes a scheduling framework that seamlessly adds…
In the near future, massively parallel computing systems will be necessary to solve computation intensive applications. The key bottleneck in massively parallel implementation of numerical algorithms is the synchronization of data across…