Related papers: Sequential & Parallel Algorithms for Big-Integer N…
Today's PCs can directly manipulate numbers no longer than 64 bits because the size of the CPU registers and the data path is limited. Consequently, arithmetic operations such as addition can only be performed on numbers of that length.…
Sequential computation is well understood but does not scale well with current technology. Within the next decade, systems will contain large numbers of processors with potentially thousands of processors per chip. Despite this, many…
An integer adder for integers in the binary representation is one of the basic components of any digital processor. To add two integers of N bits each, the serial adder takes N clock ticks. To achieve higher speeds, parallel…
Current algorithms for large-scale industrial optimization problems typically face a trade-off: they either require exponential time to reach optimal solutions, or employ problem-specific heuristics. To overcome these limitations, we…
Exactly solving multi-objective integer programming (MOIP) problems is often a very time-consuming process, especially for large and complex problems. Parallel computing has the potential to significantly reduce the time taken to solve such…
In this paper we describe a parallel Gaussian elimination algorithm for matrices with entries in a finite field. Unlike previous approaches, our algorithm subdivides a very large input matrix into smaller submatrices by subdividing both…
Reduction operations are extensively employed in many computational problems. Given a finite set of numeric elements, a reduction combines all elements of that set into a single value using a combiner function. A…
In our study we implemented and compared seven sequential and parallel sorting algorithms: bitonic sort, multistep bitonic sort, adaptive bitonic sort, merge sort, quicksort, radix sort and sample sort. Sequential algorithms were…
Parallelization techniques have become ubiquitous for accelerating inference and training of deep neural networks. Despite this, several operations are still performed in a sequential manner. For instance, the forward and backward passes…
The problem of automatically clustering data is an age-old problem. People have created numerous algorithms to tackle this problem. The execution time of any of these algorithms grows with the number of input points and the number of cluster…
We present a novel class of methods to compute functions of matrices or their action on vectors that are suitable for parallel programming. Solving appropriate simple linear systems of equations in parallel (or computing the inverse of…
This paper presents efficient algorithms, designed to leverage SIMD for performing Montgomery reductions and additions on integers larger than 512 bits. The existing algorithms encounter inefficiencies when parallelized using SIMD due to…
Many parallel algorithms which solve basic problems in computer science use auxiliary space linear in the input to facilitate conflict-free computation. There has been significant work on improving these parallel algorithms to be in-place,…
We consider the problem of sampling $n$ numbers from the range $\{1,\ldots,N\}$ without replacement on modern architectures. The main result is a simple divide-and-conquer scheme that makes sequential algorithms more cache efficient and…
Sorting is one of the fundamental problems in computer science. Playing a role in many processes, it has a lower complexity bound of $\mathcal{O}(n\log{n})$ when executed on a sequential machine. This limit can be brought down to…
To obtain a better understanding of the trade-offs between various objectives, Bi-Objective Integer Programming (BOIP) algorithms calculate the set of all non-dominated vectors and present these as the solution to a BOIP problem.…
Linear-scaling electronic-structure techniques, also called O(N) techniques, rely heavily on the multiplication of sparse matrices, where the sparsity arises from spatial cut-offs. In order to treat very large systems, the calculations must…
Residue arithmetic is an elegant and convenient way of computing with integers that exceed the natural word size of a computer. The algorithms are highly parallel and hence naturally adapted to quantum computation. The process differs from…
By supporting the access of multiple memory words at the same time, Bit-line Computing (BC) architectures allow the parallel execution of bit-wise operations in-memory. At the array periphery, arithmetic operations are then derived with…
Finding the number of triangles in a network is an important problem in the analysis of complex networks. The number of triangles also has important applications in data mining. Existing distributed memory parallel algorithms for counting…