Related papers: Parallel Integer Polynomial Multiplication
We present a highly scalable algorithm for multiplying sparse multivariate polynomials represented in a distributed format. This algo- rithm targets not only the shared memory multicore computers, but also computers clusters or specialized…
We discuss the parallelization of algorithms for solving polynomial systems symbolically by way of triangular decomposition. Algorithms for solving polynomial systems combine low-level routines for performing arithmetic operations on…
We report a detailed analysis of the optical realization [1, 3, 2, 4] of the analogue algorithm described in the first paper of this series [5] for the simultaneous factorization of an exponential number of integers. Such an analogue…
We describe a novel analogue algorithm that allows the simultaneous factorization of an exponential number of large integers with a polynomial number of experimental runs. It is the interference-induced periodicity of "factoring"…
Specialized function gradient computing hardware could greatly improve the performance of state-of-the-art optimization algorithms, e.g., based on gradient descent or conjugate gradient methods that are at the core of control, machine…
Exactly solving multi-objective integer programming (MOIP) problems is often a very time consuming process, especially for large and complex problems. Parallel computing has the potential to significantly reduce the time taken to solve such…
In this paper, we present a probabilistic algorithm to multiply two sparse polynomials almost as efficiently as two dense univariate polynomials with a result of approximately the same size. The algorithm depends on unproven heuristics that…
We present a novel class of methods to compute functions of matrices or their action on vectors that are suitable for parallel programming. Solving appropriate simple linear systems of equations in parallel (or computing the inverse of…
We present a new adaptive parallel algorithm for the challenging problem of multi-dimensional numerical integration on massively parallel architectures. Adaptive algorithms have demonstrated the best performance, but efficient many-core…
We give several new algorithms for dense polynomial multiplication based on the Kronecker substitution method. For moderately sized input polynomials, the new algorithms improve on the performance of the standard Kronecker substitution by a…
Parametric linear programming is central in polyhedral computations and in certain control applications.We propose a task-based scheme for parallelizing it, with quasi-linear speedup over large problems.
Arrival of multicore systems has enforced a new scenario in computing, the parallel and distributed algorithms are fast replacing the older sequential algorithms, with many challenges of these techniques. The distributed algorithms provide…
Many parallel algorithms use at least linear auxiliary space in the size of the input to enable computations to be done independently without conflicts. Unfortunately, this extra space can be prohibitive for memory-limited machines,…
Graph clustering has many important applications in computing, but due to growing sizes of graphs, even traditionally fast clustering methods such as spectral partitioning can be computationally expensive for real-world graphs of interest.…
The Kernel Polynomial Method (KPM) is one of the fast diagonalization methods used for simulations of quantum systems in research fields of condensed matter physics and chemistry. The algorithm has a difficulty to be parallelized on a…
A novel parallel algorithm for matrix multiplication is presented. The hyper-systolic algorithm makes use of a one-dimensional processor abstraction. The procedure can be implemented on all types of parallel systems. It can handle…
In view of the existing limitations of sequential computing, parallelization has emerged as an alternative in order to improve the speedup of numerical simulations. In the framework of evolutionary problems, space-time parallel methods…
Today's PCs can directly manipulate numbers not longer than 64 bits because the size of the CPU registers and the data-path are limited. Consequently, arithmetic operations such as addition, can only be performed on numbers of that length.…
Parallel batched data structures are designed to process synchronized batches of operations in a parallel computing model. In this paper, we propose parallel combining, a technique that implements a concurrent data structure from a parallel…
Generalized sparse matrix-matrix multiplication is a key primitive for many high performance graph algorithms as well as some linear solvers such as multigrid. We present the first parallel algorithms that achieve increasing speedups for an…