Related papers: A New Modular Division Algorithm and Applications
This paper presents a novel meta algorithm, Partition-Merge (PM), which takes existing centralized algorithms for graph computation and makes them distributed and faster. In a nutshell, PM divides the graph into small subgraphs using our…
An algebraic number $\beta \in \mathbb{C}$ with no conjugate of modulus 1 can serve as the base of a numeration system $(\beta, \mathcal{A})$ with parallel addition, i.e., the sum of two operands represented in base $\beta$ with digits from…
We consider numeration systems where digits are integers and the base is an algebraic number $\beta$ such that $|\beta|>1$ and $\beta$ satisfies a polynomial where one coefficient is dominant in a certain sense. For this class of bases…
We propose a new algorithm for multiplying dense polynomials with integer coefficients in a parallel fashion, targeting multi-core processor architectures. Complexity estimates and experimental comparisons demonstrate the advantages of this…
In modern computing units, division operations are generally slower than other arithmetic operations and require more resources, such as area and power, than multiplication. To reduce the delay, fast division algorithms use an initial…
A new variant of bit interleaved coded modulation (BICM) is proposed. In the new scheme, called Parallel BICM, L identical binary codes are used in parallel using a mapper, a newly proposed finite-length interleaver and a binary dither…
This paper presents a novel algorithm for the modulus operation for FPGA implementation. The proposed algorithm uses only addition, subtraction, logical, and bit shift operations, avoiding the complexities and hardware costs associated with…
Computations over the rational numbers often encounter the problem of intermediate coefficient growth. A solution to this is provided by modular methods, which apply the algorithm under consideration modulo a number of primes and then lift…
Integer addition in the binary representation is one of the basic operations of any digital processor. To add two integers of N bits each, a serial adder takes N clock ticks. For achieving higher speeds, parallel…
In view of the existing limitations of sequential computing, parallelization has emerged as an alternative in order to improve the speedup of numerical simulations. In the framework of evolutionary problems, space-time parallel methods…
The implicit 2D3V particle-in-cell (PIC) code developed to study the interaction of ultrashort pulse lasers with matter [G. M. Petrov and J. Davis, Computer Phys. Comm. 179, 868 (2008); Phys. Plasmas 18, 073102 (2011)] has been parallelized…
Residue Number Systems (RNS) offer efficient modular arithmetic and natural parallelism, but direct integer division in RNS remains a difficult and comparatively underdeveloped operation. This paper builds on the type-II division algorithm…
The Alternating Direction Method of Multipliers (ADMM) algorithm has been widely adopted for solving the distributed optimization problem (DOP). In this paper, a new distributed parallel ADMM algorithm is proposed, which allows the agents to…
We present a novel right-to-left long division algorithm based on the Montgomery modular multiply, consisting of separate highly efficient loops with a simple carry structure for computing first the remainder (x mod q) and then the quotient…
In this paper, we present several improvements in the parallelization of the in-place merge algorithm, which merges two contiguous sorted arrays into one with an O(T) space complexity (where T is the number of threads). The approach divides…
We present a new adaptive parallel algorithm for the challenging problem of multi-dimensional numerical integration on massively parallel architectures. Adaptive algorithms have demonstrated the best performance, but efficient many-core…
A mathematical characterization of serially-pruned permutations (SPPs) employed in variable-length permuters and their associated fast pruning algorithms and architectures are proposed. Permuters are used in many signal processing systems…
An efficient numerical algorithm is presented for massively parallel simulations of dispersion-managed wavelength-division-multiplexed optical fiber systems. The algorithm is based on a weak nonlinearity approximation and independent…
In this paper, we propose a new framework for designing fast parallel algorithms for fundamental statistical subset selection tasks that include feature selection and experimental design. Such tasks are known to be weakly submodular and are…
We recently derived a very accurate and fast new algorithm for numerically inverting the Laplace transforms needed to obtain gluon distributions from the proton structure function $F_2^{\gamma p}(x,Q^2)$. We numerically inverted the…