Related papers: Modular Multiplication without Carry Propagation (…
Matrix multiplication consumes a large fraction of the time taken in many machine-learning algorithms. Thus, accelerator chips that perform matrix multiplication faster than conventional processors or even GPUs are of increasing interest.…
We propose to store several integers modulo a small prime in a single machine word. Modular addition is performed by addition and possibly subtraction of a word containing several copies of the modulus. Modular multiplication is not directly…
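A minimal sketch of the packed modular addition described above, assuming (hypothetically) four 8-bit fields per word, the prime P = 101, and one spare top bit per field so that a lane-wise comparison with P can be read off with shifts and masks. Python's unbounded `int` stands in for the machine word; `pack`, `unpack`, and `add_mod` are illustrative names, not the paper's.

```python
# Hypothetical parameters: 4 fields of 8 bits in one 32-bit "word" (a Python int here).
W = 8          # field width in bits
P = 101        # small prime modulus; this sketch needs P < 2**(W - 1)
LANES = 4      # residues packed per word

ONES = sum(1 << (i * W) for i in range(LANES))  # 1 in the lowest bit of every field
P_WORD = P * ONES                                # the modulus replicated in every field
BIAS = ((1 << (W - 1)) - P) * ONES               # 2**(W-1) - P in every field


def pack(values):
    """Pack LANES residues, each in [0, P), into one word."""
    word = 0
    for i, v in enumerate(values):
        if not 0 <= v < P:
            raise ValueError("residue out of range")
        word |= v << (i * W)
    return word


def unpack(word):
    """Split a packed word back into its LANES residues."""
    field = (1 << W) - 1
    return [(word >> (i * W)) & field for i in range(LANES)]


def add_mod(a, b):
    """Lane-wise (a + b) mod P on packed words, with no per-lane loop."""
    s = a + b                          # each field < 2P < 2**W, so no carry leaves its field
    t = s + BIAS                       # top bit of a field is set  <=>  that field of s >= P
    flags = (t >> (W - 1)) & ONES      # 1 in the low bit of every field that must be reduced
    lane_mask = (flags << W) - flags   # turns each flagged field into all ones
    return s - (lane_mask & P_WORD)    # subtract P only where needed; no borrows occur


print(unpack(add_mod(pack([100, 50, 0, 99]), pack([5, 50, 0, 2]))))  # [4, 100, 0, 0]
```

The single subtraction of `lane_mask & P_WORD` plays the role of "subtracting a word containing several copies of the modulus"; only lanes that overflowed P receive a copy.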
This paper describes a new accumulate-and-add multiplication algorithm. The method partitions one of the operands and recombines the results of the computations done with each of the partitions. The resulting design turns out to be both…
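One familiar way to "partition one operand and recombine" is radix-2^k multiplication: split the multiplier into k-bit chunks, look up each chunk's partial product, shift it into place, and accumulate. The sketch below shows that generic scheme only, not necessarily the paper's design; the chunk width `k` and the name `partitioned_multiply` are illustrative assumptions.

```python
def partitioned_multiply(x, y, k=4):
    """Multiply non-negative integers x * y by splitting y into k-bit partitions."""
    if x < 0 or y < 0 or k <= 0:
        raise ValueError("expects non-negative operands and k > 0")
    # Precompute x * c for every k-bit value c using only additions.
    table = [0]
    for _ in range((1 << k) - 1):
        table.append(table[-1] + x)
    mask = (1 << k) - 1
    acc, shift = 0, 0
    while y:
        acc += table[y & mask] << shift   # partial product for this partition, re-aligned
        y >>= k                           # move to the next partition of y
        shift += k
    return acc
```

Larger `k` means fewer accumulate steps but a table of 2^k entries, which is the usual trade-off in this style of design.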
In this paper, we apply results on number systems based on continued fraction expansions to modular arithmetic. We provide two new algorithms in order to compute modular multiplication and modular division. The presented algorithms are…
Addition is perhaps one of the simplest arithmetic tasks one can think of and is usually performed using the carrying over algorithm. This algorithm consists of two tasks: adding digits in the same position and carrying over a one whenever…
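The two tasks of the carrying over algorithm can be sketched as follows, assuming numbers are stored as digit lists with the least-significant digit first (a representation chosen here for illustration).

```python
def add_digits(a, b, base=10):
    """Add two numbers given as digit lists, least-significant digit first."""
    result = []
    carry = 0
    for i in range(max(len(a), len(b))):
        da = a[i] if i < len(a) else 0
        db = b[i] if i < len(b) else 0
        total = da + db + carry      # task 1: add the digits in the same position
        result.append(total % base)  # keep the digit that fits in this position
        carry = total // base        # task 2: carry a one when the sum reaches the base
    if carry:
        result.append(carry)
    return result


print(add_digits([7, 4], [5, 8]))  # 47 + 85 = 132 -> [2, 3, 1]
```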
In this paper, we derive a new computational algorithm for the Barrett technique for modular polynomial multiplication, termed BA-P. BA-P is then applied to a new residue-arithmetic-based Barrett algorithm for modular polynomial multiplication…
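For orientation, here is classic Barrett reduction for integers, the technique that BA-P adapts to polynomials; this is the integer version only, not BA-P itself. The names `barrett_setup` and `barrett_reduce` are hypothetical. The point is that the division by the modulus is replaced by a multiplication with a precomputed constant and a shift.

```python
def barrett_setup(m):
    """Precompute k and mu = floor(4**k / m) for a modulus m > 1."""
    k = m.bit_length()
    mu = (1 << (2 * k)) // m    # the only division, done once per modulus
    return k, mu


def barrett_reduce(x, m, k, mu):
    """Return x mod m for 0 <= x < m * m without dividing by m."""
    q = (x * mu) >> (2 * k)     # estimate of floor(x / m); never too large
    r = x - q * m
    while r >= m:               # for x < m * m the estimate is low by at most 1
        r -= m
    return r


m = 1000003
k, mu = barrett_setup(m)
print(barrett_reduce(123456789012, m, k, mu) == 123456789012 % m)  # True
```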
If we want to represent integers in base $m$, we need a set $A$ of digits, which needs to be a complete set of residues modulo $m$. When adding two integers with last digits $a_1, a_2 \in A$, we find the unique $a \in A$ such that $a_1 +…
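The digit step described above (find the unique $a \in A$ with $a \equiv a_1 + a_2 \pmod m$, and carry $(a_1 + a_2 - a)/m$) can be sketched for an arbitrary digit set. This is a minimal illustration under stated assumptions: digit lists are least-significant first, and the function names and the `max_extra` guard are hypothetical. The guard exists because not every complete residue system lets the final carry die out; for balanced ternary, $A = \{-1, 0, 1\}$ with $m = 3$, it always does.

```python
def digit_step(total, A, m):
    """Return (a, carry) with a in A, a ≡ total (mod m) and total == a + m * carry."""
    for a in A:
        if (total - a) % m == 0:
            return a, (total - a) // m
    raise ValueError("A is not a complete set of residues modulo m")


def add_in_digit_set(x, y, A, m, max_extra=64):
    """Add digit lists (least-significant first) written over digit set A in base m."""
    out, carry = [], 0
    for i in range(max(len(x), len(y))):
        total = (x[i] if i < len(x) else 0) + (y[i] if i < len(y) else 0) + carry
        d, carry = digit_step(total, A, m)
        out.append(d)
    steps = 0
    while carry != 0:                    # flush the remaining carry into new digits
        d, carry = digit_step(carry, A, m)
        out.append(d)
        steps += 1
        if steps > max_extra:
            raise RuntimeError("carry did not terminate for this digit set")
    return out


def value(digits, m):
    """Integer value of a digit list, least-significant digit first."""
    return sum(d * m ** i for i, d in enumerate(digits))


# Balanced ternary: (1 - 3) + (1 + 3) = 2
print(add_in_digit_set([1, -1], [1, 1], (-1, 0, 1), 3))  # [-1, 1], i.e. -1 + 3 = 2
```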
The new generation of machine learning processors has evolved from multi-core and parallel architectures that were designed to efficiently implement matrix-vector multiplications (MVMs). This is because at the fundamental level, neural…
A method of determining two factors of an odd integer without the need for multiplication or division operations in the iterative portion of the computation is presented. It is feasible for an implementing algorithm to use only integer addition and…
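The abstract does not show its method, so as a stand-in here is a well-known technique with the same property: Fermat's method rewritten so that the loop uses only additions, subtractions and sign tests. It is not claimed to be the paper's algorithm. It tracks $r = a^2 - b^2 - n$ through the odd numbers $x = 2a + 1$ and $y = 2b + 1$, and stops when $n = (a - b)(a + b)$. The name `fermat_factor` is illustrative; the integer square root and the final halving happen outside the loop.

```python
from math import isqrt


def fermat_factor(n):
    """Return (p, q) with p * q == n and p <= q, for odd n >= 1."""
    if n < 1 or n % 2 == 0:
        raise ValueError("n must be a positive odd integer")
    a = isqrt(n)
    # Invariant: r == ((x - 1) // 2)**2 - ((y - 1) // 2)**2 - n
    x, y, r = 2 * a + 1, 1, a * a - n
    while r != 0:
        if r < 0:
            r += x      # a -> a + 1: add the next odd number to the first square
            x += 2
        else:
            r -= y      # b -> b + 1: subtract the next odd number for the second square
            y += 2
    # a - b and a + b, computed with shifts instead of division
    return (x - y) >> 1, (x + y - 2) >> 1


print(fermat_factor(5959))  # (59, 101)
```

The method is fast when $n$ has two factors close to $\sqrt{n}$ and slow otherwise; for a prime $n$ it ends at the trivial pair $(1, n)$.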
This paper presents a novel algorithm for the modulus operation for FPGA implementation. The proposed algorithm uses only addition, subtraction, logical, and bit shift operations, avoiding the complexities and hardware costs associated with…
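As a baseline for what "only addition, subtraction, logical, and bit shift operations" can achieve, here is the standard shift-and-subtract reduction (binary long division that keeps only the remainder). This is a generic sketch, not the proposed FPGA algorithm; the name `mod_shift_subtract` is hypothetical.

```python
def mod_shift_subtract(x, m):
    """Compute x mod m for x >= 0, m > 0 using shifts, comparisons and subtraction."""
    if m <= 0 or x < 0:
        raise ValueError("expects x >= 0 and m > 0")
    # Align a shifted copy of m under the most significant part of x.
    d = m
    while (d << 1) <= x:
        d <<= 1
    # Subtract shifted copies of m from the top down; the quotient bits are discarded.
    while d >= m:
        if x >= d:
            x -= d
        d >>= 1
    return x


print(mod_shift_subtract(100, 7))  # 2
```

In hardware the loop runs for at most as many iterations as there are bits in x, which is why such designs avoid a full divider.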
In this paper, we are interested in memoryless computation, a modern paradigm to compute functions which generalises the famous XOR swap algorithm to exchange the contents of two variables without using a buffer. This uses a combinatorial…
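The XOR swap mentioned above exchanges two variables by a sequence of updates, each overwriting one variable with a function of both, with no buffer. A minimal sketch follows, together with the analogous addition/subtraction sequence (function names are illustrative; in Python a tuple assignment would also swap, so this only demonstrates the update sequence).

```python
def xor_swap(a, b):
    """Swap two integers using three XOR updates and no temporary variable."""
    a ^= b   # a holds a ^ b
    b ^= a   # b holds b ^ (a ^ b) == original a
    a ^= b   # a holds (a ^ b) ^ original a == original b
    return a, b


def add_sub_swap(a, b):
    """The same idea with addition and subtraction instead of XOR."""
    a = a + b   # a holds a + b
    b = a - b   # b holds original a
    a = a - b   # a holds original b
    return a, b


print(xor_swap(5, 9))  # (9, 5)
```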
We present a non-commutative algorithm for multiplying (7x7) matrices using 250 multiplications and a non-commutative algorithm for multiplying (9x9) matrices using 520 multiplications. These algorithms are obtained using the same…
In this paper, we propose several dictionary learning algorithms for sparse representations that also impose specific structures on the learned dictionaries such that they are numerically efficient to use: reduced number of…
A modular method was previously suggested to recover a band-limited signal from the sample-and-hold and linearly interpolated (or, in general, an nth-order-hold) version of the regular samples. In this paper a novel approach for compensating…
This article describes a lightweight additive homomorphic algorithm with the same encryption and decryption keys. Compared to standard additive homomorphic algorithms like Paillier, this algorithm reduces the computational cost of…
We provide a simplified form of the Primal Augmented Lagrange Multiplier algorithm. We intend to fill the gap in the steps involved in the mathematical derivations of the algorithm so that insight into the algorithm can be gained. The experiment…
A technique for hardware multiplication based on the Fourier transform is introduced. The technique is most efficient for multiplication units with a range of up to 8 bits. Each multiplication unit is realized on the basis of the…
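The principle behind Fourier-based multiplication is the convolution theorem: the digit sequence of a product is the convolution of the operands' digit sequences, and a convolution becomes a pointwise product after a discrete Fourier transform. The sketch below is a software illustration of that principle with a recursive radix-2 FFT, not the hardware design from the abstract; the function names are hypothetical, and floating-point rounding limits it to modest operand sizes such as the 8-bit range mentioned above.

```python
import cmath


def fft(a, invert=False):
    """Recursive radix-2 FFT; len(a) must be a power of two."""
    n = len(a)
    if n == 1:
        return [complex(a[0])]
    even = fft(a[0::2], invert)
    odd = fft(a[1::2], invert)
    sign = 1 if invert else -1
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out


def to_digits(v, base):
    """Digits of a non-negative integer, least-significant first."""
    digits = []
    while v:
        digits.append(v % base)
        v //= base
    return digits or [0]


def fourier_multiply(x, y, base=2):
    """Multiply non-negative integers by convolving their digit sequences with an FFT."""
    xs, ys = to_digits(x, base), to_digits(y, base)
    n = 1
    while n < len(xs) + len(ys):
        n <<= 1
    fx = fft(xs + [0] * (n - len(xs)))
    fy = fft(ys + [0] * (n - len(ys)))
    conv = fft([a * b for a, b in zip(fx, fy)], invert=True)
    coeffs = [round(c.real / n) for c in conv]   # inverse transform needs a 1/n scale
    # Carries are resolved implicitly by evaluating sum(coeffs[i] * base**i).
    return sum(c * base ** i for i, c in enumerate(coeffs))


print(fourier_multiply(255, 255))  # 65025
```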
Multiplying matrices is among the most fundamental and compute-intensive operations in machine learning. Consequently, there has been significant work on efficiently approximating matrix multiplies. We introduce a learning-based algorithm…
The additive Fourier transform is studied. A fast multiplication algorithm for polynomials over the binary field is given. The bit complexity of the algorithm is $O(n(\log n)(\log\log n)^2)$.
We give an $O(N\cdot \log N\cdot 2^{O(\log^*N)})$ algorithm for multiplying two $N$-bit integers that improves the $O(N\cdot \log N\cdot \log\log N)$ algorithm by Schönhage-Strassen. Both these algorithms use modular arithmetic.…