Related papers: Compressed Modular Matrix Multiplication
We present algorithms to perform modular polynomial multiplication or modular dot product efficiently in a single machine word. We pack polynomials into integers and perform several modular operations with machine integer or floating point…
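The packing idea can be illustrated with a minimal Kronecker-substitution sketch. It is not the paper's algorithm. It assumes coefficients in $[0, p)$ and a slot width `shift` wide enough that no coefficient of the exact product carries into its neighbour. The helper names `pack`, `unpack` and `polymul_mod_packed` are hypothetical.

```python
def pack(coeffs, shift):
    """Pack coefficients (lowest degree first) into one integer, `shift` bits per slot."""
    word = 0
    for c in reversed(coeffs):
        word = (word << shift) | c
    return word


def unpack(word, shift, count):
    """Read `count` slots of `shift` bits back out of `word`."""
    mask = (1 << shift) - 1
    return [(word >> (i * shift)) & mask for i in range(count)]


def polymul_mod_packed(f, g, p, shift):
    """Multiply polynomials f, g (coefficients in [0, p)) and reduce the result mod p.

    Assumption: min(len(f), len(g)) * (p - 1)**2 < 2**shift, so every coefficient
    of the exact product fits in its slot and no carry spills into the next one.
    """
    n = len(f) + len(g) - 1
    assert min(len(f), len(g)) * (p - 1) ** 2 < (1 << shift), "slot too narrow"
    assert n * shift <= 64, "packed product does not fit in a 64-bit word"
    product = pack(f, shift) * pack(g, shift)  # a single integer multiplication
    return [c % p for c in unpack(product, shift, n)]  # reduce each slot mod p
```

For example, $(1 + 2x + 3x^2)(4 + 5x)$ over $\mathbb{Z}/7$ with 8-bit slots gives `[4, 6, 1, 1]`. The whole product occupies 32 bits, so one machine multiplication replaces the schoolbook loop.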
A compression algorithm is presented that uses the set of prime numbers. Sequences of numbers are correlated with the prime numbers, and labeled with the integers. The algorithm can be iterated on data sets, generating factors of doubles on…
Frugal computing is becoming an important topic for environmental reasons. In this context, several techniques have been proposed to reduce the storage of scientific data by dedicated compression methods specially tailored for arrays of…
As machine learning (ML) techniques now generate huge data collections, the problem of how to efficiently engineer their storage and operations is becoming of paramount importance. In this article we propose a new lossless…
This article is concerned with the efficient computation of modular matrix multiplication C=AB mod p, a key kernel in computer algebra. We focus on floating-point arithmetic, which allows for using efficient matrix multiplication libraries.…
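The key exactness condition behind floating-point modular matrix multiplication can be sketched as follows. This is an illustration under stated assumptions, not the paper's algorithm. Entries are integers in $[0, p)$, and with inner dimension $n$ the bound $n(p-1)^2 < 2^{53}$ keeps every partial sum exactly representable in a double, so a single reduction at the end is correct. The function name `matmul_mod_float` is hypothetical. The explicit loop stands in for the call to an optimized matrix multiplication library that the abstract refers to.

```python
import math


def matmul_mod_float(A, B, p):
    """Compute C = A*B mod p using float64 accumulation and one delayed reduction.

    Assumes integer entries in [0, p) and n*(p-1)**2 < 2**53 (n = inner dimension),
    so every product and partial sum is an exactly representable integer.
    """
    n = len(B)
    if n * (p - 1) ** 2 >= 2 ** 53:
        raise ValueError("p too large for exact float64 accumulation")
    cols = len(B[0])
    C = []
    for row in A:
        a = [float(x) for x in row]
        out = []
        for j in range(cols):
            acc = 0.0
            for k in range(n):
                acc += a[k] * float(B[k][j])  # exact: stays below 2**53
            out.append(int(math.fmod(acc, p)))  # reduce once, after accumulation
        C.append(out)
    return C
```

With $p = 5$, `matmul_mod_float([[1, 2], [3, 4]], [[5, 6], [7, 8]], 5)` reduces $[[19, 22], [43, 50]]$ to `[[4, 2], [3, 0]]`.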
We show how one can use non-prime-power, composite moduli for computing representations of the product of two $n\times n$ matrices using only $n^{2+o(1)}$ multiplications.
This paper considers the problem of computing the product of two massive matrices $\mathbf{A}$ and $\mathbf{B}$ in a distributed manner. We provide a modulo technique that can be applied to coded distributed matrix multiplication…
In deep learning inference, model parameters are pruned and quantized to reduce the model size. Compression methods and common subexpression elimination (CSE) algorithms are applied on sparse constant matrices to deploy the models on…
We exploit the truncated singular value decomposition and the recently proposed circulant decomposition for an efficient first-order approximation of the multiplication of large dense matrices. A decomposition of each matrix into a sum of a…
The biggest cost of computing with large matrices in any modern computer is related to memory latency and bandwidth. The average latency of modern RAM reads is 150 times greater than a clock cycle of the processor. Throughput is a little…
Matrix-vector multiplication forms the basis of many iterative solution algorithms and is therefore also an important operation for hierarchical matrices, which represent dense data in an optimized form by applying low-rank…
Recently, Koç proposed a neat and efficient algorithm for computing $x = a^{-1} \bmod p^k$ for a prime $p$ based on the exact solution of linear equations using $p$-adic expansions. The algorithm requires only addition and right…
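For $p = 2$, a digit-by-digit $p$-adic solve of $a x = 1$ can be sketched as follows. This is a minimal illustration written for this note, not Koç's published pseudocode. It assumes $a$ is odd, and the function name `inverse_mod_2k` is hypothetical. Each step fixes one bit of $x$ with a conditional subtraction and a right shift while maintaining the exact identity $a x + 2^{i} b = 1$.

```python
def inverse_mod_2k(a, k):
    """Return x in [0, 2**k) with a*x ≡ 1 (mod 2**k), one bit of x per step.

    Invariant before step i: a*x + 2**i * b == 1 holds over the integers.
    """
    if a % 2 == 0:
        raise ValueError("a must be odd to be invertible modulo 2**k")
    b, x = 1, 0
    for i in range(k):
        bit = b & 1      # next 2-adic digit of x
        x |= bit << i
        if bit:
            b -= a       # makes b even, since a is odd
        b >>= 1          # exact division by 2 (also for negative b)
    return x
```

For instance, `inverse_mod_2k(3, 4)` yields `11`, and indeed $3 \cdot 11 = 33 \equiv 1 \pmod{16}$.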
We present a method for storing multiple models within a single set of parameters. Models can coexist in superposition and still be retrieved individually. In experiments with neural networks, we show that a surprisingly large number of…
Modular integer arithmetic occurs in many algorithms for computer algebra, cryptography, and error correcting codes. Although recent microprocessors typically offer a wide range of highly optimized arithmetic functions, modular integer…
This paper describes a sufficiently simple modular multiplication algorithm, which uses only carry-save addition and bit-inspection Boolean logic, with no number comparison or carry propagation.
Directional interpolation is a fast and efficient compression technique for high-frequency Helmholtz boundary integral equations, but it requires a very large amount of storage in its original form. Algebraic recompression can significantly…
We study algorithms for the fast computation of modular inverses. Newton-Raphson iteration over $p$-adic numbers gives a recurrence relation computing modular inverse modulo $p^m$, that is logarithmic in $m$. We solve the recurrence to…
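The Newton-Raphson step over $p$-adic numbers can be sketched as follows. If $a x \equiv 1 \pmod{q}$, then $x' = x(2 - a x)$ satisfies $1 - a x' = (1 - a x)^2 \equiv 0 \pmod{q^2}$, so the precision doubles each iteration and about $\lceil \log_2 m \rceil$ iterations reach $p^m$. This is the standard iteration only. The paper goes on to solve the resulting recurrence, which this sketch does not attempt. The name `inverse_mod_prime_power` is hypothetical, and `pow(a, -1, p)` requires Python 3.8 or later.

```python
def inverse_mod_prime_power(a, p, m):
    """Return the inverse of a modulo p**m (m >= 1) by Newton-Raphson lifting."""
    if a % p == 0:
        raise ValueError("a must be coprime to p")
    x = pow(a, -1, p)  # starting value: inverse modulo p
    prec = 1           # x is correct modulo p**prec
    while prec < m:
        prec = min(2 * prec, m)  # each step doubles the p-adic precision
        q = p ** prec
        x = x * (2 - a * x) % q
    return x
```

With $m = 17$, only five lifting steps are needed (precision $1 \to 2 \to 4 \to 8 \to 16 \to 17$).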
Recent methods for learning vector space representations of words have succeeded in capturing fine-grained semantic and syntactic regularities using vector arithmetic. However, these vector space representations (created through large-scale…
Word embeddings are vital components of Natural Language Processing (NLP) models and have been extensively explored. However, they consume a lot of memory, which poses a challenge for edge deployment. Embedding matrices typically contain…
This paper describes a new accumulate-and-add multiplication algorithm. The method partitions one of the operands and recombines the results of computations done with each of the partitions. The resulting design turns out to be both…
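The general idea of partitioning one operand and recombining the partial results can be illustrated with a generic software sketch. It does not reproduce the paper's accumulate-and-add design. It assumes non-negative integers and a hypothetical chunk width `width`. The function name `partitioned_multiply` is likewise hypothetical.

```python
def partitioned_multiply(a, b, width=8):
    """Multiply non-negative integers a and b by splitting b into `width`-bit chunks.

    Each chunk yields a small partial product a * chunk, which is shifted to the
    chunk's bit position and accumulated into the result.
    """
    result = 0
    shift = 0
    mask = (1 << width) - 1
    while b:
        chunk = b & mask                 # current partition of b
        result += (a * chunk) << shift   # recombine at the right weight
        b >>= width
        shift += width
    return result
```

Smaller chunks mean cheaper partial products but more accumulation steps. The chunk width is the design knob such partitioned schemes trade off.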