Related papers: Automatic Library Generation for Modular Polynomia…
Modular arithmetic is widely used in crytography and symbolic computation. This paper presents a vectorized Montgomery algorithm for modular multiplication, the key to fast modular arithmetic, that fully utilizes the SIMD instructions. We…
We show a new algorithm and its implementation for multiplying bit-polynomials of large degrees. The algorithm is based on evaluating polynomials at a specific set comprising a natural set for evaluation with additive FFT and a high order…
We set new speed records for multiplying long polynomials over finite fields of characteristic two. Our multiplication algorithm is based on an additive FFT (Fast Fourier Transform) by Lin, Chung, and Huang in 2014 comparing to previously…
Fast algorithms for integer and polynomial multiplication play an important role in scientific computing as well as in other disciplines. In 1971, Sch{\"o}nhage and Strassen designed an algorithm that improved the multiplication time for…
The library \emph{fast\_polynomial} for Sage compiles multivariate polynomials for subsequent fast evaluation. Several evaluation schemes are handled, such as H\"orner, divide and conquer and new ones can be added easily. Notably, a new…
This paper presents a low-latency hardware accelerator for modular polynomial multiplication for lattice-based post-quantum cryptography and homomorphic encryption applications. The proposed novel modular polynomial multiplier exploits the…
Efficient high-performance libraries often expose multiple tunable parameters to provide highly optimized routines. These can range from simple loop unroll factors or vector sizes all the way to algorithmic changes, given that some…
The Discrete Fourier Transform (DFT) is essential for various applications ranging from signal processing to convolution and polynomial multiplication. The groundbreaking Fast Fourier Transform (FFT) algorithm reduces DFT time complexity…
Tuning numerical libraries has become more difficult over time, as systems get more sophisticated. In particular, modern multicore machines make the behaviour of algorithms hard to forecast and model. In this paper, we tackle the issue of…
Polynomial multiplication is a bottleneck in most of the public-key cryptography protocols, including Elliptic-curve cryptography and several of the post-quantum cryptography algorithms presently being studied. In this paper, we present a…
Although reliable long precision floating-point arithmetic libraries such as QD and MPFR/GMP are necessary to solve ill-conditioned problems in numerical simulation, long precision BLAS-level computation such as matrix multiplication has…
GPUs are dedicated processors used for complex calculations and simulations and they can be effectively used for tropical algebra computations. Tropical algebra is based on max-plus algebra and min-plus algebra. In this paper we proposed…
We describe a methodology for designing efficient parallel and distributed scientific software. This methodology utilizes sequences of mechanizable algebra--based optimizing transformations. In this study, we apply our methodology to the…
There are a variety of choices to be made in both computer algebra systems (CASs) and satisfiability modulo theory (SMT) solvers which can impact performance without affecting mathematical correctness. Such choices are candidates for…
The $N$th power of a polynomial matrix of fixed size and degree can be computed by binary powering as fast as multiplying two polynomials of linear degree in~$N$. When Fast Fourier Transform (FFT) is available, the resulting complexity is…
The resurgence of machine learning has increased the demand for high-performance basic linear algebra subroutines (BLAS), which have long depended on libraries to achieve peak performance on commodity hardware. High-performance BLAS…
As the most central and computationally intensive component of deep neural networks, the execution efficiency of matrix multiplication directly determines the training and inference performance of models. Harnessing the parallel processing…
We study the complexity of polynomial multiplication over arbitrary fields. We present a unified approach that generalizes all known asymptotically fastest algorithms for this problem. In particular, the well-known algorithm for…
Fourier transforms are an often necessary component in many computational tasks, and can be computed efficiently through the fast Fourier transform (FFT) algorithm. However, many applications involve an underlying continuous signal, and a…
Scientific programmers often turn to vendor-tuned Basic Linear Algebra Subprograms (BLAS) to obtain portable high performance. However, many numerical algorithms require several BLAS calls in sequence, and those successive calls result in…