Related papers: Efficient Modular Arithmetic for SIMD Devices

200 papers

Two essential problems in Computer Algebra, namely polynomial factorization and polynomial greatest common divisor computation, can be efficiently solved thanks to multiple polynomial evaluations in two variables using modular arithmetic.…

Distributed, Parallel, and Cluster Computing · Computer Science 2020-04-27 Pierre Fortin, Ambroise Fleury, François Lemaire, Michael Monagan

Modular integer arithmetic occurs in many algorithms for computer algebra, cryptography, and error correcting codes. Although recent microprocessors typically offer a wide range of highly optimized arithmetic functions, modular integer…

Mathematical Software · Computer Science 2014-07-15 Joris van der Hoeven, Grégoire Lecerf, Guillaume Quintin

This paper presents efficient algorithms, designed to leverage SIMD for performing Montgomery reductions and additions on integers larger than 512 bits. The existing algorithms encounter inefficiencies when parallelized using SIMD due to…

Cryptography and Security · Computer Science 2023-09-01 Pengchang Ren, Reiji Suda, Vorapong Suppakitpaisarn

Elliptic curve cryptography (ECC) has emerged as the dominant public-key protocol, with NIST standardizing parameters for binary field GF(2^m) ECC systems. This work presents a hardware implementation of a Hybrid Multiplication technique…

Cryptography and Security · Computer Science 2025-06-25 Ruby Kumari, Gaurav Purohit, Abhijit Karmakar

This paper presents a novel algorithm for the modulus operation for FPGA implementation. The proposed algorithm uses only addition, subtraction, logical, and bit shift operations, avoiding the complexities and hardware costs associated with…

Cryptography and Security · Computer Science 2025-01-10 W. A. Susantha Wijesinghe

Elliptic curve cryptography (ECC) is widely used in security applications such as public key cryptography (PKC) and zero-knowledge proofs (ZKP). ECC is composed of modular arithmetic, where modular multiplication takes most of the…

Hardware Architecture · Computer Science 2024-02-23 Jonathan Ku, Junyao Zhang, Haoxuan Shan, Saichand Samudrala, Jiawen Wu, Qilin Zheng, Ziru Li, JV Rajendran, Yiran Chen

Fast combinational multipliers with large bit widths can occupy significant silicon area, which also drives up power consumption. Area can be reduced through resource sharing (i.e., folding) at the expense of lower throughput, which is…

Hardware Architecture · Computer Science 2025-09-03 Ahmad Houraniah, H. Fatih Ugurdag, C. Emre Dedeagac

This paper proposes a novel set of trigonometric implementations which are 5x faster than the inbuilt C++ functions. The proposed implementation is also highly memory efficient requiring no precomputations of any kind. Benchmark comparisons…

Mathematical Software · Computer Science 2025-02-18 Nikhil Dev Goyal, Parth Arora

Montgomery modular multiplication is widely used in public key cryptosystems (PKC) and affects the efficiency of upper systems directly. However, the modulus is getting larger due to the increasing demand for security, which results in a heavy…

Cryptography and Security · Computer Science 2026-03-17 Yuxuan Zhang, Hua Guo, Chen Chen, Yewei Guan, Xiyong Zhang, Zhenyu Guan

Modular arithmetic is widely used in cryptography and symbolic computation. This paper presents a vectorized Montgomery algorithm for modular multiplication, the key to fast modular arithmetic, that fully utilizes the SIMD instructions. We…

Mathematical Software · Computer Science 2016-09-06 Lingchuan Meng

Generalised matrix-matrix multiplication forms the kernel of many mathematical algorithms. A faster matrix-matrix multiply immediately benefits these algorithms. In this paper we implement efficient matrix multiplication for large matrices…

Performance · Computer Science 2019-12-11 Douglas Aberdeen, Jonathan Baxter

Large-number arithmetic, widely used in scientific computing and cryptography, has seen limited adoption of single instruction, multiple data (SIMD) parallelism on modern CPUs due to the inherent dependencies in traditional algorithms. We…

Distributed, Parallel, and Cluster Computing · Computer Science 2026-04-27 Subhrajit Das, Abhishek Bichhawat, Yuvraj Patel

This paper describes a new accumulate-and-add multiplication algorithm. The method partitions one of the operands and re-combines the results of computations done with each of the partitions. The resulting design turns out to be both…

Mathematical Software · Computer Science 2011-04-11 Byungchun Chung, Sandra Marcello, Amir-Pasha Mirbaha, David Naccache, Karim Sabeg

This paper discusses the potential of graphics processing units (GPUs) in high-dimensional optimization problems. A single GPU card with hundreds of arithmetic cores can be inserted in a personal computer and dramatically accelerates many…

Computation · Statistics 2015-03-13 Hua Zhou, Kenneth Lange, Marc A. Suchard

We describe a modified SIMD architecture suitable for single-chip integration of a large number of processing elements, such as 1,000 or more. Important differences from traditional SIMD designs are: a) The size of the memory per processing…

Astrophysics · Physics 2007-05-23 Junichiro Makino

Processing-using-DRAM has been proposed for a limited set of basic operations (i.e., logic operations, addition). However, in order to enable the full adoption of processing-using-DRAM, it is necessary to provide support for more complex…

The ever-increasing size and computational complexity of today's machine-learning algorithms pose an increasing strain on the underlying hardware. In this light, novel and dedicated architectural solutions are required to optimize energy…

Hardware Architecture · Computer Science 2022-12-20 Pengbo Yu, Alexandre Levisse, Mohit Gupta, Evenblij Timon, Giovanni Ansaloni, Francky Catthoor, David Atienza

Calculating interactions or correlations between pairs of particles is typically the most time-consuming task in particle simulation or correlation analysis. Straightforward implementations using a double loop over particle pairs have…

Computational Physics · Physics 2015-06-16 Szilárd Páll, Berk Hess

In many important applications -- such as search engines and relational database systems -- data is stored in the form of arrays of integers. Encoding and, most importantly, decoding of these arrays consumes considerable CPU time.…

Information Retrieval · Computer Science 2021-02-02 Daniel Lemire, Leonid Boytsov

Since simulating quantum computers requires exponentially more classical resources, efficient algorithms are extremely helpful. We analyze algorithms that create single qubit and specific controlled qubit matrix representations of gates.…

Quantum Physics · Physics 2007-05-23 Eric Hsu