Related papers: A Hardware-oriented Algorithm for Complex-valued C…

An extra-components method for evaluating fast matrix-vector multiplication with special functions

In calculating integral or discrete transforms, use has been made of fast algorithms for multiplying vectors by matrices whose elements are specified as values of special (Chebyshev, Legendre, Laguerre, etc.) functions. The currently…

Numerical Analysis · Mathematics 2022-08-11 Andrew V. Terekhov

Accurate Models of NVIDIA Tensor Cores

Matrix multiplication is a fundamental operation in both training of neural networks and inference. To accelerate matrix multiplication, Graphical Processing Units (GPUs) provide it implemented in hardware. Due to the increased throughput…

Mathematical Software · Computer Science 2026-04-07 Faizan A. Khattak, Mantas Mikaitis

Hardware-Efficient Schemes of Quaternion Multiplying Units for 2D Discrete Quaternion Fourier Transform Processors

In this paper, we offer and discuss three efficient structural solutions for the hardware-oriented implementation of discrete quaternion Fourier transform basic operations with reduced implementation complexities. The first solution: a…

Data Structures and Algorithms · Computer Science 2017-03-21 Aleksandr Cariow, Galina Cariowa, Marina Chicheva

Hardware-Efficient Structure of the Accelerating Module for Implementation of Convolutional Neural Network Basic Operation

This paper presents a structural design of the hardware-efficient module for implementation of convolution neural network (CNN) basic operation with reduced implementation complexity. For this purpose we utilize some modification of the…

Signal Processing · Electrical Eng. & Systems 2018-11-09 Aleksandr Cariow, Galina Cariowa

An algorithm for dividing quaternions

In this work, a rationalized algorithm for calculating the quotient of two quaternions is presented which reduces the number of underlying real multiplications. Hardware for fast multiplication is much more expensive than hardware for fast…

Signal Processing · Electrical Eng. & Systems 2020-09-02 Aleksandr Cariow, Galina Cariowa

Multiplier-free In-Memory Vector-Matrix Multiplication Using Distributed Arithmetic

Vector-Matrix Multiplication (VMM) is the fundamental and frequently required computation in inference of Neural Networks (NN). Due to the large data movement required during inference, VMM can benefit greatly from in-memory computing.…

Hardware Architecture · Computer Science 2025-10-03 Felix Zeller, John Reuben, Dietmar Fey

Towards the Multiple Constant Multiplication at Minimal Hardware Cost

Multiple Constant Multiplication (MCM) over integers is a frequent operation arising in embedded systems that require highly optimized hardware. An efficient way is to replace costly generic multiplication by bit-shifts and additions, i.e.…

Hardware Architecture · Computer Science 2022-10-11 Rémi Garcia, Anastasia Volkova

An algorithm for scaling vectors by the reciprocal of a complex number

This document describes an algorithm to scale a complex vector by the reciprocal of a complex value. The algorithm computes the reciprocal of the complex value and then scales the vector by the reciprocal. Some scaling may be necessary due…

Numerical Analysis · Mathematics 2023-11-13 Weslley da Silva Pereira

AMG: Automated Efficient Approximate Multiplier Generator for FPGAs via Bayesian Optimization

Approximate computing is a promising approach to reduce the power, delay, and area in hardware design for many error-resilient applications such as machine learning (ML) and digital signal processing (DSP) systems, in which multipliers…

Hardware Architecture · Computer Science 2023-10-31 Zhen Li, Hao Zhou, Lingli Wang

Learning in High-Dimensional Feature Spaces Using ANOVA-Based Fast Matrix-Vector Multiplication

Kernel matrices are crucial in many learning tasks such as support vector machines or kernel ridge regression. The kernel matrix is typically dense and large-scale. Depending on the dimension of the feature space even the computation of all…

Machine Learning · Computer Science 2023-12-04 Franziska Nestler, Martin Stoll, Theresa Wagner

Minimal Filtering Algorithms for Convolutional Neural Networks

In this paper, we present several resource-efficient algorithmic solutions regarding the fully parallel hardware implementation of the basic filtering operation performed in the convolutional layers of convolution neural networks. In fact,…

Signal Processing · Electrical Eng. & Systems 2020-04-14 Aleksandr Cariow, Galina Cariowa

Matrix Multiplication Using Only Addition

Matrix multiplication consumes a large fraction of the time taken in many machine-learning algorithms. Thus, accelerator chips that perform matrix multiplication faster than conventional processors or even GPUs are of increasing interest.…

Data Structures and Algorithms · Computer Science 2023-07-06 Daniel Cussen, Jeffrey D. Ullman

Computational Aspects of Geometric Algebra Products of Two Homogeneous Multivectors

Studies on time and memory costs of products in geometric algebra have been limited to cases where multivectors with multiple grades have only non-zero elements. This allows to design efficient algorithms for a generic purpose; however, it…

Data Structures and Algorithms · Computer Science 2020-02-27 Stephane Breuils, Vincent Nozick, Akihiro Sugimoto

An algorithm for dividing two complex numbers

In this work a rationalized algorithm for calculating the quotient of two complex numbers is presented which reduces the number of underlying real multiplications. The performing of a complex number division using the naive method takes 4…

Data Structures and Algorithms · Computer Science 2016-08-31 Aleksandr Cariow

Efficient Mixed-Precision Matrix Factorization of the Inverse Overlap Matrix in Electronic Structure Calculations with AI-Hardware and GPUs

In recent years, a new kind of accelerated hardware has gained popularity in the Artificial Intelligence (AI) and Machine Learning (ML) communities which enables extremely high-performance tensor contractions in reduced precision for deep…

Computational Physics · Physics 2024-05-01 Adela Habib, Joshua Finkelstein, Anders M. N. Niklasson

A Simple Effective Heuristic for Embedded Mixed-Integer Quadratic Programming

In this paper we propose a fast optimization algorithm for approximately minimizing convex quadratic functions over the intersection of affine and separable constraints (i.e., the Cartesian product of possibly nonconvex real sets). This…

Optimization and Control · Mathematics 2015-09-29 Reza Takapoui, Nicholas Moehle, Stephen Boyd, Alberto Bemporad

Near-Optimal Hardware Design for Convolutional Neural Networks

Recently, the demand of low-power deep-learning hardware for industrial applications has been increasing. Most existing artificial intelligence (AI) chips have evolved to rely on new chip technologies rather than on radically new hardware…

Machine Learning · Computer Science 2020-02-14 Byungik Ahn

PBBFMM3D: a parallel black-box algorithm for kernel matrix-vector multiplication

Kernel matrix-vector product is ubiquitous in many science and engineering applications. However, a naive method requires $O(N^2)$ operations, which becomes prohibitive for large-scale problems. We introduce a parallel method that provably…

Mathematical Software · Computer Science 2021-04-30 Ruoxi Wang, Chao Chen, Jonghyun Lee, Eric Darve

Matrix multiplication using quantum-dot cellular automata to implement conventional microelectronics

Quantum-dot cellular automata (QCA) shows promise as a post-silicon-CMOS, low-power computational technology. Nevertheless, to generalize QCA for next-generation digital devices, the ability to implement conventional programmable circuits…

Mesoscale and Nanoscale Physics · Physics 2011-10-10 Joshua D. Wood, P. Douglas Tougaw

Analysis of Floating-Point Matrix Multiplication Computed via Integer Arithmetic

Ootomo, Ozaki, and Yokota [Int. J. High Perform. Comput. Appl., 38 (2024), p. 297-313] have proposed a strategy to recast a floating-point matrix multiplication in terms of integer matrix products. The factors A and B are split into integer…

Numerical Analysis · Mathematics 2026-05-11 Ahmad Abdelfattah, Jack Dongarra, Massimiliano Fasi, Mantas Mikaitis, Françoise Tisseur