Related papers: A Hardware-oriented Algorithm for Complex-valued C…

200 papers

In calculating integral or discrete transforms, use has been made of fast algorithms for multiplying vectors by matrices whose elements are specified as values of special (Chebyshev, Legendre, Laguerre, etc.) functions. The currently…

Numerical Analysis · Mathematics 2022-08-11 Andrew V. Terekhov

Matrix multiplication is a fundamental operation in both training of neural networks and inference. To accelerate matrix multiplication, Graphical Processing Units (GPUs) provide it implemented in hardware. Due to the increased throughput…

Mathematical Software · Computer Science 2026-04-07 Faizan A. Khattak, Mantas Mikaitis

In this paper, we offer and discuss three efficient structural solutions for the hardware-oriented implementation of discrete quaternion Fourier transform basic operations with reduced implementation complexities. The first solution: a…

Data Structures and Algorithms · Computer Science 2017-03-21 Aleksandr Cariow, Galina Cariowa, Marina Chicheva

This paper presents a structural design of the hardware-efficient module for implementation of convolution neural network (CNN) basic operation with reduced implementation complexity. For this purpose we utilize some modification of the…

Signal Processing · Electrical Eng. & Systems 2018-11-09 Aleksandr Cariow, Galina Cariowa

In this work, a rationalized algorithm for calculating the quotient of two quaternions is presented which reduces the number of underlying real multiplications. Hardware for fast multiplication is much more expensive than hardware for fast…

Signal Processing · Electrical Eng. & Systems 2020-09-02 Aleksandr Cariow, Galina Cariowa

Vector-Matrix Multiplication (VMM) is the fundamental and frequently required computation in inference of Neural Networks (NN). Due to the large data movement required during inference, VMM can benefit greatly from in-memory computing.…

Hardware Architecture · Computer Science 2025-10-03 Felix Zeller, John Reuben, Dietmar Fey

Multiple Constant Multiplication (MCM) over integers is a frequent operation arising in embedded systems that require highly optimized hardware. An efficient way is to replace costly generic multiplication by bit-shifts and additions, i.e.…

Hardware Architecture · Computer Science 2022-10-11 Rémi Garcia, Anastasia Volkova

This document describes an algorithm to scale a complex vector by the reciprocal of a complex value. The algorithm computes the reciprocal of the complex value and then scales the vector by the reciprocal. Some scaling may be necessary due…

Numerical Analysis · Mathematics 2023-11-13 Weslley da Silva Pereira

Approximate computing is a promising approach to reduce the power, delay, and area in hardware design for many error-resilient applications such as machine learning (ML) and digital signal processing (DSP) systems, in which multipliers…

Hardware Architecture · Computer Science 2023-10-31 Zhen Li, Hao Zhou, Lingli Wang

Kernel matrices are crucial in many learning tasks such as support vector machines or kernel ridge regression. The kernel matrix is typically dense and large-scale. Depending on the dimension of the feature space even the computation of all…

Machine Learning · Computer Science 2023-12-04 Franziska Nestler, Martin Stoll, Theresa Wagner

In this paper, we present several resource-efficient algorithmic solutions regarding the fully parallel hardware implementation of the basic filtering operation performed in the convolutional layers of convolution neural networks. In fact,…

Signal Processing · Electrical Eng. & Systems 2020-04-14 Aleksandr Cariow, Galina Cariowa

Matrix multiplication consumes a large fraction of the time taken in many machine-learning algorithms. Thus, accelerator chips that perform matrix multiplication faster than conventional processors or even GPUs are of increasing interest.…

Data Structures and Algorithms · Computer Science 2023-07-06 Daniel Cussen, Jeffrey D. Ullman

Studies on time and memory costs of products in geometric algebra have been limited to cases where multivectors with multiple grades have only non-zero elements. This allows to design efficient algorithms for a generic purpose; however, it…

Data Structures and Algorithms · Computer Science 2020-02-27 Stephane Breuils, Vincent Nozick, Akihiro Sugimoto

In this work a rationalized algorithm for calculating the quotient of two complex numbers is presented which reduces the number of underlying real multiplications. The performing of a complex number division using the naive method takes 4…

Data Structures and Algorithms · Computer Science 2016-08-31 Aleksandr Cariow

In recent years, a new kind of accelerated hardware has gained popularity in the Artificial Intelligence (AI) and Machine Learning (ML) communities which enables extremely high-performance tensor contractions in reduced precision for deep…

Computational Physics · Physics 2024-05-01 Adela Habib, Joshua Finkelstein, Anders M. N. Niklasson

In this paper we propose a fast optimization algorithm for approximately minimizing convex quadratic functions over the intersection of affine and separable constraints (i.e., the Cartesian product of possibly nonconvex real sets). This…

Optimization and Control · Mathematics 2015-09-29 Reza Takapoui, Nicholas Moehle, Stephen Boyd, Alberto Bemporad

Recently, the demand for low-power deep-learning hardware for industrial applications has been increasing. Most existing artificial intelligence (AI) chips have evolved to rely on new chip technologies rather than on radically new hardware…

Machine Learning · Computer Science 2020-02-14 Byungik Ahn

Kernel matrix-vector product is ubiquitous in many science and engineering applications. However, a naive method requires $O(N^2)$ operations, which becomes prohibitive for large-scale problems. We introduce a parallel method that provably…

Mathematical Software · Computer Science 2021-04-30 Ruoxi Wang, Chao Chen, Jonghyun Lee, Eric Darve

Quantum-dot cellular automata (QCA) shows promise as a post silicon CMOS, low power computational technology. Nevertheless, to generalize QCA for next-generation digital devices, the ability to implement conventional programmable circuits…

Mesoscale and Nanoscale Physics · Physics 2011-10-10 Joshua D. Wood, P. Douglas Tougaw

Ootomo, Ozaki, and Yokota [Int. J. High Perform. Comput. Appl., 38 (2024), p. 297-313] have proposed a strategy to recast a floating-point matrix multiplication in terms of integer matrix products. The factors A and B are split into integer…

Numerical Analysis · Mathematics 2026-05-11 Ahmad Abdelfattah, Jack Dongarra, Massimiliano Fasi, Mantas Mikaitis, Françoise Tisseur