Related papers: An improved dqds algorithm
The Hadamard decomposition is a powerful technique for data analysis and matrix compression, which decomposes a given matrix into the element-wise product of two or more low-rank matrices. In this paper, we develop an efficient algorithm to…
A new algorithm to compute the restricted singular value decomposition of dense matrices is presented. Like Zha's method \cite{Zha92}, the new algorithm uses an implicit Kogbetliantz iteration, but with four major innovations. The first…
The classic method for computing the spectral decomposition of a real symmetric matrix, the Jacobi algorithm, can be accelerated by using mixed precision arithmetic. The Jacobi algorithm aims to reduce the off-diagonal entries…
We present a relative forward error analysis of a mixed-precision preconditioned one-sided Jacobi algorithm, analogous to a two-sided version introduced in [N. J. Higham, F. Tisseur, M. Webb and Z. Zhou, SIAM J. Matrix Anal. Appl. 46…
We present DMax, a new paradigm for efficient diffusion language models (dLLMs). It mitigates error accumulation in parallel decoding, enabling aggressive decoding parallelism while preserving generation quality. Unlike conventional masked…
In this paper we describe two new optimisations implemented in MadGraph5_aMC@NLO, both of which are designed to speed up the computation of leading-order processes (for any model). First we implement a new method to evaluate the…
In this paper, two accelerated divide-and-conquer algorithms are proposed for the symmetric tridiagonal eigenvalue problem, which cost $O(N^2r)$ flops in the worst case, where $N$ is the dimension of the matrix and $r$ is a modest number…
Typically, the conjugate gradient (CG) algorithm employs mixed precision and even-odd preconditioning to compute propagators for highly improved staggered quarks (HISQ). This approach suffers from critical slowing down as the light quark…
We propose a mixed precision Jacobi algorithm for computing the singular value decomposition (SVD) of a dense matrix. After appropriate preconditioning, the proposed algorithm computes the SVD in a lower precision as an initial guess, and…
Hierarchical matrices approximate a given matrix by a decomposition into low-rank submatrices that can be handled efficiently in factorized form. $\mathcal{H}^2$-matrices refine this representation following the ideas of fast multipole…
Model quantization is challenging due to many tedious hyper-parameters such as precision (bitwidth), dynamic range (minimum and maximum discrete values) and stepsize (interval between discrete values). Unlike prior art that carefully tunes…
We present a new transform, triple dqds, to help compute the eigenvalues of a real tridiagonal matrix C using real arithmetic. The algorithm uses the real dqds transform to shift by a real number and triple dqds to shift by a complex…
We present a technique for significantly speeding up Alternating Least Squares (ALS) and Gradient Descent (GD), two widely used algorithms for tensor factorization. By exploiting properties of the Khatri-Rao product, we show how to…
One of the most widely used methods for solving large-scale stochastic optimization problems is distributed asynchronous stochastic gradient descent (DASGD), a family of algorithms that result from parallelizing stochastic gradient descent…
Finding solutions to systems of linear equations is a common problem in many areas of science and engineering, with much potential for a speedup on quantum devices. While the Harrow-Hassidim-Lloyd (HHL) quantum algorithm yields up to an…
The Variational Quantum Linear Solver (VQLS), a hybrid quantum-classical algorithm for solving linear systems, faces a practical scalability bottleneck: the Linear Combination of Unitaries (LCU) decomposition requires $O(L^2)$ circuit…
Tensor decompositions, a collection of factorization techniques for multidimensional arrays, are among the most general and powerful tools for scientific analysis. However, because of their increasing size, today's data sets require more…
The Graphical Lasso (GLasso) algorithm is fast and widely used for estimating sparse precision matrices (Friedman et al., 2008). Its central role in the literature of high-dimensional covariance estimation rivals that of Lasso regression…
An enhanced Kogbetliantz method for the singular value decomposition (SVD) of general matrices of order two is proposed. The method consists of three phases: an almost exact prescaling, which can be beneficial to LAPACK's xLASV2 routine…
The growing demand for stringent quality of service (QoS) guarantees in 5G networks requires accurate characterisation of delay performance, often measured using Delay Violation Probability (DVP) for a given target delay. Widely used…