Related papers: An improved dqds algorithm

Efficient algorithms for the Hadamard decomposition

The Hadamard decomposition is a powerful technique for data analysis and matrix compression, which decomposes a given matrix into the element-wise product of two or more low-rank matrices. In this paper, we develop an efficient algorithm to…

Machine Learning · Computer Science 2025-04-23 Samuel Wertz, Arnaud Vandaele, Nicolas Gillis

Towards a more robust algorithm for computing the restricted singular value decomposition

A new algorithm to compute the restricted singular value decomposition of dense matrices is presented. Like Zha's method [Zha92], the new algorithm uses an implicit Kogbetliantz iteration, but with four major innovations. The first…

Numerical Analysis · Mathematics 2020-02-13 Ian N. Zwaan

A Mixed Precision Eigensolver Based on the Jacobi Algorithm

The classic method for computing the spectral decomposition of a real symmetric matrix, the Jacobi algorithm, can be accelerated by using mixed precision arithmetic. The Jacobi algorithm aims to reduce the off-diagonal entries…

Numerical Analysis · Mathematics 2025-09-03 Zhengbo Zhou

Computing accurate singular values using a mixed-precision one-sided Jacobi algorithm

We present a relative forward error analysis of a mixed-precision preconditioned one-sided Jacobi algorithm, analogous to a two-sided version introduced in [N. J. Higham, F. Tisseur, M. Webb and Z. Zhou, SIAM J. Matrix Anal. Appl. 46…

Numerical Analysis · Mathematics 2026-02-23 Zhengbo Zhou, Françoise Tisseur, Marcus Webb

DMax: Aggressive Parallel Decoding for dLLMs

We present DMax, a new paradigm for efficient diffusion language models (dLLMs). It mitigates error accumulation in parallel decoding, enabling aggressive decoding parallelism while preserving generation quality. Unlike conventional masked…

Machine Learning · Computer Science 2026-05-18 Zigeng Chen, Gongfan Fang, Xinyin Ma, Ruonan Yu, Xinchao Wang

Speeding up MadGraph5_aMC@NLO

In this paper we will describe two new optimisations implemented in MadGraph5_aMC@NLO, both of which are designed to speed up the computation of leading-order processes (for any model). First we implement a new method to evaluate the…

High Energy Physics - Phenomenology · Physics 2021-04-26 Kiran Ostrolenk, Olivier Mattelaer

New fast divide-and-conquer algorithms for the symmetric tridiagonal eigenvalue problem

In this paper, two accelerated divide-and-conquer algorithms are proposed for the symmetric tridiagonal eigenvalue problem, which cost $O(N^2r)$ flops in the worst case, where $N$ is the dimension of the matrix and $r$ is a modest number…

Numerical Analysis · Computer Science 2015-10-16 Shengguo Li, Xiangke Liao, Jie Liu, Hao Jiang

Improving HISQ propagator solves using deflation

Typically, the conjugate gradient (CG) algorithm employs mixed precision and even-odd preconditioning to compute propagators for highly improved staggered quarks (HISQ). This approach suffers from critical slowing down as the light quark…

High Energy Physics - Lattice · Physics 2025-02-04 Leon Hostetler, M. A. Clark, Carleton DeTar, Steven Gottlieb, Evan Weinberg

A mixed precision Jacobi SVD algorithm

We propose a mixed precision Jacobi algorithm for computing the singular value decomposition (SVD) of a dense matrix. After appropriate preconditioning, the proposed algorithm computes the SVD in a lower precision as an initial guess, and…

Numerical Analysis · Mathematics 2025-05-12 Weiguo Gao, Yuxin Ma, Meiyue Shao

Adaptive multiplication of rank-structured matrices in linear complexity

Hierarchical matrices approximate a given matrix by a decomposition into low-rank submatrices that can be handled efficiently in factorized form. $\mathcal{H}^2$-matrices refine this representation following the ideas of fast multipole…

Numerical Analysis · Mathematics 2024-04-24 Steffen Börm

Differentiable Dynamic Quantization with Mixed Precision and Adaptive Resolution

Model quantization is challenging due to many tedious hyper-parameters such as precision (bitwidth), dynamic range (minimum and maximum discrete values) and stepsize (interval between discrete values). Unlike prior arts that carefully tune…

Machine Learning · Computer Science 2021-07-08 Zhang Zhaoyang, Shao Wenqi, Gu Jinwei, Wang Xiaogang, Luo Ping

Real dqds for the nonsymmetric tridiagonal eigenvalue problem

We present a new transform, triple dqds, to help to compute the eigenvalues of a real tridiagonal matrix C using real arithmetic. The algorithm uses the real dqds transform to shift by a real number and triple dqds to shift by a complex…

Numerical Analysis · Mathematics 2012-01-25 Carla Ferreira, Beresford Parlett

DFacTo: Distributed Factorization of Tensors

We present a technique for significantly speeding up Alternating Least Squares (ALS) and Gradient Descent (GD), two widely used algorithms for tensor factorization. By exploiting properties of the Khatri-Rao product, we show how to…

Machine Learning · Statistics 2014-06-19 Joon Hee Choi, S. V. N. Vishwanathan

Distributed stochastic optimization with large delays

One of the most widely used methods for solving large-scale stochastic optimization problems is distributed asynchronous stochastic gradient descent (DASGD), a family of algorithms that result from parallelizing stochastic gradient descent…

Optimization and Control · Mathematics 2021-07-08 Zhengyuan Zhou, Panayotis Mertikopoulos, Nicholas Bambos, Peter W. Glynn, Yinyu Ye

Near Term Algorithms for Linear Systems of Equations

Finding solutions to systems of linear equations is a common problem in many areas of science and engineering, with much potential for a speedup on quantum devices. While the Harrow-Hassidim-Lloyd (HHL) quantum algorithm yields up to an…

Quantum Physics · Physics 2023-07-20 Aidan Pellow-Jarman, Ilya Sinayskiy, Anban Pillay, Francesco Petruccione

Distributed Variational Quantum Linear Solver

The Variational Quantum Linear Solver (VQLS), a hybrid quantum-classical algorithm for solving linear systems, faces a practical scalability bottleneck: the Linear Combination of Unitaries (LCU) decomposition requires O(L^2) circuit…

Quantum Physics · Physics 2026-04-17 Chao Lu, Pooja Rao, Muralikrishnan Gopalakrishnan Meena, Kalyana Chakaravarthi Gottiparthi

VecHGrad for Solving Accurately Complex Tensor Decomposition

Tensor decompositions, a collection of factorization techniques for multidimensional arrays, are among the most general and powerful tools for scientific analysis. However, because of their increasing size, today's data sets require more…

Machine Learning · Computer Science 2020-03-11 Jeremy Charlier, Vladimir Makarenkov

An Alternative Graphical Lasso Algorithm for Precision Matrices

The Graphical Lasso (GLasso) algorithm is fast and widely used for estimating sparse precision matrices (Friedman et al., 2008). Its central role in the literature of high-dimensional covariance estimation rivals that of Lasso regression…

Computation · Statistics 2024-03-20 Aramayis Dallakyan, Mohsen Pourahmadi

Arithmetical enhancements of the Kogbetliantz method for the SVD of order two

An enhanced Kogbetliantz method for the singular value decomposition (SVD) of general matrices of order two is proposed. The method consists of three phases: an almost exact prescaling, that can be beneficial to the LAPACK's xLASV2 routine…

Numerical Analysis · Mathematics 2026-02-10 Vedran Novaković

Delay Analysis of 5G HARQ in the Presence of Decoding and Feedback Latencies

The growing demand for stringent quality of service (QoS) guarantees in 5G networks requires accurate characterisation of delay performance, often measured using Delay Violation Probability (DVP) for a given target delay. Widely used…

Information Theory · Computer Science 2025-09-16 Vishnu N Moothedath, Sangwon Seo, Neda Petreska, Bernhard Kloiber, James Gross