Related papers: An efficient hybrid tridiagonal divide-and-conquer…
In this paper, a parallel structured divide-and-conquer (PSDC) eigensolver is proposed for symmetric tridiagonal matrices based on ScaLAPACK and a parallel structured matrix multiplication algorithm, called PSMMA. Computing the eigenvectors…
In this paper, two accelerated divide-and-conquer algorithms are proposed for the symmetric tridiagonal eigenvalue problem, which cost $O(N^2r)$ flops in the worst case, where $N$ is the dimension of the matrix and $r$ is a modest number…
We present a distributed-memory library for computations with dense structured matrices. A matrix is considered structured if its off-diagonal blocks can be approximated by a rank-deficient matrix with low numerical rank. Here, we use…
For dense Hermitian matrices with small off-diagonal (numerical) ranks and in a hierarchically semiseparable form, we give a stable divide-and-conquer eigendecomposition method with nearly linear complexity (called SuperDC) that…
Matrix Factorization (MF) has been widely applied in machine learning and data mining. A large number of algorithms have been studied to factorize matrices. Among them, stochastic gradient descent (SGD) is a commonly used method.…
We present a sparse linear system solver that is based on a multifrontal variant of Gaussian elimination, and exploits low-rank approximation of the resulting dense frontal matrices. We use hierarchically semiseparable (HSS) matrices, which…
We provide a flexible, open-source framework for hardware acceleration, namely massively parallel execution on general-purpose graphics processing units (GPUs), applied to the hierarchical Poincaré–Steklov (HPS) family of algorithms for…
Divide-and-conquer-based (DC-based) evolutionary algorithms (EAs) have achieved notable success in dealing with large-scale optimization problems (LSOPs). However, the appealing performance of this type of algorithm generally requires a…
In this paper we consider the problem of identifying intersections between two sets of d-dimensional axis-parallel rectangles. This is a common problem that arises in many agent-based simulation studies, and is of central importance in the…
As programmers turn to software-defined hardware (SDH) to maintain a high level of productivity while programming hardware to run complex algorithms, heavy lifting must be done by the compiler to automatically partition on-chip arrays. In…
The factorization of skew-symmetric matrices is a critically understudied area of dense linear algebra, particularly in comparison to that of general and symmetric matrices. While some algorithms can be adapted from the symmetric case, the…
Heterogeneous MPSoCs comprise diverse processing units of varying compute capabilities. To date, the mapping strategies of neural networks (NNs) onto such systems are yet to exploit the full potential of processing parallelism, made…
Divide and Conquer (DC) is conceptually well suited to high-dimensional optimization by decomposing a problem into multiple small-scale sub-problems. However, appealing performance is seldom observed when the sub-problems are…
The implementation difficulties of combining distribution matching (DM) and dematching (invDM) for probabilistic shaping (PS) with soft-decision forward error correction (FEC) coding can be relaxed by reverse concatenation, for which the…
We consider the design of efficient algorithms for a multicore computing environment with a global shared memory and p cores, each having a cache of size M, and with data organized in blocks of size B. We characterize the class of…
Various numerical methods used for solving partial differential equations (PDEs) result in tridiagonal systems. Solving tridiagonal systems in distributed-memory environments is not straightforward, and often requires a significant amount of…
We present a fast and memory-efficient algorithm for transient, space-time-domain, and elastodynamic boundary-integral analysis. Associated data-sparse approximations and operations are named fast domain partitioning hierarchical matrices…
The problem of identifying intersections between two sets of d-dimensional axis-parallel rectangles appears frequently in the context of agent-based simulation studies. For this reason, the High Level Architecture (HLA) specification -- a…
This article describes a geometric partitioning software package that can be used for quick computation of data partitions on many-core HPC machines. It is most suited for dynamic applications with load distributions that vary with time.…
Structured dense matrices result from boundary integral problems in electrostatics and geostatistics, and also Schur complements in sparse preconditioners such as multi-frontal methods. Exploiting the structure of such matrices can reduce…