Related papers: Bidiagonalization with Parallel Tiled Algorithms
We analyse some QR decomposition algorithms, and show that the I/O complexity of the tile based algorithm is asymptotically the same as that of matrix multiplication. This algorithm, we show, performs the best when the tile size is chosen…
Collective communications are ubiquitous in parallel applications. We present two new algorithms for performing a reduction. The operation associated with our reduction needs to be associative and commutative. The two algorithms are…
The reduction of a banded matrix to bidiagonal form is a critical step in the calculation of Singular Values, a cornerstone of scientific computing and AI. Although inherently parallel, this step has traditionally been considered unsuitable…
This work deals with tailored reduced order models for bifurcating nonlinear parametric partial differential equations, where multiple coexisting solutions arise for a given parametric instance. Approaches based on proper orthogonal…
Advances in molecular "omics'" technologies have motivated new methodology for the integration of multiple sources of high-content biomedical data. However, most statistical methods for integrating multiple data matrices only consider data…
We propose efficient parallel algorithms and implementations on shared memory architectures of LU factorization over a finite field. Compared to the corresponding numerical routines, we have identified three main difficulties specific to…
In this paper we show how to combine two algorithmic techniques to obtain linear time algorithms for various optimization problems on graphs, and present a subroutine which will be useful in doing so. The first technique is iterative…
This work studies one of the parallel decision tree learning algorithms, pdsCART, designed for scalable and efficient data analysis. The method incorporates three core capabilities. First, it supports real-time learning from data streams,…
Any Boolean function corresponds with a complete full binary decision tree. This tree can in turn be represented in a maximally compact form as a direct acyclic graph where common subtrees are factored and shared, keeping only one copy of…
In this paper we propose to use model reduction techniques for speeding up the diagonalization-based parallel-in-time (ParaDIAG) preconditioner, for iteratively solving all-at-once systems from evolutionary PDEs. In particular, we use the…
Bidirectional motion planning often reduces planning time compared to its unidirectional counterparts. It requires connecting the forward and reverse search trees to form a continuous path. However, this process could fail and restart the…
We describe a parallel approximation algorithm for maximizing monotone submodular functions subject to hereditary constraints on distributed memory multiprocessors. Our work is motivated by the need to solve submodular optimization problems…
We study preconditioned gradient-based optimization methods where the preconditioning matrix has block-diagonal form. Such a structural constraint comes with the advantage that the update computation is block-separable and can be…
The current computer architecture has moved towards the multi/many-core structure. However, the algorithms in the current sequential dense numerical linear algebra libraries (e.g. LAPACK) do not parallelize well on multi/many-core…
Despite their promise, fair machine learning methods often yield Pareto-inefficient models, in which the performance of certain groups can be improved without degrading that of others. This issue arises frequently in traditional…
Rooted spanning trees (RSTs) are a core primitive in parallel graph analytics, underpinning algorithms such as biconnected components and planarity testing. On GPUs, RST construction has traditionally relied on breadth-first search (BFS)…
We present an algorithm for recovering planted solutions in two well-known models, the stochastic block model and planted constraint satisfaction problems, via a common generalization in terms of random bipartite graphs. Our algorithm…
Orthogonal Fractional Factorial Designs and in particular Orthogonal Arrays are frequently used in many fields of application, including medicine, engineering and agriculture. In this paper we present a methodology and an algorithm to find…
Binary Neural Networks (BNNs) enable efficient deep learning by saving on storage and computational costs. However, as the size of neural networks continues to grow, meeting computational requirements remains a challenge. In this work, we…
This paper proposes multiple extensions to the popular bicriterion transit routing approach -- Trip-Based Transit Routing (TBTR). Specifically, building on the premise of the HypRAPTOR algorithm, we first extend TBTR to its partitioning…