Related papers: Distributed and heterogeneous tensor-vector contra…
The Tucker decomposition generalizes the notion of Singular Value Decomposition (SVD) to tensors, the higher dimensional analogues of matrices. We study the problem of constructing the Tucker decomposition of sparse tensors on distributed…
We propose an algorithm that aims at minimizing the inter-node communication volume for distributed and memory-efficient tensor contraction schemes on modern multi-core compute nodes. The key idea is to define processor grids that optimize…
The Tucker decomposition expresses a given tensor as the product of a small core tensor and a set of factor matrices. Apart from providing data compression, the construction is useful in performing analysis such as principal component…
Best rank-one approximation is one of the most fundamental tasks in tensor computation. In order to fully exploit modern multi-core parallel computers, it is necessary to develop decoupling algorithms for computing the best rank-one…
Higher-order singular value decomposition (HOSVD) is a celebrated tool for tensor data analysis. The sequential HOSVD was recently generalized to the quaternion domain, while a naive quaternion extension of the classical HOSVD% by De…
In this paper we propose a parallel coordinate descent algorithm for solving smooth convex optimization problems with separable constraints that may arise e.g. in distributed model predictive control (MPC) for linear network systems. Our…
Big data analysis has become a crucial part of new emerging technologies such as the internet of things, cyber-physical analysis, deep learning, anomaly detection, etc. Among many other techniques, dimensionality reduction plays a key role…
This article provides next step towards solving speed bottleneck of any system that intensively uses convolutions operations (e.g. CNN). Method described in the article is applied on deformable part models (DPM) algorithm. Method described…
Improving the computational efficiency of quantum many-body calculations from a hardware perspective remains a critical challenge. Although field-programmable gate arrays (FPGAs) have recently been exploited to improve the computational…
Current high-performance computer systems used for scientific computing typically combine shared memory computational nodes in a distributed memory environment. Extracting high performance from these complex systems requires tailored…
Distributed training of deep neural networks has received significant research interest, and its major approaches include implementations on multiple GPUs and clusters. Parallelization can dramatically improve the efficiency of training…
We introduce a data distribution scheme for $\mathcal{H}$-matrices and a distributed-memory algorithm for $\mathcal{H}$-matrix-vector multiplication. Our data distribution scheme avoids an expensive $\Omega(P^2)$ scheduling procedure used…
With data sizes constantly expanding, and with classical machine learning algorithms that analyze such data requiring larger and larger amounts of computation time and storage space, the need to distribute computation and memory…
Motivated by machine learning applications in networks of sensors, internet-of-things (IoT) devices, and autonomous agents, we propose techniques for distributed stochastic convex learning from high-rate data streams. The setup involves a…
Deep convolutional neural networks (ConvNets) of 3-dimensional kernels allow joint modeling of spatiotemporal features. These networks have improved performance of video and volumetric image analysis, but have been limited in size due to…
In this paper, we present a distributed algorithm for solving convex, constraint-coupled, optimization problems over peer-to-peer networks. We consider a network of processors that aim to cooperatively minimize the sum of local cost…
The CP tensor decomposition is a low-rank approximation of a tensor. We present a distributed-memory parallel algorithm and implementation of an alternating optimization method for computing a CP decomposition of dense tensor data that can…
Hyperdimensional computing (HDC) is an emerging computing paradigm that represents, manipulates, and communicates data using very long random vectors (aka hypervectors). Among different hardware platforms capable of executing HDC…
Hypergraphs are a popular paradigm to represent complex real-world networks exhibiting multi-way relationships of varying sizes. Mining centrality in hypergraphs via symmetric adjacency tensors has only recently become computationally…
Higher-order tensor decompositions are analogous to the familiar Singular Value Decomposition (SVD), but they transcend the limitations of matrices (second-order tensors). SVD is a powerful tool that has achieved impressive results in…