Related papers: Optimizing Distributed Tensor Contractions using N…
The tensor-vector contraction (TVC) is the most memory-bound operation of its class and a core component of the higher-order power method (HOPM). This paper brings distributed-memory parallelization to a native TVC algorithm for dense…
Multiple Tensor-Times-Matrix (Multi-TTM) is a key computation in algorithms for computing and operating with the Tucker tensor decomposition, which is frequently used in multidimensional data analysis. We establish communication lower…
We study the scalability of consensus-based distributed optimization algorithms by considering two questions: How many processors should we use for a given problem, and how often should they communicate when communication is not free?…
Activation functions (AFs) are an important part of the design of neural networks (NNs), and their choice plays a predominant role in the performance of a NN. In this work, we are particularly interested in the estimation of flexible…
Tensor networks have proven to be a valuable tool, for instance, in the classical simulation of (strongly correlated) quantum systems. As the size of the systems increases, contracting larger tensor networks becomes computationally…
This thesis is concerned with the design of distributed algorithms for solving optimization problems. We consider networks where each node has exclusive access to a cost function, and design algorithms that make all nodes cooperate to find…
Distributed computing frameworks such as MapReduce and Spark are often used to process large-scale data computing jobs. In wireless scenarios, exchanging data among distributed nodes would seriously suffer from the communication bottleneck…
How can we capture the hidden properties from a tensor and a matrix data simultaneously in a fast, accurate, and scalable way? Coupled matrix-tensor factorization (CMTF) is a major tool to extract latent factors from a tensor and matrices…
Distributed computing has become a common practice nowadays, where the recent focus has been given to the usage of smart networking devices with in-network computing capabilities. State-of-the-art switches with near-line rate computing and…
In recent years, the paradigm of cloud computing has emerged as an architecture for computing that makes use of distributed (networked) computing resources. In this paper, we consider a distributed computing algorithmic scheme for…
Tensor factorization has been proved as an efficient unsupervised learning approach for health data analysis, especially for computational phenotyping, where the high-dimensional Electronic Health Records (EHRs) with patients' history of…
The convergence performance of distributed optimization algorithms is of significant importance to solve optimal power flow (OPF) in a distributed fashion. In this paper, we aim to provide some insights on how to partition a power system to…
We study the problem of distributed optimal resource allocation on networks with actions defined on discrete spaces, with applications to adaptive under-frequency load-shedding in power systems. In this context, the primary objective is to…
Krylov methods are a key way of solving large sparse linear systems of equations, but suffer from poor strong scalabilty on distributed memory machines. This is due to high synchronization costs from large numbers of collective…
Modern transportation network modeling increasingly involves the integration of diverse methodologies including sensor-based forecasting, reinforcement learning, classical flow optimization, and demand modeling that have traditionally been…
In this paper we consider a network with $P$ nodes, where each node has exclusive access to a local cost function. Our contribution is a communication-efficient distributed algorithm that finds a vector $x^\star$ minimizing the sum of all…
A common paradigm for scientific computing is distributed message-passing systems, and a common approach to these systems is to implement them across clusters of high-performance workstations. As multi-core architectures become increasingly…
Deploying Convolutional Neural Networks (CNNs) on resource-constrained devices necessitates efficient management of computational resources, often via distributed environments susceptible to latency from straggler nodes. This paper…
Data shuffling between distributed cluster of nodes is one of the critical steps in implementing large-scale learning algorithms. Randomly shuffling the data-set among a cluster of workers allows different nodes to obtain fresh data…
Tensor decomposition has been widely used in machine learning and high-volume data analysis. However, large-scale tensor factorization often consumes huge memory and computing cost. Meanwhile, modernized computing hardware such as tensor…