Related papers: Distributed Function Minimization in Apache Spark
In this paper we explore the performance limits of Apache Spark for machine learning applications. We begin by analyzing the characteristics of a state-of-the-art distributed machine learning algorithm implemented in Spark and compare it to…
Querying very large RDF data sets in an efficient manner requires a sophisticated distribution strategy. Several innovative solutions have recently been proposed for optimizing data distribution with predefined query workloads. This paper…
Programming systems incorporating aspects of functional programming, e.g., higher-order functions, are becoming increasingly popular for large-scale distributed programming. New frameworks such as Apache Spark leverage functional techniques…
Present day machine learning is computationally intensive and processes large amounts of data. It is implemented in a distributed fashion in order to address these scalability issues. The work is parallelized across a number of computing…
This paper proposes several approaches as baselines to compute a shared active subspace for multivariate vector-valued functions. The goal is to minimize the deviation between the function evaluations on the original space and those on the…
The purpose of this paper is to introduce two new classes of accelerated distributed proximal conjugate gradient algorithms for multi-agent constrained optimization problems, given as the minimization of a function decomposed as a sum of M…
Distributed data processing platforms for cloud computing are important tools for large-scale data analytics. Apache Hadoop MapReduce has become the de facto standard in this space, though its programming interface is relatively low-level,…
This paper considers a distributed stochastic non-convex optimization problem, where the nodes in a network cooperatively minimize a sum of $L$-smooth local cost functions with sparse gradients. By adaptively adjusting the stepsizes…
As the size of modern data sets exceeds the disk and memory capacities of a single computer, machine learning practitioners have resorted to parallel and distributed computing. Given that optimization is one of the pillars of machine…
We consider the problem of minimizing the sum of cost functions pertaining to agents over a network whose topology is captured by a directed graph (i.e., asymmetric communication). We cast the problem into the ADMM setting, via a consensus…
This paper addresses a distributed optimization problem in a communication network where nodes are active sporadically. Each active node applies some learning method to control its action to maximize the global utility function, which is…
Optimal Transport (OT) naturally arises in many machine learning applications, yet its heavy computational burden limits its widespread use. To address the scalability issue, we propose an implicit generative learning-based framework…
This paper presents a novel distributed formulation of the min-max optimization problem. Such a formulation enables enhanced flexibility among agents when optimizing their maximization variables. To address the problem, we propose two…
Distributed consensus optimization has received considerable attention in recent years; several distributed consensus-based algorithms have been proposed for (nonsmooth) convex and (smooth) nonconvex objective functions. However, the…
In this paper we consider a distributed optimization scenario in which a set of processors aims at minimizing the maximum of a collection of "separable convex functions" subject to local constraints. This set-up is motivated by peak-demand…
In this paper, we study distributed stochastic optimization to minimize a sum of smooth and strongly-convex local cost functions over a network of agents, communicating over a strongly-connected graph. Assuming that each agent has access to…
Most existing work uses dual decomposition and subgradient methods to solve Network Utility Maximization (NUM) problems in a distributed manner, and these methods suffer from slow convergence. This work develops an alternative…
Algorithms for computing All-Pairs Shortest-Paths (APSP) are critical building blocks underlying many practical applications. The standard sequential algorithms, such as Floyd-Warshall and Johnson, quickly become infeasible for large input…
The join operation is a fundamental building block of parallel data processing. Unfortunately, it is very resource-intensive to compute an equi-join across massive datasets. The approximate computing paradigm allows users to trade accuracy…