Related papers: Distributed Function Minimization in Apache Spark

Understanding and Optimizing the Performance of Distributed Machine Learning Applications on Apache Spark

In this paper we explore the performance limits of Apache Spark for machine learning applications. We begin by analyzing the characteristics of a state-of-the-art distributed machine learning algorithm implemented in Spark and compare it to…

Distributed, Parallel, and Cluster Computing · Computer Science 2018-06-21 Celestine Dünner, Thomas Parnell, Kubilay Atasu, Manolis Sifalakis, Haralampos Pozidis

On the Evaluation of RDF Distribution Algorithms Implemented over Apache Spark

Querying very large RDF data sets in an efficient manner requires a sophisticated distribution strategy. Several innovative solutions have recently been proposed for optimizing data distribution with predefined query workloads. This paper…

Databases · Computer Science 2015-07-10 Olivier Curé, Hubert Naacke, Mohamed-Amine Baazizi, Bernd Amann

Distributed Programming via Safe Closure Passing

Programming systems incorporating aspects of functional programming, e.g., higher-order functions, are becoming increasingly popular for large-scale distributed programming. New frameworks such as Apache Spark leverage functional techniques…

Programming Languages · Computer Science 2016-02-12 Philipp Haller, Heather Miller

Modeling Scalability of Distributed Machine Learning

Present day machine learning is computationally intensive and processes large amounts of data. It is implemented in a distributed fashion in order to address these scalability issues. The work is parallelized across a number of computing…

Machine Learning · Computer Science 2017-03-28 Alexander Ulanov, Andrey Simanovsky, Manish Marwah

Shared active subspace for multivariate vector-valued functions

This paper proposes several approaches as baselines to compute a shared active subspace for multivariate vector-valued functions. The goal is to minimize the deviation between the function evaluations on the original space and those on the…

Methodology · Statistics 2024-01-08 Khadija Musayeva, Mickael Binois

Distributed accelerated proximal conjugate gradient methods for multi-agent constrained optimization problems

The purpose of this paper is to introduce two new classes of accelerated distributed proximal conjugate gradient algorithms for multi-agent constrained optimization problems; given as minimization of a function decomposed as a sum of M…

Optimization and Control · Mathematics 2024-06-21 Anteneh Getachew Gebrie

Technical Report: On the Usability of Hadoop MapReduce, Apache Spark & Apache Flink for Data Science

Distributed data processing platforms for cloud computing are important tools for large-scale data analytics. Apache Hadoop MapReduce has become the de facto standard in this space, though its programming interface is relatively low-level,…

Distributed, Parallel, and Cluster Computing · Computer Science 2018-03-30 Bilal Akil, Ying Zhou, Uwe Röhm

Distributed Adaptive Gradient Algorithm with Gradient Tracking for Stochastic Non-Convex Optimization

This paper considers a distributed stochastic non-convex optimization problem, where the nodes in a network cooperatively minimize a sum of $L$-smooth local cost functions with sparse gradients. By adaptively adjusting the stepsizes…

Optimization and Control · Mathematics 2024-04-01 Dongyu Han, Kun Liu, Yeming Lin, Yuanqing Xia

Optimization for Large-Scale Machine Learning with Distributed Features and Observations

As the size of modern data sets exceeds the disk and memory capacities of a single computer, machine learning practitioners have resorted to parallel and distributed computing. Given that optimization is one of the pillars of machine…

Machine Learning · Statistics 2017-04-18 Alexandros Nathan, Diego Klabjan

A Stochastic Large-scale Machine Learning Algorithm for Distributed Features and Observations

As the size of modern data sets exceeds the disk and memory capacities of a single computer, machine learning practitioners have resorted to parallel and distributed computing. Given that optimization is one of the pillars of machine…

Machine Learning · Statistics 2019-12-10 Biyi Fang, Diego Klabjan

Distributed optimization on directed graphs based on inexact ADMM with partial participation

We consider the problem of minimizing the sum of cost functions pertaining to agents over a network whose topology is captured by a directed graph (i.e., asymmetric communication). We cast the problem into the ADMM setting, via a consensus…

Optimization and Control · Mathematics 2023-04-04 Dingran Yi, Nikolaos M. Freris

Distributed Derivative-free Learning Method for Stochastic Optimization over a Network with Sparse Activity

This paper addresses a distributed optimization problem in a communication network where nodes are active sporadically. Each active node applies some learning method to control its action to maximize the global utility function, which is…

Optimization and Control · Mathematics 2021-04-20 Wenjie Li, Mohamad Assaad, Shiqi Zheng

On Scalable and Efficient Computation of Large Scale Optimal Transport

Optimal Transport (OT) naturally arises in many machine learning applications, yet the heavy computational burden limits its wide-spread uses. To address the scalability issue, we propose an implicit generative learning-based framework…

Machine Learning · Computer Science 2019-06-26 Yujia Xie, Minshuo Chen, Haoming Jiang, Tuo Zhao, Hongyuan Zha

Decentralized Min-Max Optimization with Gradient Tracking

This paper presents a novel distributed formulation of the min-max optimization problem. Such a formulation enables enhanced flexibility among agents when optimizing their maximization variables. To address the problem, we propose two…

Optimization and Control · Mathematics 2025-05-19 Runze You, Kun Huang, Shi Pu

Distributed Stochastic Nonsmooth Nonconvex Optimization

Distributed consensus optimization has received considerable attention in recent years; several distributed consensus-based algorithms have been proposed for (nonsmooth) convex and (smooth) nonconvex objective functions. However, the…

Optimization and Control · Mathematics 2019-11-05 Vyacheslav Kungurtsev

A duality-based approach for distributed min-max optimization with application to demand side management

In this paper we consider a distributed optimization scenario in which a set of processors aims at minimizing the maximum of a collection of "separable convex functions" subject to local constraints. This set-up is motivated by peak-demand…

Distributed, Parallel, and Cluster Computing · Computer Science 2017-03-27 Ivano Notarnicola, Mauro Franceschelli, Giuseppe Notarstefano

Distributed stochastic optimization with gradient tracking over strongly-connected networks

In this paper, we study distributed stochastic optimization to minimize a sum of smooth and strongly-convex local cost functions over a network of agents, communicating over a strongly-connected graph. Assuming that each agent has access to…

Machine Learning · Computer Science 2019-04-11 Ran Xin, Anit Kumar Sahu, Usman A. Khan, Soummya Kar

A Distributed Newton Method for Network Utility Maximization

Most existing work uses dual decomposition and subgradient methods to solve Network Utility Maximization (NUM) problems in a distributed manner, which suffer from slow rate of convergence properties. This work develops an alternative…

Optimization and Control · Mathematics 2015-03-17 Ermin Wei, Asuman Ozdaglar, Ali Jadbabaie

Solving All-Pairs Shortest-Paths Problem in Large Graphs Using Apache Spark

Algorithms for computing All-Pairs Shortest-Paths (APSP) are critical building blocks underlying many practical applications. The standard sequential algorithms, such as Floyd-Warshall and Johnson, quickly become infeasible for large input…

Distributed, Parallel, and Cluster Computing · Computer Science 2019-08-08 Frank Schoeneman, Jaroslaw Zola

Approximate Distributed Joins in Apache Spark

The join operation is a fundamental building block of parallel data processing. Unfortunately, it is very resource-intensive to compute an equi-join across massive datasets. The approximate computing paradigm allows users to trade accuracy…

Distributed, Parallel, and Cluster Computing · Computer Science 2018-05-16 Do Le Quoc, Istemi Ekin Akkus, Pramod Bhatotia, Spyros Blanas, Ruichuan Chen, Christof Fetzer, Thorsten Strufe