Related papers: Splash: User-friendly Programming Interface for Pa…

Modeling Scalability of Distributed Machine Learning

Present day machine learning is computationally intensive and processes large amounts of data. It is implemented in a distributed fashion in order to address these scalability issues. The work is parallelized across a number of computing…

Machine Learning · Computer Science 2017-03-28 Alexander Ulanov, Andrey Simanovsky, Manish Marwah

STRETCH: Virtual Shared-Nothing Parallelism for Scalable and Elastic Stream Processing

Stream processing applications extract value from raw data through Directed Acyclic Graphs of data analysis tasks. Shared-nothing (SN) parallelism is the de-facto standard to scale stream processing applications. Given an application, SN…

Distributed, Parallel, and Cluster Computing · Computer Science 2022-05-02 Vincenzo Gulisano, Hannaneh Najdataei, Yiannis Nikolakopoulos, Alessandro V. Papadopoulos, Marina Papatriantafilou, Philippas Tsigas

Swarming for Faster Convergence in Stochastic Optimization

We study a distributed framework for stochastic optimization which is inspired by models of collective motion found in nature (e.g., swarming) with mild communication requirements. Specifically, we analyze a scheme in which each one of $N >…

Optimization and Control · Mathematics 2018-08-08 Shi Pu, Alfredo Garcia

Distributed Stochastic Optimization via Adaptive SGD

Stochastic convex optimization algorithms are the most popular way to train machine learning models on large-scale data. Scaling up the training process of these models is crucial, but the most popular algorithm, Stochastic Gradient Descent…

Machine Learning · Statistics 2018-10-30 Ashok Cutkosky, Robert Busa-Fekete

On the Evaluation of RDF Distribution Algorithms Implemented over Apache Spark

Querying very large RDF data sets in an efficient manner requires a sophisticated distribution strategy. Several innovative solutions have recently been proposed for optimizing data distribution with predefined query workloads. This paper…

Databases · Computer Science 2015-07-10 Olivier Curé, Hubert Naacke, Mohamed-Amine Baazizi, Bernd Amann

Parallel Stochastic Optimization Framework for Large-Scale Non-Convex Stochastic Problems

In this paper, we consider the problem of stochastic optimization, where the objective function is in terms of the expectation of a (possibly non-convex) cost function that is parametrized by a random variable. While the convergence speed…

Information Theory · Computer Science 2019-10-23 Naeimeh Omidvar, An Liu, Vincent Lau, Danny H. K. Tsang, Mohammad Reza Pakravan

A Stochastic Large-scale Machine Learning Algorithm for Distributed Features and Observations

As the size of modern data sets exceeds the disk and memory capacities of a single computer, machine learning practitioners have resorted to parallel and distributed computing. Given that optimization is one of the pillars of machine…

Machine Learning · Statistics 2019-12-10 Biyi Fang, Diego Klabjan

Efficient Stochastic Programming in Julia

We present StochasticPrograms.jl, a user-friendly and powerful open-source framework for stochastic programming written in the Julia language. The framework includes both modeling tools and structure-exploiting optimization algorithms.…

Optimization and Control · Mathematics 2022-09-07 Martin Biel, Mikael Johansson

PDSP-Bench: A Benchmarking System for Parallel and Distributed Stream Processing

The paper introduces PDSP-Bench, a novel benchmarking system designed for a systematic understanding of performance of parallel stream processing in a distributed environment. Such an understanding is essential for determining how Stream…

Distributed, Parallel, and Cluster Computing · Computer Science 2025-04-16 Pratyush Agnihotri, Boris Koldehofe, Roman Heinrich, Carsten Binnig, Manisha Luthra

A Flocking-based Approach for Distributed Stochastic Optimization

In recent years, the paradigm of cloud computing has emerged as an architecture for computing that makes use of distributed (networked) computing resources. In this paper, we consider a distributed computing algorithmic scheme for…

Optimization and Control · Mathematics 2017-09-22 Shi Pu, Alfredo Garcia

DeepSpark: A Spark-Based Distributed Deep Learning Framework for Commodity Clusters

The increasing complexity of deep neural networks (DNNs) has made it challenging to exploit existing large-scale data processing pipelines for handling massive data and parameters involved in DNN training. Distributed computing platforms…

Machine Learning · Computer Science 2016-10-04 Hanjoo Kim, Jaehong Park, Jaehee Jang, Sungroh Yoon

Parallelization of Machine Learning Algorithms Respectively on Single Machine and Spark

With the rapid development of big data technologies, how to dig out useful information from massive data becomes an essential problem. However, using machine learning algorithms to analyze large data may be time-consuming and inefficient on…

Distributed, Parallel, and Cluster Computing · Computer Science 2023-04-14 Jiajun Shen

Probabilistic Synchronous Parallel

Most machine learning and deep neural network algorithms rely on certain iterative algorithms to optimise their utility/cost functions, e.g. Stochastic Gradient Descent. In distributed learning, the networked nodes have to work…

Distributed, Parallel, and Cluster Computing · Computer Science 2017-10-06 Liang Wang, Ben Catterall, Richard Mortier

Modeling and Simulation of Spark Streaming

As more and more devices connect to Internet of Things, unbounded streams of data will be generated, which have to be processed "on the fly" in order to trigger automated actions and deliver real-time services. Spark Streaming is a popular…

Distributed, Parallel, and Cluster Computing · Computer Science 2018-09-12 Jia-Chun Lin, Ming-Chang Lee, Ingrid Chieh Yu, Einar Broch Johnsen

A Discussion on Parallelization Schemes for Stochastic Vector Quantization Algorithms

This paper studies parallelization schemes for stochastic Vector Quantization algorithms in order to obtain time speed-ups using distributed resources. We show that the most intuitive parallelization scheme does not lead to better…

Machine Learning · Statistics 2012-05-14 Matthieu Durut, Benoît Patra, Fabrice Rossi

Automatic Parallelization of Sequential Programs

Prior work on Automatically Scalable Computation (ASC) suggests that it is possible to parallelize sequential computation by building a model of whole-program execution, using that model to predict future computations, and then…

Distributed, Parallel, and Cluster Computing · Computer Science 2018-09-21 Peter Kraft, Amos Waterland, Daniel Y Fu, Anitha Gollamudi, Shai Szulanski, Margo Seltzer

Parallel optimized sampling for stochastic equations

Stochastic equations play an important role in computational science, due to their ability to treat a wide variety of complex statistical problems. However, current algorithms are strongly limited by their sampling variance, which scales…

Numerical Analysis · Mathematics 2017-01-04 Bogdan Opanchuk, Simon Kiesewetter, Peter D. Drummond

Ripple: A Practical Declarative Programming Framework for Serverless Compute

Serverless computing has emerged as a promising alternative to infrastructure- (IaaS) and platform-as-a-service (PaaS) cloud platforms for applications with ample parallelism and intermittent activity. Serverless promises greater resource…

Distributed, Parallel, and Cluster Computing · Computer Science 2020-01-03 Shannon Joyner, Michael MacCoss, Christina Delimitrou, Hakim Weatherspoon

Scaling up Stochastic Gradient Descent for Non-convex Optimisation

Stochastic gradient descent (SGD) is a widely adopted iterative method for optimizing differentiable objective functions. In this paper, we propose and discuss a novel approach to scale up SGD in applications involving non-convex functions…

Machine Learning · Statistics 2022-10-07 Saad Mohamad, Hamad Alamri, Abdelhamid Bouchachia

Parallelizing Optimal Multiple Sequence Alignment by Dynamic Programming

Optimal multiple sequence alignment by dynamic programming, like many highly dimensional scientific computing problems, has failed to benefit from the improvements in computing performance brought about by multi-processor systems, due to…

Distributed, Parallel, and Cluster Computing · Computer Science 2023-11-30 Manal Helal, Hossam El-Gindy, Lenore Mullin, Bruno Gaeta