Related papers: Hemingway: Modeling Distributed Optimization Algor…

Machine Learning and CPU (Central Processing Unit) Scheduling Co-Optimization over a Network of Computing Centers

In the rapidly evolving research on artificial intelligence (AI) the demand for fast, computationally efficient, and scalable solutions has increased in recent years. The problem of optimizing the computing resources for distributed machine…

Machine Learning · Computer Science 2025-10-30 Mohammadreza Doostmohammadian, Zulfiya R. Gabidullina, Hamid R. Rabiee

Toward Efficient Online Scheduling for Distributed Machine Learning Systems

Recent years have witnessed a rapid growth of distributed machine learning (ML) frameworks, which exploit the massive parallelism of computing clusters to expedite ML training. However, the proliferation of distributed ML frameworks also…

Distributed, Parallel, and Cluster Computing · Computer Science 2022-05-16 Menglu Yu, Jia Liu, Chuan Wu, Bo Ji, Elizabeth S. Bentley

Horizontally Scalable Submodular Maximization

A variety of large-scale machine learning problems can be cast as instances of constrained submodular maximization. Existing approaches for distributed submodular maximization have a critical drawback: The capacity - number of instances…

Machine Learning · Statistics 2016-06-01 Mario Lucic, Olivier Bachem, Morteza Zadimoghaddam, Andreas Krause

Strategies and Principles of Distributed Machine Learning on Big Data

The rise of Big Data has led to new demands for Machine Learning (ML) systems to learn complex models with millions to billions of parameters, that promise adequate capacity to digest massive datasets and offer powerful predictive analytics…

Machine Learning · Statistics 2016-01-01 Eric P. Xing, Qirong Ho, Pengtao Xie, Wei Dai

Optimization and Learning with Information Streams: Time-varying Algorithms and Applications

There is a growing cross-disciplinary effort in the broad domain of optimization and learning with streams of data, applied to settings where traditional batch optimization techniques cannot produce solutions at time scales that match the…

Optimization and Control · Mathematics 2021-11-29 Emiliano Dall'Anese, Andrea Simonetto, Stephen Becker, Liam Madden

The Power of Randomization: Distributed Submodular Maximization on Massive Datasets

A wide variety of problems in machine learning, including exemplar clustering, document summarization, and sensor placement, can be cast as constrained submodular maximization problems. Unfortunately, the resulting submodular optimization…

Machine Learning · Computer Science 2015-04-23 Rafael da Ponte Barbosa, Alina Ene, Huy L. Nguyen, Justin Ward

On Hyperparameter Optimization of Machine Learning Algorithms: Theory and Practice

Machine learning algorithms have been used widely in various applications and areas. To fit a machine learning model into different problems, its hyper-parameters must be tuned. Selecting the best hyper-parameter configuration for machine…

Machine Learning · Computer Science 2022-10-06 Li Yang, Abdallah Shami

Optimization for Large-Scale Machine Learning with Distributed Features and Observations

As the size of modern data sets exceeds the disk and memory capacities of a single computer, machine learning practitioners have resorted to parallel and distributed computing. Given that optimization is one of the pillars of machine…

Machine Learning · Statistics 2017-04-18 Alexandros Nathan, Diego Klabjan

Online Distributed Scheduling on a Fault-prone Parallel System

We consider a parallel system of $m$ identical machines prone to unpredictable crashes and restarts, trying to cope with the continuous arrival of tasks to be executed. Tasks have different computational requirements (i.e., processing time…

Distributed, Parallel, and Cluster Computing · Computer Science 2016-03-21 Elli Zavou, Antonio Fernández Anta

A Survey From Distributed Machine Learning to Distributed Deep Learning

Artificial intelligence has made remarkable progress in handling complex tasks, thanks to advances in hardware acceleration and machine learning algorithms. However, to acquire more accurate outcomes and solve more complex issues,…

Machine Learning · Computer Science 2023-09-12 Mohammad Dehghani, Zahra Yazdanparast

A Survey on Fault-tolerance in Distributed Optimization and Machine Learning

The robustness of distributed optimization is an emerging field of study, motivated by various applications of distributed optimization including distributed machine learning, distributed sensing, and swarm robotics. With the rapid…

Distributed, Parallel, and Cluster Computing · Computer Science 2021-06-29 Shuo Liu

Communication-Optimal Distributed Clustering

Clustering large datasets is a fundamental problem with a number of applications in machine learning. Data is often collected on different sites and clustering needs to be performed in a distributed manner with low communication. We would…

Data Structures and Algorithms · Computer Science 2017-02-02 Jiecao Chen, He Sun, David P. Woodruff, Qin Zhang

Scaling-up Distributed Processing of Data Streams for Machine Learning

Emerging applications of machine learning in numerous areas involve continuous gathering of and learning from streams of data. Real-time incorporation of streaming data into the learned models is essential for improved inference in these…

Machine Learning · Computer Science 2020-12-01 Matthew Nokleby, Haroon Raja, Waheed U. Bajwa

Hybrid Decentralized Optimization: Leveraging Both First- and Zeroth-Order Optimizers for Faster Convergence

Distributed optimization is the standard way of speeding up machine learning training, and most of the research in the area focuses on distributed first-order, gradient-based methods. Yet, there are settings where some…

Machine Learning · Computer Science 2025-11-03 Matin Ansaripour, Shayan Talaei, Giorgi Nadiradze, Dan Alistarh

Distributed Hybrid Parallelism for Large Language Models: Comparative Study and System Design Guide

With the rapid growth of large language models (LLMs), a wide range of methods have been developed to distribute computation and memory across hardware devices for efficient training and inference. While existing surveys provide descriptive…

Machine Learning · Computer Science 2026-02-11 Hossam Amer, Rezaul Karim, Ali Pourranjbar, Weiwei Zhang, Walid Ahmed, Boxing Chen

A Comparative Analysis of Distributed Linear Solvers under Data Heterogeneity

We consider the problem of solving a large-scale system of linear equations in a distributed or federated manner by a taskmaster and a set of machines, each possessing a subset of the equations. We provide a comprehensive comparison of two…

Distributed, Parallel, and Cluster Computing · Computer Science 2025-06-24 Boris Velasevic, Rohit Parasnis, Christopher G. Brinton, Navid Azizan

Accelerating data-driven algorithm selection for combinatorial partitioning problems

Data-driven algorithm selection is a powerful approach for choosing effective heuristics for computational problems. It operates by evaluating a set of candidate algorithms on a collection of representative training instances and selecting…

Machine Learning · Computer Science 2025-12-04 Vaggos Chatziafratis, Ishani Karmarkar, Yingxi Li, Ellen Vitercik

Revisiting Large Scale Distributed Machine Learning

Nowadays, with the widespread use of smartphones and other portable gadgets equipped with a variety of sensors, data is ubiquitously available and the focus of machine learning has shifted from being able to infer from small training samples to…

Distributed, Parallel, and Cluster Computing · Computer Science 2015-07-07 Radu Cristian Ionescu

Analysis of Distributed Algorithms for Big-data

Parallel and distributed processing are becoming the de facto industry standard, and a large part of the current research is targeted on how to make computing scalable and distributed, dynamically, without allocating the resources on…

Distributed, Parallel, and Cluster Computing · Computer Science 2024-04-10 Rajendra Purohit, K R Chowdhary, S D Purohit

Towards Collaborative Optimization of Cluster Configurations for Distributed Dataflow Jobs

Analyzing large datasets with distributed dataflow systems requires the use of clusters. Public cloud providers offer a large variety and quantity of resources that can be used for such clusters. However, picking the appropriate resources…

Distributed, Parallel, and Cluster Computing · Computer Science 2021-04-28 Jonathan Will, Jonathan Bader, Lauritz Thamsen