English
Related papers

Related papers: Efficient Task Replication for Fast Response Times…

200 papers

In a cloud computing job with many parallel tasks, the tasks on the slowest machines (straggling tasks) become the bottleneck in the job completion. Computing frameworks such as MapReduce and Spark tackle this by replicating the straggling…

Distributed, Parallel, and Cluster Computing · Computer Science 2017-09-14 Da Wang , Gauri Joshi , Gregory Wornell

As numerous machine learning and other algorithms increase in complexity and data requirements, distributed computing becomes necessary to satisfy the growing computational and storage demands, because it enables parallel execution of…

Distributed, Parallel, and Cluster Computing · Computer Science 2021-12-21 Pei Peng , Emina Soljanin , Philip Whiting

Increasing need for large-scale data analytics in a number of application domains has led to a dramatic rise in the number of distributed data management systems, both parallel relational databases, and systems that support alternative…

Databases · Computer Science 2013-02-19 K. Ashwin Kumar , Amol Deshpande , Samir Khuller

Master-worker distributed computing systems use task replication in order to mitigate the effect of slow workers, known as stragglers. Tasks are grouped into batches and assigned to one or more workers for execution. We first consider the…

Distributed, Parallel, and Cluster Computing · Computer Science 2020-12-29 Amir Behrouzi-Far , Emina Soljanin

In modern computer systems, jobs are divided into short tasks and executed in parallel. Empirical observations in practical systems suggest that the task service times are highly random and the job service time is bottlenecked by the…

Performance · Computer Science 2017-02-08 Yin Sun , C. Emre Koksal , Ness B. Shroff

In cloud computing systems, assigning a job to multiple servers and waiting for the earliest copy to finish is an effective method to combat the variability in response time of individual servers. Although adding redundant replicas always…

Distributed, Parallel, and Cluster Computing · Computer Science 2015-10-21 Gauri Joshi , Emina Soljanin , Gregory Wornell

In distributed computing systems with stragglers, various forms of redundancy can improve the average delay performance. We study the optimal replication of data in systems where the job execution time is a stochastically decreasing and…

Distributed, Parallel, and Cluster Computing · Computer Science 2020-01-01 Amir Behrouzi-Far , Emina Soljanin

Task replication has recently been advocated as a practical solution to reduce latencies in parallel systems. In addition to several convincing empirical studies, some others provide analytical results, yet under some strong assumptions…

Performance · Computer Science 2016-02-26 Felix Poloczek , Florin Ciucu

Parallel real-time embedded applications can be modelled as directed acyclic graphs (DAGs) whose nodes model subtasks and whose edges model precedence constraints among subtasks. Efficiently scheduling such parallel tasks can be challenging…

Distributed, Parallel, and Cluster Computing · Computer Science 2024-10-24 Shardul Lendve , Konstantinos Bletsas , Pedro F. Souto

Executing workflows on volunteer computing resources where individual tasks may be forced to relinquish their resource for the resource's primary use leads to unpredictability and often significantly increases execution time. Task…

Performance · Computer Science 2022-09-28 Andrew Stephen McGough , Matthew Forshaw

State machine replication is standard approach to fault tolerance. One of the key assumptions of state machine replication is that replicas must execute operations deterministically and thus serially. To benefit from multi-core servers,…

Distributed, Parallel, and Cluster Computing · Computer Science 2018-05-15 Eduardo Alchieri , Fernando Dotti , Fernando Pedone

The efficient parallel execution of complex computations requires balancing the workload across processors while minimizing the communication between them. This inherent trade-off is often captured by graph partitioning or DAG scheduling…

Distributed, Parallel, and Cluster Computing · Computer Science 2026-05-04 Pál András Papp , Toni Böhnlein , A. N. Yzelman

Asynchronous methods are fundamental for parallelizing computations in distributed machine learning. They aim to accelerate training by fully utilizing all available resources. However, their greedy approach can lead to inefficiencies using…

Machine Learning · Computer Science 2025-05-23 Artavazd Maranjyan , El Mehdi Saad , Peter Richtárik , Francesco Orabona

Querying graph data with low latency is an important requirement in application domains such as social networks and knowledge graphs. Graph queries perform multiple hops between vertices. When data is partitioned and stored across multiple…

Databases · Computer Science 2022-12-21 Nathan Ng , Hung Le , Marco Serafini

We consider a parallel system of $m$ identical machines prone to unpredictable crashes and restarts, trying to cope with the continuous arrival of tasks to be executed. Tasks have different computational requirements (i.e., processing time…

Distributed, Parallel, and Cluster Computing · Computer Science 2016-03-21 Elli Zavou , Antonio Fernández Anta

We consider the problem of stragglers in distributed computing systems. Stragglers, which are compute nodes that unpredictably slow down, often increase the completion times of tasks. One common approach to mitigating stragglers is work…

Distributed, Parallel, and Cluster Computing · Computer Science 2024-11-07 Tharindu Adikari , Haider Al-Lawati , Jason Lam , Zhenhua Hu , Stark C. Draper

The main goal of parallel processing is to provide users with performance that is much better than that of single processor systems. The execution of jobs is scheduled, which requires certain resources in order to meet certain criteria.…

Distributed, Parallel, and Cluster Computing · Computer Science 2019-02-07 Yang Cao , Fei Wu , Thomas Robertazzi

Motivated by modern parallel computing applications, we consider the problem of scheduling parallel-task jobs with heterogeneous resource requirements in a cluster of machines. Each job consists of a set of tasks that can be processed in…

Distributed, Parallel, and Cluster Computing · Computer Science 2020-04-03 Mehrnoosh Shafiee , Javad Ghaderi

The constant increase in parallelism available on large-scale distributed computers poses major scalability challenges to many scientific applications. A common strategy to improve scalability is to express the algorithm in terms of…

Distributed, Parallel, and Cluster Computing · Computer Science 2020-10-23 Andrew Garmon , Vinay Ramakrishnaiah , Danny Perez

Grid Computing is a type of parallel and distributed systems that is designed to provide reliable access to data and computational resources in wide area networks. These resources are distributed in different geographical locations, however…

Distributed, Parallel, and Cluster Computing · Computer Science 2013-09-27 Sheida Dayyani , Mohammad Reza Khayyambashi
‹ Prev 1 2 3 10 Next ›