English
Related papers

Related papers: Early Scheduling in Parallel State Machine Replica…

200 papers

State-machine replication, a fundamental approach to fault tolerance, requires replicas to execute commands deterministically, which usually results in sequential execution of commands. Sequential execution limits performance and underuses…

Distributed, Parallel, and Cluster Computing · Computer Science 2014-04-29 Parisa Jalili Marandi , Fernando Pedone

State-machine replication, a fundamental approach to designing fault-tolerant services, requires commands to be executed in the same order by all replicas. Moreover, command execution must be deterministic: each replica must produce the…

Distributed, Parallel, and Cluster Computing · Computer Science 2013-11-26 Parisa Jalili Marandi , Carlos Eduardo Bezerra , Fernando Pedone

State Machine Replication (SMR) is a fundamental approach to designing service with fault tolerance. However, its requirement for the deterministic execution of transactions often results in single-threaded replicas, which cannot fully…

Distributed, Parallel, and Cluster Computing · Computer Science 2019-11-27 Gang Wu1 , Guodong Zhao , Yidong Song

One typical use case of large-scale distributed computing in data centers is to decompose a computation job into many independent tasks and run them in parallel on different machines, sometimes known as the "embarrassingly parallel"…

Distributed, Parallel, and Cluster Computing · Computer Science 2014-04-07 Da Wang , Gauri Joshi , Gregory Wornell

Linearizability is a well-known correctness property for concurrent and distributed systems. In the past, it was also used to prove the design and implementation of replicated state-machines correct. State-machine replication (SMR) is a…

Distributed, Parallel, and Cluster Computing · Computer Science 2024-07-03 Franz J. Hauck , Alexander Heß

In this paper we study the scheduling of parallel and real-time recurrent tasks. Firstly, we propose a new parallel task model which allows recurrent tasks to be composed of several threads, each thread requires a single processor for…

Operating Systems · Computer Science 2015-03-19 Irina Iulia Lupu , Joël Goossens

Serial-parallel redundancy is a reliable way to ensure service and systems will be available in cloud computing. That method involves making copies of the same system or program, with only one remaining active. When an error occurs, the…

Distributed, Parallel, and Cluster Computing · Computer Science 2024-04-08 Gutha Jaya Krishna

In modern computer systems, jobs are divided into short tasks and executed in parallel. Empirical observations in practical systems suggest that the task service times are highly random and the job service time is bottlenecked by the…

Performance · Computer Science 2017-02-08 Yin Sun , C. Emre Koksal , Ness B. Shroff

Distributed systems, such as state machine replication, are critical infrastructures for modern applications. Practical distributed protocols make minimum assumptions about the underlying network: They typically assume a partially…

Distributed, Parallel, and Cluster Computing · Computer Science 2025-07-18 Yiliang Wan , Nitin Shivaraman , Akshaye Shenoi , Xiang Liu , Tao Luo , Jialin Li

Parallel batched data structures are designed to process synchronized batches of operations in a parallel computing model. In this paper, we propose parallel combining, a technique that implements a concurrent data structure from a parallel…

Distributed, Parallel, and Cluster Computing · Computer Science 2018-11-14 Vitaly Aksenov , Petr Kuznetsov , Anatoly Shalyto

The main goal of parallel processing is to provide users with performance that is much better than that of single processor systems. The execution of jobs is scheduled, which requires certain resources in order to meet certain criteria.…

Distributed, Parallel, and Cluster Computing · Computer Science 2019-02-07 Yang Cao , Fei Wu , Thomas Robertazzi

This paper presents different methods for solving parallel machine scheduling problems with precedence constraints and setup times between the jobs. Limited discrepancy search methods mixed with local search principles, dominance conditions…

Data Structures and Algorithms · Computer Science 2009-02-19 Bernat Gacias , Christian Artigues , Pierre Lopez

Real-life parallel machine scheduling problems can be characterized by: (i) limited information about the exact task duration at scheduling time, and (ii) an opportunity to reschedule the remaining tasks each time a task processing is…

Optimization and Control · Mathematics 2023-11-22 Izack Cohen , Krzysztof Postek , Shimrit Shtern

Federated scheduling is a promising approach to schedule parallel real-time tasks on multi-cores, where each heavy task exclusively executes on a number of dedicated processors, while light tasks are treated as sequential sporadic tasks and…

Distributed, Parallel, and Cluster Computing · Computer Science 2017-05-10 Xu Jiang , Nan Guan , Xiang Long , Wang Yi

Developing an efficient server-based real-time scheduling solution that supports dynamic task-level parallelism is now relevant to even the desktop and embedded domains and no longer only to the high performance computing market niche. This…

Distributed, Parallel, and Cluster Computing · Computer Science 2011-06-15 Luís Nogueira , Luís Miguel Pinho

Scheduling query execution plans is a particularly complex problem in shared-nothing parallel systems, where each site consists of a collection of local time-shared (e.g., CPU(s) or disk(s)) and space-shared (e.g., memory) resources and…

Databases · Computer Science 2014-04-01 Minos Garofalakis , Yannis Ioannidis

Motivated by modern parallel computing applications, we consider the problem of scheduling parallel-task jobs with heterogeneous resource requirements in a cluster of machines. Each job consists of a set of tasks that can be processed in…

Distributed, Parallel, and Cluster Computing · Computer Science 2020-04-03 Mehrnoosh Shafiee , Javad Ghaderi

Developing state-machine replication protocols for practical use is a complex and labor-intensive process because of the myriad of essential tasks (e.g., deployment, communication, recovery) that need to be taken into account in an…

Distributed, Parallel, and Cluster Computing · Computer Science 2021-06-25 Laura Lawniczak , Tobias Distler

Recently, the problem of multitasking scheduling has attracted a lot of attention in the service industries where workers frequently perform multiple tasks by switching from one task to another. Hall, Leung and Li (Discrete Applied…

Data Structures and Algorithms · Computer Science 2022-04-06 Bin Fu , Yumei Huo , Hairong Zhao

In large-scale LLM pre-training systems with 100k+ GPUs, failures become the norm rather than the exception, and restart costs can dominate wall-clock training time. However, existing fault-tolerance mechanisms are largely unprepared for…

Distributed, Parallel, and Cluster Computing · Computer Science 2026-05-29 Jin Lee , Zhonghao Chen , Xuhang He , Robert Underwood , Bogdan Nicolae , Franck Cappello , Xiaoyi Lu , Sheng Di , Zheng Zhang
‹ Prev 1 2 3 10 Next ›