Related papers: Practical Size-based Scheduling for MapReduce Work…
Size-based schedulers have very desirable performance properties: optimal or near-optimal response time can be coupled with strong fairness guarantees. Despite this, such systems are very rarely implemented in practical settings, because…
In Hadoop, job scheduling is an independent module, so users can design their own job scheduler based on their actual application requirements and thereby meet their specific business needs. Currently, Hadoop has three schedulers: FIFO,…
Deep neural network training jobs and other iterative computations frequently include checkpoints where jobs can be canceled based on the current value of monitored metrics. While most existing results focus on the performance of all…
Despite the fact that size-based schedulers can give excellent results in terms of both average response times and fairness, data-intensive computing execution engines generally do not employ size-based schedulers, mainly because of the…
We study size-based schedulers, and focus on the impact of inaccurate job size information on response time and fairness. Our intent is to revisit previous results, which allude to performance degradation for even small errors on job size…
Cloud computing is emerging as a new computational paradigm. Hadoop MapReduce has become a powerful computation model for processing large data on distributed commodity hardware clusters such as clouds. In all Hadoop implementations,…
We study the conditional sojourn time distributions of processor sharing (PS), foreground background processor sharing (FBPS) and shortest remaining processing time first (SRPT) scheduling disciplines on an event where the job size of a…
We consider a system of parallel queues where tasks are assigned (dispatched) to one of the available servers upon arrival. The dispatching decision is based on the full state information, i.e., on the sizes of the new and existing jobs. We…
It is well known that size-based scheduling policies, which take into account job size (i.e., the time it takes to run a job), can perform very well in terms of both response time and fairness. Unfortunately, the requirement of knowing…
By executing jobs serially rather than in parallel, size-based scheduling policies can shorten the time needed to complete jobs; however, major obstacles to their applicability are fairness guarantees and the fact that job sizes are rarely…
Software Defined Networking (SDN) is a revolutionary network architecture that separates network control functions from the underlying equipment and is a growing trend that helps enterprises build more manageable data centers where…
Apache introduced YARN as the next generation of the Hadoop framework, providing resource management and a central platform to deliver consistent data governance tools across Hadoop clusters. Hadoop YARN supports multiple frameworks like…
Hadoop is an open source implementation of the MapReduce Framework in the realm of distributed processing. A Hadoop cluster is a unique type of computational cluster designed for storing and analyzing large data sets across cluster of…
MapReduce has become a popular programming model for running data intensive applications on the cloud. Completion time goals or deadlines of MapReduce jobs set by users are becoming crucial in existing cloud-based data processing…
Job schedulers are a key component of scalable computing infrastructures. They orchestrate all of the work executed on the computing infrastructure and directly impact the effectiveness of the system. Recently, job workloads have…
To address Hadoop's limitations in scalability, resource sharing, and application support, the open-source community proposed the next generation of Hadoop's compute platform, called Yet Another Resource Negotiator (YARN), by separating…
Within the project management context, project scheduling serves as an indispensable component, functioning as a fundamental tool for planning, monitoring, controlling, and managing projects more broadly. Although the resource-constrained…
The Hadoop scheduler is a centerpiece of Hadoop, the leading processing framework for data-intensive applications in the cloud. Given the impact of failures on the performance of applications running on Hadoop, testing and verifying the…
It is cost-efficient for a tenant with a limited budget to establish a virtual MapReduce cluster by renting multiple virtual private servers (VPSs) from a VPS provider. To provide an appropriate scheduling scheme for this type of computing…
Issues of inequity in U.S. high schools' course scheduling did not previously exist. However, in recent years, with the increase in student population and course variety, students perceive that the course scheduling method is unfair.…