Related papers: Predicting Intermediate Storage Performance for Wo…
Accurate prediction of application performance is critical for enabling effective scheduling and resource management in resource-constrained dynamic edge environments. However, achieving predictable performance in such environments remains…
Scientific workflow management systems support large-scale data analysis on cluster infrastructures. For this, they interact with resource managers which schedule workflow tasks onto cluster nodes. In addition to workflow task descriptions,…
In modern computing environments, users may have multiple systems accessible to them such as local clusters, private clouds, or public clouds. This abundance of choices makes it difficult for users to select the system and configuration for…
With the rapid development of cloud computing and big data technologies, storage systems have become a fundamental building block of datacenters, incorporating hardware innovations such as flash solid state drives and non-volatile memories,…
Job submissions of parallel applications to production supercomputer systems will have to be carefully tuned in terms of the job submission parameters to obtain minimum response times. In this work, we have developed an end-to-end resource…
Modern machine learning training is increasingly bottlenecked by data I/O rather than compute. GPUs often sit idle at below 50% utilization waiting for data. This paper presents a machine learning approach to predict I/O performance and…
Infrastructure as a service clouds hide the complexity of maintaining the physical infrastructure with a slight disadvantage: they also hide their internal working details. Should users need knowledge about these details e.g., to increase…
Prediction serving systems are designed to provide large volumes of low-latency inferences machine learning models. These systems mix data processing and computationally intensive model inference and benefit from multiple heterogeneous…
Complex scientific workflows can process large amounts of data using thousands of tasks. The turnaround times of these workflows are often affected by various latencies such as the resource discovery, scheduling and data access latencies…
Workflows provide an expressive programming model for fine-grained control of large-scale applications in distributed computing environments. Accurate estimates of complex workflow execution metrics on large-scale machines have several key…
Many robotic applications that are critical for robot performance require immediate feedback, hence execution time is a critical concern. Furthermore, it is common that robots come with a fixed quantity of hardware resources; if an…
The precise estimation of resource usage is a complex and challenging issue due to the high variability and dimensionality of heterogeneous service types and dynamic workloads. Over the last few years, the prediction of resource usage and…
In the use of database systems, the design of the storage engine and data model directly affects the performance of the database when performing queries. Therefore, the users of the database need to select the storage engine and design data…
The storage stack in the traditional operating system is primarily optimized towards improving the CPU utilization and hiding the long I/O latency imposed by the slow I/O devices such as hard disk drivers (HDDs). However, the emerging…
Finding good configurations for a software system is often challenging since the number of configuration options can be large. Software engineers often make poor choices about configuration or, even worse, they usually use a sub-optimal…
This paper offers a methodological contribution at the intersection of machine learning and operations research. Namely, we propose a methodology to quickly predict tactical solutions to a given operational problem. In this context, the…
This paper presents how an existing framework for offline performance optimization can be applied to microservice applications during the Release phase of the DevOps life cycle. Optimization of resource allocation configuration parameters…
Nowadays, data-centers are largely under-utilized because resource allocation is based on reservation mechanisms which ignore actual resource utilization. Indeed, it is common to reserve resources for peak demand, which may occur only for a…
A growing number of critical workflow applications leverage a streamlined edge-hub-cloud architecture, which diverges from the conventional edge computing paradigm. An edge device, in collaboration with a hub device and a cloud server,…
The utilization of cloud environments to deploy scientific workflow applications is an emerging trend in scientific community. In this area, the main issue is the scheduling of workflows, which is known as an NP-complete problem. Apart from…