Related papers: Integrating Abstractions to Enhance the Execution …
Resource selection and task placement for distributed execution poses conceptual and implementation difficulties. Although resource selection and task placement are at the core of many tools and workflow systems, the methods are ad hoc…
A typical enterprise uses a local area network of computers to perform its business. During the off-working hours, the computational capacities of these networked computers are underused or unused. In order to utilize this computational…
Developing software for scientific applications that require the integration of diverse types of computing, instruments, and data present challenges that are distinct from commercial software. These applications require scale, and the need…
Programmability, performance portability, and resource efficiency have emerged as critical challenges in harnessing complex and diverse architectures today to obtain high performance and energy efficiency. While there is abundant research,…
In the past decade, increasingly network scheduling techniques have been proposed to boost the distributed application performance. Flow-level metrics, such as flow completion time (FCT), are based on the abstraction of flows yet they…
Many science and industry IoT applications necessitate data processing across the edge-to-cloud continuum to meet performance, security, cost, and privacy requirements. However, diverse abstractions and infrastructures for managing…
This paper examines disaggregated data center architectures from the perspective of the applications that would run on these data centers, and challenges the abstractions that have been proposed to date. In particular, we argue that…
The evolution of distributed architectures and programming paradigms for performance-oriented program development, challenge the state-of-the-art technology for performance tools. The area of high performance computing is rapidly expanding…
Machine learning applications are increasingly deployed not only to serve predictions using static models, but also as tightly-integrated components of feedback loops involving dynamic, real-time decision making. These applications pose a…
We propose a middleware framework for deployment and subsequent autonomic management of component-based distributed applications. An initial deployment goal is specified using a declarative constraint language, expressing constraints over…
Scientific problems that depend on processing large amounts of data require overcoming challenges in multiple areas: managing large-scale data distribution, controlling co-placement and scheduling of data with compute resources, and…
Distributed applications, such as database queries and distributed training, consist of both compute and network tasks. DAG-based abstraction primarily targets compute tasks and has no explicit network-level scheduling. In contrast, Coflow…
The demand for distributed applications has significantly increased over the past decade, with improvements in machine learning techniques fueling this growth. These applications predominantly utilize Cloud data centers for high-performance…
Diagnosing problems in deployed distributed applications continues to grow more challenging. A significant reason is the extreme mismatch between the powerful abstractions developers have available to build increasingly complex distributed…
Cloud computing has grown to become a popular distributed computing service offered by commercial providers. More recently, Edge and Fog computing resources have emerged on the wide-area network as part of Internet of Things (IoT)…
Many organizations routinely analyze large datasets using systems for distributed data-parallel processing and clusters of commodity resources. Yet, users need to configure adequate resources for their data processing jobs. This requires…
Edge computing is an emerging paradigm to enable low-latency applications, like mobile augmented reality, because it takes the computation on processing devices that are closer to the users. On the other hand, the need for highly scalable…
Performance modeling can help to improve the resource efficiency of clusters and distributed dataflow applications, yet the available modeling data is often limited. Collaborative approaches to performance modeling, characterized by the…
This paper presents the overall design of a multi-agent framework for tuning the performance of an application executing in a distributed environment. The multi-agent framework provides services like resource brokering, analyzing…
Mixed-criticality real-time scheduling has been developed to improve resource utilization while guaranteeing safe execution of critical applications. These studies use optimistic resource reservation for all the applications to improve…