Related papers: The MIT Supercloud Workload Classification Challen…
Artificial intelligence (AI) and Machine learning (ML) workloads are an increasingly larger share of the compute workloads in traditional High-Performance Computing (HPC) centers and commercial cloud systems. This has led to changes in…
Workloads in modern cloud data centers are becoming increasingly complex. The number of workloads running in cloud data centers has been growing exponentially for the last few years, and cloud service providers (CSP) have been supporting…
The increasing use and cost of high performance computing (HPC) requires new easy-to-use tools to enable HPC users and HPC systems engineers to transparently understand the utilization of resources. The MIT Lincoln Laboratory Supercomputing…
Increasingly, scientific discovery requires sophisticated and scalable workflows. Workflows have become the ``new applications,'' wherein multi-scale computing campaigns comprise multiple and heterogeneous executable tasks. In particular,…
The convergence of IoT, Edge, Cloud, and HPC technologies creates a compute continuum that merges cloud scalability and flexibility with HPC's computational power and specialized optimizations. However, integrating cloud and HPC resources…
The emergence of large-scale AI models, like GPT-4, has significantly impacted academia and industry, driving the demand for high-performance computing (HPC) to accelerate workloads. To address this, we present HPCClusterScape, a…
The evolution of High-Performance Computing (HPC) platforms enables the design and execution of progressively larger and more complex workflow applications in these systems. The complexity comes not only from the number of elements that…
Workload characterization is an integral part of performance analysis of high performance computing (HPC) systems. An understanding of workload properties sheds light on resource utilization and can be used to inform performance…
For the first time, this paper systematically identifies three categories of throughput oriented workloads in data centers: services, data processing applications, and interactive real-time applications, whose targets are to increase the…
The MIT SuperCloud Portal Workspace enables the secure exposure of web services running on high performance computing (HPC) systems. The portal allows users to run any web application as an HPC job and access it from their workstation while…
The MIT SuperCloud database management system allows for rapid creation and flexible execution of a variety of the latest scientific databases, including Apache Accumulo and SciDB. It is designed to permit these databases to run on a High…
The workload prediction and resource allocation significantly play an inevitable role in production of an efficient cloud environment. The proactive estimation of future workload followed by decision of resource allocation have become a…
Cloud workloads today are typically managed in a distributed environment and processed across geographically distributed data centers. Cloud service providers have been distributing data centers globally to reduce operating costs while also…
As the amount of data explodes rapidly, more and more corporations are using data centers to make effective decisions and gain a competitive edge. Data analysis applications play a significant role in data centers, and hence it has became…
Computer-based scientific experiments are becoming increasingly data-intensive, necessitating the use of High-Performance Computing (HPC) clusters to handle large scientific workflows. These workflows result in complex data and control…
Significant investments to upgrade and construct large-scale scientific facilities demand commensurate investments in R&D to design algorithms and computing approaches to enable scientific and engineering breakthroughs in the big data era.…
The increasing energy demands and carbon footprint of large-scale AI require intelligent workload management in globally distributed data centers. Yet progress is limited by the absence of benchmarks that realistically capture the interplay…
The increasing demand for artificial intelligence (AI) workloads across diverse computing environments has driven the need for more efficient data management strategies. Traditional cloud-based architectures struggle to handle the sheer…
One of the more complex tasks for researchers using HPC systems is performance monitoring and tuning of their applications. Developing a practice of continuous performance improvement, both for speed-up and efficient use of resources is…
High Performance Computing (HPC) clouds are becoming an alternative to on-premise clusters for executing scientific applications and business analytics services. Most research efforts in HPC cloud aim to understand the cost-benefit of…