Related papers: CGSim: A Simulation Framework for Large Scale Dist…
The Worldwide LHC Computing Grid (WLCG) provides the robust computing infrastructure essential for the LHC experiments by integrating global computing resources into a cohesive entity. Simulations of different compute models present a…
Cloud computing focuses on delivery of reliable, secure, fault-tolerant, sustainable, and scalable infrastructures for hosting Internet-based application services. These applications have different composition, configuration, and deployment…
Cloud Computing has established itself as an efficient and cost-effective paradigm for the execution of web-based applications, and scientific workloads, that need elasticity and on-demand scalability capabilities. However, the evaluation…
Predicting the performance of various infrastructure design options in complex federated infrastructures with computing sites distributed over a wide area network that support a plethora of users and workflows, such as the Worldwide LHC…
Cloud computing aims to power the next generation data centers and enables application service providers to lease data center capabilities for deploying applications depending on user QoS (Quality of Service) requirements. Cloud…
Cloud computing based systems, that span data centers, are commonly deployed to offer high performance for user service requests. As data centers continue to expand, computer architects and system designers are facing many challenges on how…
The increasing prevalence of cloud-native technologies, particularly containers, has led to the widespread adoption of containerized deployments in data centers. The advancement of deep neural network models has increased the demand for…
Simulation has become the evaluation method of choice for many areas of distributing computing research. However, most existing simulation packages have several limitations on the size and complexity of the system being modeled. Fine…
Serverless computing is gaining traction as an attractive model for the deployment of a multitude of workloads in the cloud. Designing and building effective resource management solutions for any computing environment requires extensive…
When physical testbeds are out of reach for evaluating a networked system, we frequently turn to simulation. In today's datacenter networks, bottlenecks are rarely at the network protocol level, but instead in end-host software or hardware…
CPU is undoubtedly the most important resource of the computer system. Recent advances in software and system architecture have increased processing complexity, as computing is now distributed and parallel. CloudSim represents the…
Cloud computing environment simulators enable cost-effective experimentation of novel infrastructure designs and management approaches by avoiding significant costs incurred from repetitive deployments in real Cloud platforms. However,…
Emerging paradigms of big data and Software-Defined Networking (SDN) in cloud data centers have gained significant attention from industry and academia. The integration and coordination of big data and SDN are required to improve the…
Clusters, grids, and peer-to-peer (P2P) networks have emerged as popular paradigms for next generation parallel and distributed computing. The management of resources and scheduling of applications in such large-scale distributed systems is…
Cloud Computing (CC) is a model for enabling on-demand access to a shared pool of configurable computing resources. Testing and evaluating the performance of the cloud environment for allocating, provisioning, scheduling, and data…
Large-scale AI training and inference require hundreds of gigabytes to terabytes of DRAM with high peak to average utilization ratios, resulting in overprovisioning. In cloud computing, DRAM constitutes a significant share of the cost. Yet,…
The growing demand for large-scale quantum computers is pushing research on Distributed Quantum Computing (DQC). Recent experimental efforts have demonstrated some of the building blocks for such a design. DQC systems are clusters of…
Architectural simulation has become the critical bottleneck limiting design space exploration for high-performance computing systems. Modern GPUs and AI accelerators -- with hundreds to thousands of tightly-coupled components -- demand…
Large language model (LLM) serving infrastructures are undergoing a shift toward heterogeneity and disaggregation. Modern deployments increasingly integrate diverse accelerators and near-memory processing technologies, introducing…
Recently, there has been an extensive research effort in building efficient large language model (LLM) inference serving systems. These efforts not only include innovations in the algorithm and software domains but also constitute…