Related papers: Application-aware Congestion Mitigation for High-P…

A Study of Network Congestion in Two Supercomputing High-Speed Interconnects

Network congestion in high-speed interconnects is a major source of application run time performance variation. Recent years have witnessed a surge of interest from both academia and industry in the development of novel approaches for…

Distributed, Parallel, and Cluster Computing · Computer Science 2019-07-12 Saurabh Jha, Archit Patke, Jim Brandt, Ann Gentile, Mike Showerman, Eric Roman, Zbigniew T. Kalbarczyk, William T. Kramer, Ravishankar K. Iyer

Mitigating Shared Storage Congestion Using Control Theory

Efficient data access in High-Performance Computing (HPC) systems is essential to the performance of intensive computing tasks. Traditional optimizations of the I/O stack aim to improve peak performance but are often workload specific and…

Distributed, Parallel, and Cluster Computing · Computer Science 2025-11-21 Thomas Collignon, Kouds Halitim, Raphaël Bleuse, Sophie Cerf, Bogdan Robu, Éric Rutten, Lionel Seinturier, Alexandre van Kempen

Characterizing the Impact of Congestion in Modern HPC Interconnects

High-performance computing (HPC) systems increasingly support both scalable AI training and large-scale simulation workloads. Both typically rely heavily on collective communication operations. On modern supercomputers, however, network…

Distributed, Parallel, and Cluster Computing · Computer Science 2026-04-14 Lorenzo Piarulli, Marco Faltelli, Dirk Pleiter, Karthee Sivalingam, Dancheng Zhang, Kexue Zhao, Matteo Turisini, Francesco Iannone, Aldo Artigiani, Daniele De Sensi

Mitigating Network Noise on Dragonfly Networks through Application-Aware Routing

System noise can negatively impact the performance of HPC systems, and the interconnection network is one of the main factors contributing to this problem. To mitigate this effect, adaptive routing sends packets on non-minimal paths if they…

Distributed, Parallel, and Cluster Computing · Computer Science 2019-09-18 Daniele De Sensi, Salvatore Di Girolamo, Torsten Hoefler

Mage: Online Interference-Aware Scheduling in Multi-Scale Heterogeneous Systems

Heterogeneity has grown in popularity both at the core and server level as a way to improve both performance and energy efficiency. However, despite these benefits, scheduling applications in heterogeneous machines remains challenging.…

Distributed, Parallel, and Cluster Computing · Computer Science 2018-04-19 Francisco Romero, Christina Delimitrou

Comparative Study Of Congestion Control Techniques In High Speed Networks

Network congestion occurs when aggregate demand exceeds the available capacity of the resources. Network congestion will increase as network speed increases, and new effective congestion control methods are needed,…

Networking and Internet Architecture · Computer Science 2009-12-08 Shakeel Ahmad, Adli Mustafa, Bashir Ahmad, Arjamand Bano, Al-Sammarraie Hosam

System Support for Bandwidth Management and Content Adaptation in Internet Applications

This paper describes the implementation and evaluation of an operating system module, the Congestion Manager (CM), which provides integrated network flow management and exports a convenient programming interface that allows applications to…

Networking and Internet Architecture · Computer Science 2007-05-23 David G. Andersen, Deepak Bansal, Dorothy Curtis, Srinivasan Seshan, Hari Balakrishnan

Congestion Management in High-Performance Interconnection Networks Using Adaptive Routing Notifications

The interconnection network is a crucial subsystem in High-Performance Computing clusters and Data-centers, guaranteeing high bandwidth and low latency to the applications' communication operations. Unfortunately, congestion situations may…

Networking and Internet Architecture · Computer Science 2025-02-04 Jose Rocher-Gonzalez, Jesus Escudero-Sahuquillo, Pedro J. Garcia, Francisco J. Quiles

Combined power management and congestion control in High-Speed Ethernet-based Networks for Supercomputers and Data Centers

The demand for computing in our daily lives has led to the proliferation of datacenters that power many indispensable services. On the other hand, computing has become essential for research in various scientific fields that require…

Hardware Architecture · Computer Science 2025-11-14 Miguel Sánchez de la Rosa, Francisco J. Andújar, Jesus Escudero-Sahuquillo, José L. Sánchez, Francisco J. Alfaro-Cortés

What Artificial Intelligence can do for High-Performance Computing systems?

High-performance computing (HPC) centers consume substantial power, incurring environmental and operational costs. This review assesses how artificial intelligence (AI), including machine learning (ML) and optimization, improves the…

Distributed, Parallel, and Cluster Computing · Computer Science 2026-02-03 Pierrick Pochelu, Hyacinthe Cartiaux, Julien Schleich

Congestion control in high-speed networks using the probabilistic estimation approach

Nowadays, the bulk of Internet traffic uses the TCP protocol for reliable transmission. But standard TCP performs very poorly in High Speed Networks (HSN), and hence the core gigabyte links are usually underutilized. This problem…

Networking and Internet Architecture · Computer Science 2021-03-18 Shahram Jamali, Mir Mahmoud Talebi, Reza Fotohi

Mitigating Congestion in Complex Transportation Networks via Maximum Entropy

In this paper, we reveal the relationship between entropy rate and congestion in complex networks and solve it analytically for special cases. Because maximizing the entropy rate improves traffic efficiency, we propose…

Physics and Society · Physics 2017-09-15 Yuhang Fan, Hanyuan Liu, Shibo He

HPCClusterScape: Increasing Transparency and Efficiency of Shared High-Performance Computing Clusters for Large-scale AI Models

The emergence of large-scale AI models, like GPT-4, has significantly impacted academia and industry, driving the demand for high-performance computing (HPC) to accelerate workloads. To address this, we present HPCClusterScape, a…

Human-Computer Interaction · Computer Science 2023-12-22 Heungseok Park, Aeree Cho, Hyojun Jeon, Hayoung Lee, Youngil Yang, Sungjae Lee, Heungsub Lee, Jaegul Choo

Enhanced Traffic Congestion Management with Fog Computing: A Simulation-based Investigation using iFog-Simulator

Accurate latency computation is essential for the Internet of Things (IoT) since the connected devices generate a vast amount of data that is processed on cloud infrastructure. However, the cloud is not an optimal solution. To overcome this…

Networking and Internet Architecture · Computer Science 2023-11-03 Alzahraa Elsayed, Khalil Mohamed, Hany Harb

Exploring the Landscape of Distributed Graph Sketching

Recent work has initiated the study of dense graph processing using graph sketching methods, which drastically reduce space costs by lossily compressing information about the input graph. In this paper, we explore the strange and surprising…

Distributed, Parallel, and Cluster Computing · Computer Science 2024-11-18 David Tench, Evan T. West, Kenny Zhang, Michael Bender, Daniel DeLayo, Martin Farach-Colton, Gilvir Gill, Tyler Seip, Victor Zhang

A Self-adaptive Auto-scaling Method for Scientific Applications on HPC Environments and Clouds

Compute-intensive applications can take days to months to finish an execution. During this time, variations in the available resources are common, since such hardware is usually shared among a…

Distributed, Parallel, and Cluster Computing · Computer Science 2015-01-27 Kiran Mantripragada, Alecio Binotto, Leonardo P. Tizzei

A Delay Based Approach for Congestion Avoidance in Interconnected Heterogeneous Computer Networks

In heterogeneous networks, achieving congestion avoidance is difficult because the congestion feedback from one subnetwork may have no meaning to sources on other subnetworks. We propose using changes in round-trip delay as an implicit…

Networking and Internet Architecture · Computer Science 2007-05-23 R. Jain

Runtime Concurrency Control and Operation Scheduling for High Performance Neural Network Training

Training neural networks often uses a machine learning framework such as TensorFlow or Caffe2. These frameworks employ a dataflow model where the NN training is modeled as a directed graph composed of a set of nodes. Operations in neural…

Distributed, Parallel, and Cluster Computing · Computer Science 2019-02-20 Jiawen Liu, Dong Li, Gokcen Kestor, Jeffrey Vetter

PowerTCP: Pushing the Performance Limits of Datacenter Networks

Increasingly stringent throughput and latency requirements in datacenter networks demand fast and accurate congestion control. We observe that the reaction time and accuracy of existing datacenter congestion control schemes are inherently…

Networking and Internet Architecture · Computer Science 2021-12-30 Vamsi Addanki, Oliver Michel, Stefan Schmid

LLAMP: Assessing Network Latency Tolerance of HPC Applications with Linear Programming

The shift towards high-bandwidth networks driven by AI workloads in data centers and HPC clusters has unintentionally aggravated network latency, adversely affecting the performance of communication-intensive HPC applications. As…

Distributed, Parallel, and Cluster Computing · Computer Science 2024-04-23 Siyuan Shen, Langwen Huang, Marcin Chrapek, Timo Schneider, Jai Dayal, Manisha Gajbe, Robert Wisniewski, Torsten Hoefler