Related papers: Application-aware Congestion Mitigation for High-P…

200 papers

Network congestion in high-speed interconnects is a major source of application run time performance variation. Recent years have witnessed a surge of interest from both academia and industry in the development of novel approaches for…

Distributed, Parallel, and Cluster Computing · Computer Science 2019-07-12 Saurabh Jha , Archit Patke , Jim Brandt , Ann Gentile , Mike Showerman , Eric Roman , Zbigniew T. Kalbarczyk , William T. Kramer , Ravishankar K. Iyer

Efficient data access in High-Performance Computing (HPC) systems is essential to the performance of intensive computing tasks. Traditional optimizations of the I/O stack aim to improve peak performance but are often workload specific and…

Distributed, Parallel, and Cluster Computing · Computer Science 2025-11-21 Thomas Collignon , Kouds Halitim , Raphaël Bleuse , Sophie Cerf , Bogdan Robu , Éric Rutten , Lionel Seinturier , Alexandre van Kempen

High-performance computing (HPC) systems increasingly support both scalable AI training and large-scale simulation workloads. Both typically rely heavily on collective communication operations. On modern supercomputers, however, network…

Distributed, Parallel, and Cluster Computing · Computer Science 2026-04-14 Lorenzo Piarulli , Marco Faltelli , Dirk Pleiter , Karthee Sivalingam , Dancheng Zhang , Kexue Zhao , Matteo Turisini , Francesco Iannone , Aldo Artigiani , Daniele De Sensi

System noise can negatively impact the performance of HPC systems, and the interconnection network is one of the main factors contributing to this problem. To mitigate this effect, adaptive routing sends packets on non-minimal paths if they…

Distributed, Parallel, and Cluster Computing · Computer Science 2019-09-18 Daniele De Sensi , Salvatore Di Girolamo , Torsten Hoefler

Heterogeneity has grown in popularity both at the core and server level as a way to improve both performance and energy efficiency. However, despite these benefits, scheduling applications in heterogeneous machines remains challenging.…

Distributed, Parallel, and Cluster Computing · Computer Science 2018-04-19 Francisco Romero , Christina Delimitrou

Congestion in a network occurs when aggregate demand exceeds the accessible capacity of the resources. Network congestion will increase as network speed increases, and new effective congestion control methods are needed,…

Networking and Internet Architecture · Computer Science 2009-12-08 Shakeel Ahmad , Adli Mustafa , Bashir Ahmad , Arjamand Bano , Al-Sammarraie Hosam

This paper describes the implementation and evaluation of an operating system module, the Congestion Manager (CM), which provides integrated network flow management and exports a convenient programming interface that allows applications to…

Networking and Internet Architecture · Computer Science 2007-05-23 David G. Andersen , Deepak Bansal , Dorothy Curtis , Srinivasan Seshan , Hari Balakrishnan

The interconnection network is a crucial subsystem in High-Performance Computing clusters and Data-centers, guaranteeing high bandwidth and low latency to the applications' communication operations. Unfortunately, congestion situations may…

Networking and Internet Architecture · Computer Science 2025-02-04 Jose Rocher-Gonzalez , Jesus Escudero-Sahuquillo , Pedro J. Garcia , Francisco J. Quiles

The demand for computing in our daily lives has led to the proliferation of datacenters that power many indispensable services. On the other hand, computing has become essential for research in various scientific fields that require…

High-performance computing (HPC) centers consume substantial power, incurring environmental and operational costs. This review assesses how artificial intelligence (AI), including machine learning (ML) and optimization, improves the…

Distributed, Parallel, and Cluster Computing · Computer Science 2026-02-03 Pierrick Pochelu , Hyacinthe Cartiaux , Julien Schleich

Nowadays, the bulk of Internet traffic uses the TCP protocol for reliable transmission. But standard TCP's performance is very poor in High Speed Networks (HSN), and hence the core gigabyte links are usually underutilized. This problem…

Networking and Internet Architecture · Computer Science 2021-03-18 Shahram Jamali , Mir Mahmoud Talebi , Reza Fotohi

In this paper, we reveal the relationship between entropy rate and congestion in complex networks and solve it analytically for special cases. Maximizing the entropy rate leads to an improvement in traffic efficiency; we propose…

Physics and Society · Physics 2017-09-15 Yuhang Fan , Hanyuan Liu , Shibo He

The emergence of large-scale AI models, like GPT-4, has significantly impacted academia and industry, driving the demand for high-performance computing (HPC) to accelerate workloads. To address this, we present HPCClusterScape, a…

Human-Computer Interaction · Computer Science 2023-12-22 Heungseok Park , Aeree Cho , Hyojun Jeon , Hayoung Lee , Youngil Yang , Sungjae Lee , Heungsub Lee , Jaegul Choo

Accurate latency computation is essential for the Internet of Things (IoT) since the connected devices generate a vast amount of data that is processed on cloud infrastructure. However, the cloud is not an optimal solution. To overcome this…

Networking and Internet Architecture · Computer Science 2023-11-03 Alzahraa Elsayed , Khalil Mohamed , Hany Harb

Recent work has initiated the study of dense graph processing using graph sketching methods, which drastically reduce space costs by lossily compressing information about the input graph. In this paper, we explore the strange and surprising…

Distributed, Parallel, and Cluster Computing · Computer Science 2024-11-18 David Tench , Evan T. West , Kenny Zhang , Michael Bender , Daniel DeLayo , Martin Farach-Colton , Gilvir Gill , Tyler Seip , Victor Zhang

Computation-intensive applications can take days to months to finish an execution. During this time, the available resources commonly vary, since such hardware is usually shared among a…

Distributed, Parallel, and Cluster Computing · Computer Science 2015-01-27 Kiran Mantripragada , Alecio Binotto , Leonardo P. Tizzei

In heterogeneous networks, achieving congestion avoidance is difficult because the congestion feedback from one subnetwork may have no meaning to sources on other subnetworks. We propose using changes in round-trip delay as an implicit…

Networking and Internet Architecture · Computer Science 2007-05-23 R. Jain

Training neural networks often uses a machine learning framework such as TensorFlow or Caffe2. These frameworks employ a dataflow model where the NN training is modeled as a directed graph composed of a set of nodes. Operations in neural…

Distributed, Parallel, and Cluster Computing · Computer Science 2019-02-20 Jiawen Liu , Dong Li , Gokcen Kestor , Jeffrey Vetter

Increasingly stringent throughput and latency requirements in datacenter networks demand fast and accurate congestion control. We observe that the reaction time and accuracy of existing datacenter congestion control schemes are inherently…

Networking and Internet Architecture · Computer Science 2021-12-30 Vamsi Addanki , Oliver Michel , Stefan Schmid

The shift towards high-bandwidth networks driven by AI workloads in data centers and HPC clusters has unintentionally aggravated network latency, adversely affecting the performance of communication-intensive HPC applications. As…

Distributed, Parallel, and Cluster Computing · Computer Science 2024-04-23 Siyuan Shen , Langwen Huang , Marcin Chrapek , Timo Schneider , Jai Dayal , Manisha Gajbe , Robert Wisniewski , Torsten Hoefler