Related papers: ParaLog: Consistent Host-side Logging for Parallel…

200 papers

One of the major performance and scalability bottlenecks in large scientific applications is parallel reading and writing to supercomputer I/O systems. The usage of parallel file systems and consistency requirements of POSIX, that all the…

Distributed, Parallel, and Cluster Computing · Computer Science 2019-01-30 Steven Wei-der Chien, Stefano Markidis, Rami Karim, Erwin Laure, Sai Narasimhamurthy

Scientific applications in HPC environments are more complex and more data-intensive nowadays. Scientists usually rely on workflow systems to manage the complexity: simply define multiple processing steps into a single script and let the…

Distributed, Parallel, and Cluster Computing · Computer Science 2018-05-17 Dong Dai, Robert Ross, Dounia Khaldi, Yonghong Yan, Matthieu Dorier, Neda Tavakoli, Yong Chen

Scientific applications are often complex, irregular, and computationally-intensive. To accommodate the ever-increasing computational demands of scientific applications, high-performance computing (HPC) systems have become larger and more…

Distributed, Parallel, and Cluster Computing · Computer Science 2019-11-20 Ali Mohammed, Aurelien Cavelan, Florina M. Ciorba, Ruben M. Cabezon, Ioana Banicesu

Cloud computing recently developed into a viable alternative to on-premises systems for executing high-performance computing (HPC) applications. With the emergence of new vendors and hardware options, there is now a growing need to…

Distributed, Parallel, and Cluster Computing · Computer Science 2018-12-14 Mohammad Mohammadi, Timur Bazhirov

While detailed resource usage monitoring is possible on the low-level using proper tools, associating such usage with higher-level abstractions in the application layer that actually cause the resource usage in the first place presents a…

Distributed, Parallel, and Cluster Computing · Computer Science 2024-08-02 Joel Witzke, Ansgar Lößer, Vasilis Bountris, Florian Schintke, Björn Scheuermann

As software systems increase in complexity, conventional monitoring methods struggle to provide a comprehensive overview or identify performance issues, often missing unexpected problems. Observability, however, offers a holistic approach,…

Distributed, Parallel, and Cluster Computing · Computer Science 2024-08-29 Bartosz Balis, Konrad Czerepak, Albert Kuzma, Jan Meizner, Lukasz Wronski

High-performance computing platforms such as supercomputers have traditionally been designed to meet the compute demands of scientific applications. Consequently, they have been architected as producers and not consumers of data. The Apache…

Distributed, Parallel, and Cluster Computing · Computer Science 2016-02-02 Andre Luckow, Ioannis Paraskevakos, George Chantzialexiou, Shantenu Jha

High-Performance Computing (HPC) centers and cloud providers support an increasingly diverse set of applications on heterogenous hardware. As Artificial Intelligence (AI) and Machine Learning (ML) workloads have become an increasingly…

The increasing availability of cloud computing services for science has changed the way scientific code can be developed, deployed, and run. Many modern scientific workflows are capable of running on cloud computing resources. Consequently,…

Distributed, Parallel, and Cluster Computing · Computer Science 2020-06-11 Peter Vaillancourt, Bennett Wineholt, Brandon Barker, Plato Deliyannis, Jackie Zheng, Akshay Suresh, Adam Brazier, Rich Knepper, Rich Wolski

Dataset storage, exchange, and access play a critical role in scientific applications. For such purposes netCDF serves as a portable and efficient file format and programming interface, which is popular in numerous scientific application…

Distributed, Parallel, and Cluster Computing · Computer Science 2007-05-23 Jianwei Li, Wei-keng Liao, Alok Choudhary, Robert Ross, Rajeev Thakur, William Gropp, Rob Latham

The ongoing convergence of HPC and cloud computing presents a fundamental challenge: HPC applications, designed for static and homogeneous supercomputers, are ill-suited for the dynamic, heterogeneous, and volatile nature of the cloud.…

Distributed, Parallel, and Cluster Computing · Computer Science 2026-03-17 Aditya Bhosale, Advait Tahilyani, Laxmikant Kale, Sara Kokkila-Schumacher

In the past couple of decades, the computational abilities of supercomputers have increased tremendously. Leadership scale supercomputers now are capable of petaflops. Likewise, the problem size targeted by applications running on such…

Distributed, Parallel, and Cluster Computing · Computer Science 2011-09-06 Robert Louis Cloud

High intensive computation applications can usually take days to months to finish an execution. During this time, it is common to have variations of the available resources when considering that such hardware is usually shared among a…

Distributed, Parallel, and Cluster Computing · Computer Science 2015-01-27 Kiran Mantripragada, Alecio Binotto, Leonardo P. Tizzei

The rise of AI and the economic dominance of cloud computing have created a new nexus of innovation for high performance computing (HPC), which has a long history of driving scientific discovery. In addition to performance needs, scientific…

Distributed, Parallel, and Cluster Computing · Computer Science 2025-06-10 Vanessa Sochat, Daniel Milroy, Abhik Sarkar, Aniruddha Marathe, Tapasya Patki

Modern I/O applications that run on HPC infrastructures are increasingly becoming read and metadata intensive. However, having multiple concurrent applications submitting large amounts of metadata operations can easily saturate the shared…

Distributed, Parallel, and Cluster Computing · Computer Science 2023-03-27 Ricardo Macedo, Mariana Miranda, Yusuke Tanimura, Jason Haga, Amit Ruhela, Stephen Lien Harrell, Richard Todd Evans, José Pereira, João Paulo

Today's high-performance computing (HPC) systems are heavily instrumented, generating logs containing information about abnormal events, such as critical conditions, faults, errors and failures, system resource utilization, and about the…

Distributed, Parallel, and Cluster Computing · Computer Science 2017-08-24 Byung H. Park, Saurabh Hukerikar, Ryan Adamson, Christian Engelmann

Cloud computing provides a great opportunity for scientists, as it enables large-scale experiments that are too long to run on local desktop machines. Cloud-based computations can be highly parallel, long running and data-intensive,…

Software Engineering · Computer Science 2016-12-07 Maria Spichkova, Heinz W. Schmidt, Ian E. Thomas, Iman I. Yusuf, Steve Androulakis, Grischa R. Meyer

This paper investigates co-scheduling algorithms for processing a set of parallel applications. Instead of executing each application one by one, using a maximum degree of parallelism for each of them, we aim at scheduling several…

Data Structures and Algorithms · Computer Science 2013-05-01 Guillaume Aupy, Manu Shantharam, Anne Benoit, Yves Robert, Padma Raghavan

Today, many scientific and engineering areas require high performance computing to perform computationally intensive experiments. For example, many advances in transport phenomena, thermodynamics, material properties, computational…

Distributed, Parallel, and Cluster Computing · Computer Science 2016-07-15 K. G. Kapanova, J. M. Sellier

High Performance Computing (HPC) clouds are becoming an alternative to on-premise clusters for executing scientific applications and business analytics services. Most research efforts in HPC cloud aim to understand the cost-benefit of…

Distributed, Parallel, and Cluster Computing · Computer Science 2018-02-05 Marco A. S. Netto, Rodrigo N. Calheiros, Eduardo R. Rodrigues, Renato L. F. Cunha, Rajkumar Buyya