Related papers: Mapping Matters: Application Process Mapping on 3-…
Performance of supercomputer depends on the quality of resource manager, one of its functions is assignment of jobs to the nodes of clusters or MPP computers. Parts of parallel programs interact with each other with different intensity, and…
Nowadays the number of available processing cores within computing nodes which are used in recent clustered environments, are growing up with a rapid rate. Despite this trend, the number of available network interfaces in such computing…
Several high-throughput distributed data-processing applications require multi-hop processing of streams of data. These applications include continual processing on data streams originating from a network of sensors, composing a multimedia…
Modern large-scale scientific applications consist of thousands to millions of individual tasks. These tasks involve not only computation but also communication with one another. Typically, the communication pattern between tasks is sparse…
In this paper we consider the operator mapping problem for in-network stream processing applications. In-network stream processing consists in applying a tree of operators in steady-state to multiple data objects that are continually…
HPC systems keep growing in size to meet the ever-increasing demand for performance and computational resources. Apart from increased performance, large scale systems face two challenges that hinder further growth: energy efficiency and…
Executing multiple applications on a single MPSoC brings the major challenge of satisfying multiple quality requirements regarding real-time, energy, etc. Hybrid application mapping denotes the combination of design-time analysis with…
Mapping applications onto heterogeneous platforms is a difficult challenge, even for simple application patterns such as pipeline graphs. The problem is even more complex when processors are subject to failure during the execution of the…
We propose DFModel, a modeling framework for mapping dataflow computation graphs onto large-scale systems. Mapping a workload to a system requires optimizing dataflow mappings at various levels, including the inter-chip (between chips)…
Communication and topology aware process mapping is a powerful approach to reduce communication time in parallel applications with known communication patterns on large, distributed memory systems. We address the problem as a quadratic…
Static mapping is the assignment of parallel processes to the processing elements (PEs) of a parallel system, where the assignment does not change during the application's lifetime. In our scenario we model an application's computations and…
Network-on-Chips (NoCs) have been widely employed in the design of multiprocessor system-on-chips (MPSoCs) as a scalable communication solution. NoCs enable communications between on-chip Intellectual Property (IP) cores and allow those…
High throughput is of particular interest in data center and HPC networks. Although myriad network topologies have been proposed, a broad head-to-head comparison across topologies and across traffic patterns is absent, and the right way to…
Obtaining optimal data transfer performance is of utmost importance to today's data-intensive distributed applications and wide-area data replication services. Doing so necessitates effectively utilizing available network bandwidth and…
Process mapping asks to assign vertices of a task graph to processing elements of a supercomputer such that the computational workload is balanced while the communication cost is minimized. Motivated by the recent success of GPU-based graph…
Achieving peak performance in a computer system requires optimizations in every layer of the system, be it hardware or software. A detailed understanding of the underlying hardware, and especially the processor, is crucial to optimize…
The construction of Mapper has emerged in the last decade as a powerful and effective topological data analysis tool that approximates and generalizes other topological summaries, such as the Reeb graph, the contour tree, split, and joint…
This paper investigates co-scheduling algorithms for processing a set of parallel applications. Instead of executing each application one by one, using a maximum degree of parallelism for each of them, we aim at scheduling several…
Modern embedded computing platforms consist of a high amount of heterogeneous resources, which allows executing multiple applications on a single device. The number of running application on the system varies with time and so does the…
Partitioning graphs into blocks of roughly equal size such that few edges run between blocks is a frequently needed operation when processing graphs on a parallel computer. When a topology of a distributed system is known an important task…