Related papers: Locally-Oriented Programming: A Simple Programming…

A Generic Library for Stencil Computations

In this era of diverse and heterogeneous computer architectures, the programmability issues, such as productivity and portable efficiency, are crucial to software development and algorithm design. One way to approach the problem is to step…

Mathematical Software · Computer Science 2012-07-10 Mauro Bianco, Ugo Varetto

Mapping Stencils on Coarse-grained Reconfigurable Spatial Architecture

Stencils represent a class of computational patterns where an output grid point depends on a fixed shape of neighboring points in an input grid. Stencil computations are prevalent in scientific applications engaging a significant portion of…

Distributed, Parallel, and Cluster Computing · Computer Science 2021-03-24 Jesmin Jahan Tithi, Fabrizio Petrini, Hongbo Rong, Andrei Valentin, Carl Ebeling

Evaluation of Programming Models and Performance for Stencil Computation on Current GPU Architectures

Accelerated computing is widely used in high-performance computing. Therefore, it is crucial to experiment and discover how to better utilize the latest generations of GPGPUs on relevant applications. In this paper, we present results and share…

Distributed, Parallel, and Cluster Computing · Computer Science 2024-08-13 Baodi Shan, Mauricio Araya-Polo

StencilFlow: Mapping Large Stencil Programs to Distributed Spatial Computing Systems

Spatial computing devices have been shown to significantly accelerate stencil computations, but have so far relied on unrolling the iterative dimension of a single stencil operation to increase temporal locality. This work considers the…

Distributed, Parallel, and Cluster Computing · Computer Science 2021-01-12 Johannes de Fine Licht, Andreas Kuster, Tiziano De Matteis, Tal Ben-Nun, Dominic Hofer, Torsten Hoefler

HaoCL: Harnessing Large-scale Heterogeneous Processors Made Easy

The pervasive adoption of Deep Learning (DL) and Graph Processing (GP) makes it a de facto requirement to build large-scale clusters of heterogeneous accelerators including GPUs and FPGAs. The OpenCL programming framework can be used on the…

Distributed, Parallel, and Cluster Computing · Computer Science 2020-05-19 Yao Chen, Xin Long, Jiong He, Yuhang Chen, Hongshi Tan, Zhenxiang Zhang, Marianne Winslett, Deming Chen

Stencil-HMLS: A multi-layered approach to the automatic optimisation of stencil codes on FPGA

The challenges associated with effectively programming FPGAs have been a major blocker in popularising reconfigurable architectures for HPC workloads. However new compiler technologies, such as MLIR, are providing new capabilities which…

Distributed, Parallel, and Cluster Computing · Computer Science 2023-10-04 Gabriel Rodriguez-Canal, Nick Brown, Maurice Jamieson, Emilien Bauer, Anton Lydike, Tobias Grosser

Local Learning with Neuron Groups

Traditional deep network training methods optimize a monolithic objective function jointly for all the components. This can lead to various inefficiencies in terms of potential parallelization. Local learning is an approach to…

Machine Learning · Computer Science 2023-01-19 Adeetya Patel, Michael Eickenberg, Eugene Belilovsky

LOCAL: Low-Complex Mapping Algorithm for Spatial DNN Accelerators

Deep neural networks are a promising solution for applications that solve problems based on learning data sets. DNN accelerators solve the processing bottleneck as a domain-specific processor. Like other hardware solutions, there must be…

Hardware Architecture · Computer Science 2022-11-08 Midia Reshadi, David Gregg

Multicore-aware parallel temporal blocking of stencil codes for shared and distributed memory

New algorithms and optimization techniques are needed to balance the accelerating trend towards bandwidth-starved multicore chips. It is well known that the performance of stencil codes can be improved by temporal blocking, lessening the…

Performance · Computer Science 2012-03-01 Markus Wittmann, Georg Hager, Gerhard Wellein

A parallel pattern for iterative stencil + reduce

We advocate the Loop-of-stencil-reduce pattern as a means of simplifying the implementation of data-parallel programs on heterogeneous multi-core platforms. Loop-of-stencil-reduce is general enough to subsume map, reduce, map-reduce,…

Distributed, Parallel, and Cluster Computing · Computer Science 2016-09-16 M. Aldinucci, M. Danelutto, M. Drocco, P. Kilpatrick, C. Misale, G. Peretti Pezzi, M. Torquati

A Portable Framework for Accelerating Stencil Computations on Modern Node Architectures

Finite-difference methods based on high-order stencils are widely used in seismic simulations, weather forecasting, computational fluid dynamics, and other scientific applications. Achieving HPC-level stencil computations on one…

Distributed, Parallel, and Cluster Computing · Computer Science 2024-07-09 Ryuichi Sai, John Mellor-Crummey, Jinfan Xu, Mauricio Araya-Polo

pocl: A Performance-Portable OpenCL Implementation

OpenCL is a standard for parallel programming of heterogeneous systems. The benefits of a common programming standard are clear; multiple vendors can provide support for application descriptions written according to the standard, thus…

Distributed, Parallel, and Cluster Computing · Computer Science 2016-11-23 Pekka Jääskeläinen, Carlos Sánchez de La Lama, Erik Schnetter, Kalle Raiskila, Jarmo Takala, Heikki Berg

What Can be Observed Locally? Round-based Models for Quantum Distributed Computing

Recently, several claims have been made that certain fundamental problems of distributed computing, including Leader Election and Distributed Consensus, begin to admit feasible and efficient solutions when the model of distributed…

Quantum Physics · Physics 2009-03-09 Cyril Gavoille, Adrian Kosowski, Marcin Markiewicz

A Unified View of Localized Kernel Learning

Multiple Kernel Learning, or MKL, extends (kernelized) SVM by attempting to learn not only a classifier/regressor but also the best kernel for the training task, usually from a combination of existing kernel functions. Most MKL methods seek…

Machine Learning · Computer Science 2016-03-07 John Moeller, Sarathkrishna Swaminathan, Suresh Venkatasubramanian

Stencil Matrixization

Current architectures are now equipped with matrix computation units designed to enhance AI and high-performance computing applications. Within these architectures, two fundamental instruction types are matrix multiplication and vector…

Distributed, Parallel, and Cluster Computing · Computer Science 2024-03-04 Wenxuan Zhao, Liang Yuan, Baicheng Yan, Penghao Ma, Yunquan Zhang, Long Wang, Zhe Wang

Beyond 16GB: Out-of-Core Stencil Computations

Stencil computations are a key class of applications, widely used in the scientific computing community, and a class that has particularly benefited from performance improvements on architectures with high memory bandwidth. Unfortunately,…

Distributed, Parallel, and Cluster Computing · Computer Science 2017-10-27 Istvan Z Reguly, Gihan R Mudalige, Michael B Giles

Scalable Block-Diagonal Locality-Constrained Projective Dictionary Learning

We propose a novel structured discriminative block-diagonal dictionary learning method, referred to as scalable Locality-Constrained Projective Dictionary Learning (LC-PDL), for efficient representation and classification. To improve the…

Computer Vision and Pattern Recognition · Computer Science 2019-05-28 Zhao Zhang, Weiming Jiang, Zheng Zhang, Sheng Li, Guangcan Liu, Jie Qin

Localized Multiple Kernel Learning---A Convex Approach

We propose a localized approach to multiple kernel learning that can be formulated as a convex optimization problem over a given cluster structure, for which we obtain generalization error guarantees and derive an optimization algorithm…

Machine Learning · Computer Science 2016-10-14 Yunwen Lei, Alexander Binder, Ürün Dogan, Marius Kloft

A locality-based approach for coded computation

Modern distributed computation infrastructures are often plagued by unavailabilities such as failing or slow servers. These unavailabilities adversely affect the tail latency of computation in distributed infrastructures. The simple…

Information Theory · Computer Science 2020-02-07 Michael Rudow, K. V. Rashmi, Venkatesan Guruswami

Distributed Computing in the Asynchronous LOCAL model

The LOCAL model is among the main models for studying locality in the framework of distributed network computing. This model is however subject to pertinent criticisms, including the facts that all nodes wake up simultaneously, perform in…

Distributed, Parallel, and Cluster Computing · Computer Science 2019-12-09 Carole Delporte-Gallet, Hugues Fauconnier, Pierre Fraigniaud, Mikaël Rabie