Related papers: Array Program Transformation with Loo.py by Exampl…

200 papers

A large amount of numerically-oriented code is written and is being written in legacy languages. Much of this code could, in principle, make good use of data-parallel throughput-oriented computer architectures. Loo.py, a…

Programming Languages · Computer Science 2015-05-19 Andreas Klöckner

Today's highly heterogeneous computing landscape places a burden on programmers wanting to achieve high performance on a reasonably broad cross-section of machines. To do so, computations need to be expressed in many different but…

Programming Languages · Computer Science 2014-06-02 Andreas Klöckner

Emerging GPU architectures for high performance computing are well suited to a data-parallel programming model. This paper presents preliminary work examining a programming methodology that provides Fortran programmers with access to these…

Programming Languages · Computer Science 2011-07-13 Matthew J. Sottile, Craig E Rasmussen, Wayne N. Weseloh, Robert W. Robey, Daniel Quinlan, Jeffrey Overbey

Many tools used to process programs, like compilers, analyzers, or verifiers, perform transformations on their intermediate program representation, like abstract syntax trees. Implementing such program transformations is a non-trivial task,…

Programming Languages · Computer Science 2026-01-21 Michael Hanus, Steven Libby

Array-intensive programs are often amenable to parallelization across many cores on a single machine as well as scaling across multiple machines and hence are well explored, especially in the domain of high-performance computing. These…

Programming Languages · Computer Science 2019-05-23 Kunal Banerjee, Chandan Karfa

Algorithmic skeletons are used as building-blocks to ease the task of parallel programming by abstracting the details of parallel implementation from the developer. Most existing libraries provide implementations of skeletons that are…

Programming Languages · Computer Science 2016-07-11 Venkatesh Kannan, G. W. Hamilton

Pretrained Transformers achieve state-of-the-art performance in various code-processing tasks but may be too large to be deployed. As software development tools often incorporate modules for various purposes which may potentially use a…

Computation and Language · Computer Science 2022-12-13 Shamil Ayupov, Nadezhda Chirkova

We present a technique for automatically transforming kernel-based computations in disparate, nested loops into a fused, vectorized form that can reduce intermediate storage needs and lead to improved performance on contemporary hardware.…

Performance · Computer Science 2017-10-25 Jason Sewall, Simon J. Pennycook

An additive fast Fourier transform over a finite field of characteristic two efficiently evaluates polynomials at every element of an $\mathbb{F}_2$-linear subspace of the field. We view these transforms as performing a change of basis from…

Symbolic Computation · Computer Science 2018-07-23 Nicholas Coxon

The current trends in next-generation exascale systems go towards integrating a wide range of specialized (co-)processors into traditional supercomputers. Due to the efficiency of heterogeneous systems in terms of Watts and FLOPS per…

Programming Languages · Computer Science 2017-01-26 Guillermo Vigueras, Manuel Carro, Salvador Tamarit, Julio Mariño

Graphics processing units (GPU) had evolved from a specialized hardware capable to render high quality graphics in games to a commodity hardware for effective processing blocks of data in a parallel schema. This evolution is particularly…

Distributed, Parallel, and Cluster Computing · Computer Science 2012-03-26 Luis Cabellos

We present a mechanism to symbolically gather performance-relevant operation counts from numerically-oriented subprograms ("kernels") expressed in the Loopy programming system, and apply these counts in a simple, linear model of kernel run…

Performance · Computer Science 2016-04-19 James Stevens, Andreas Klöckner

We present a versatile GPU-based parallel version of Logistic Regression (LR), aiming to address the increasing demand for faster algorithms in binary classification due to large data sets. Our implementation is a direct translation of the…

Machine Learning · Computer Science 2023-08-22 Nechba Mohammed, Mouhajir Mohamed, Sedjari Yassine

The application of program transformation and algebraic methods to the development of efficient combinatorial optimization (CO) algorithms relies on an exhaustive combinatorial generator for the problem specification, followed by the fusion…

Discrete Mathematics · Computer Science 2026-05-29 Xi He, Max A. Little

Spatial computing architectures promise a major stride in performance and energy efficiency over the traditional load/store devices currently employed in large scale computing systems. The adoption of high-level synthesis (HLS) from…

Distributed, Parallel, and Cluster Computing · Computer Science 2020-11-24 Johannes de Fine Licht, Maciej Besta, Simon Meierhans, Torsten Hoefler

We present a systematic, algebraically based, design methodology for efficient implementation of computer programs optimized over multiple levels of the processor/memory and network hierarchy. Using a common formalism to describe the…

Mathematical Software · Computer Science 2008-03-18 Lenore R. Mullin, James E. Raynolds

We introduce a high-performance virtual machine (VM) written in a numerically fast language like Fortran or C to evaluate very large expressions. We discuss the general concept of how to perform computations in terms of a VM and present…

Computational Physics · Physics 2015-09-22 Bijan Chokoufe Nejad, Thorsten Ohl, Jürgen Reuter

Optimal usage of the memory system is a key element of fast GPU algorithms. Unfortunately many common algorithms fail in this regard despite exhibiting great regularity in memory access patterns. In this paper we propose efficient kernels…

Distributed, Parallel, and Cluster Computing · Computer Science 2023-07-18 Mathis Bouverot-Dupuis, Mary Sheeran

All major weather and climate applications are currently developed using languages such as Fortran or C++. This is typical in the domain of high performance computing (HPC), where efficient execution is an important concern. Unfortunately,…

Distributed, Parallel, and Cluster Computing · Computer Science 2023-11-15 Enrique G. Paredes, Linus Groner, Stefano Ubbiali, Hannes Vogt, Alberto Madonna, Kean Mariotti, Felipe Cruz, Lucas Benedicic, Mauro Bianco, Joost VandeVondele, Thomas C. Schulthess

Modern polyhedral compilers excel at aggressively optimizing codes with static control parts, but the state-of-practice to find high-performance polyhedral transformations especially for different hardware targets still largely involves…

Distributed, Parallel, and Cluster Computing · Computer Science 2019-04-10 Martin Kong, Louis-Noël Pouchet