Related papers: Optimizing the MapReduce Framework on Intel Xeon P…

Performance Engineering for a Medical Imaging Application on the Intel Xeon Phi Accelerator

We examine the Xeon Phi, which is based on Intel's Many Integrated Cores architecture, for its suitability to run the FDK algorithm, the most commonly used algorithm for 3D image reconstruction in cone-beam computed tomography.…

Distributed, Parallel, and Cluster Computing · Computer Science · 2014-01-16 · Johannes Hofmann, Jan Treibig, Georg Hager, Gerhard Wellein

Performance Evaluation of Sparse Matrix Multiplication Kernels on Intel Xeon Phi

Intel Xeon Phi is a recently released high-performance coprocessor which features 61 cores, each supporting 4 hardware threads, with 512-bit-wide SIMD registers, achieving a theoretical peak performance of 1 Tflop/s in double precision. Many…

Performance · Computer Science · 2013-02-06 · Erik Saule, Kamer Kaya, Umit V. Catalyurek

Performance analysis of a 240 thread tournament level MCTS Go program on the Intel Xeon Phi

In 2013 Intel introduced the Xeon Phi, a new parallel co-processor board. The Xeon Phi is a cache-coherent many-core shared memory architecture claiming CPU-like versatility, programmability, high performance, and power efficiency. The…

Performance · Computer Science · 2014-11-10 · S. Ali Mirsoleimani, Aske Plaat, Jos Vermaseren, Jaap van den Herik

Optimizing Xeon Phi for Interactive Data Analysis

The Intel Xeon Phi manycore processor is designed to provide high performance matrix computations of the type often performed in data analysis. Common data analysis environments include Matlab, GNU Octave, Julia, Python, and R. Achieving…

Performance · Computer Science · 2019-12-03 · Chansup Byun, Jeremy Kepner, William Arcand, David Bestor, William Bergeron, Matthew Hubbell, Vijay Gadepally, Michael Houle, Michael Jones, Anne Klein, Lauren Milechin, Peter Michaleas, Julie Mullen, Andrew Prout, Antonio Rosa, Siddharth Samsi, Charles Yee, Albert Reuther

An Empirical Study of Intel Xeon Phi

With at least 50 cores, Intel Xeon Phi is a true many-core architecture. Featuring fairly powerful cores, two cache levels, and very fast interconnections, the Xeon Phi can reach a theoretical peak of 1000 GFLOP/s and over 240 GB/s. These…

Distributed, Parallel, and Cluster Computing · Computer Science · 2013-12-23 · Jianbin Fang, Ana Lucia Varbanescu, Henk Sips, Lilun Zhang, Yonggang Che, Chuanfu Xu

Efficient Hybrid Execution of C++ Applications using Intel(R) Xeon Phi(TM) Coprocessor

The introduction of Intel(R) Xeon Phi(TM) coprocessors opened up new possibilities in development of highly parallel applications. The familiarity and flexibility of the architecture together with compiler support integrated into the Intel…

Distributed, Parallel, and Cluster Computing · Computer Science · 2012-11-26 · Jiri Dokulil, Enes Bajrovic, Siegfried Benkner, Sabri Pllana, Martin Sandrieser, Beverly Bachmayer

Splotch: porting and optimizing for the Xeon Phi

With the increasing size and complexity of data produced by large-scale numerical simulations, it is of primary importance for scientists to be able to exploit all available hardware in heterogeneous High Performance Computing environments…

Distributed, Parallel, and Cluster Computing · Computer Science · 2016-06-15 · Timothy Dykes, Claudio Gheller, Marzia Rivi, Mel Krokos

First experiences with the Intel MIC architecture at LRZ

With the rapidly growing demand for computing power, new accelerator-based architectures have entered the world of high-performance computing over the past five years or so. In particular, GPGPUs have recently become very popular; however, programming…

Performance · Computer Science · 2013-08-16 · Volker Weinberg, Momme Allalen

Tuning Streamed Applications on Intel Xeon Phi: A Machine Learning Based Approach

Many-core accelerators, as represented by the Xeon Phi coprocessors and GPGPUs, allow software to exploit spatial and temporal sharing of computing resources to improve the overall system performance. To unlock this performance potential…

Performance · Computer Science · 2018-02-09 · Peng Zhang, Jianbin Fang, Tao Tang, Canqun Yang, Zheng Wang

Cluster-level tuning of a shallow water equation solver on the Intel MIC architecture

The paper demonstrates the optimization of the execution environment of a hybrid OpenMP+MPI computational fluid dynamics code (shallow water equation solver) on a cluster enabled with Intel Xeon Phi coprocessors. The discussion includes:…

Mathematical Software · Computer Science · 2014-08-11 · Andrey Vladimirov, Cliff Addison

Breadth First Search Vectorization on the Intel Xeon Phi

Breadth First Search (BFS) is a building block for graph algorithms and has recently been used for large scale analysis of information in a variety of applications including social networks, graph databases and web searching. Due to its…

Distributed, Parallel, and Cluster Computing · Computer Science · 2016-04-12 · Mireya Paredes, Graham Riley, Mikel Lujan

Performance Analysis and Efficient Execution on Systems with multi-core CPUs, GPUs and MICs

We carry out a comparative performance study of multi-core CPUs, GPUs and Intel Xeon Phi (Many Integrated Core - MIC) with a microscopy image analysis application. We experimentally evaluate the performance of computing devices on core…

Distributed, Parallel, and Cluster Computing · Computer Science · 2015-05-15 · George Teodoro, Tahsin Kurc, Guilherme Andrade, Jun Kong, Renato Ferreira, Joel Saltz

Evaluating kernels on Xeon Phi to accelerate Gysela application

This work describes the challenges presented by porting parts of the Gysela code to the Intel Xeon Phi coprocessor, as well as techniques used for optimization, vectorization and tuning that can be applied to other applications. We evaluate…

Computational Physics · Physics · 2015-08-04 · G. Latu, M. Haefele, J. Bigot, V. Grandgirard, T. Cartier-Michaud, F. Rozar

Performance Characterization of Multi-threaded Graph Processing Applications on Intel Many-Integrated-Core Architecture

Intel Xeon Phi many-integrated-core (MIC) architectures usher in a new era of terascale integration. Among emerging killer applications, parallel graph processing has been a critical technique to analyze connected data. In this paper, we…

Distributed, Parallel, and Cluster Computing · Computer Science · 2019-02-26 · Lei Jiang, Langshi Chen, Judy Qiu

Accelerating HPC codes on Intel(R) Omni-Path Architecture networks: From particle physics to Machine Learning

We discuss practical methods to ensure near-wirespeed performance from clusters with either one or two Intel(R) Omni-Path host fabric interfaces (HFI) per node, and Intel(R) Xeon Phi(TM) 72xx (Knights Landing) processors, and using the…

Distributed, Parallel, and Cluster Computing · Computer Science · 2017-11-15 · Peter Boyle, Michael Chuvelev, Guido Cossu, Christopher Kelly, Christoph Lehner, Lawrence Meadows

Towards Modeling Energy Consumption of Xeon Phi

In the push for exascale computing, energy efficiency is of utmost concern. System architectures often adopt accelerators to hasten application execution at the cost of power. The Intel Xeon Phi co-processor is a unique accelerator that…

Distributed, Parallel, and Cluster Computing · Computer Science · 2015-05-26 · Gary Lawson, Masha Sosonkina, Yuzhong Shen

Performance Optimisation of Smoothed Particle Hydrodynamics Algorithms for Multi/Many-Core Architectures

We describe a strategy for code modernisation of Gadget, a widely used community code for computational astrophysics. The focus of this work is on node-level performance optimisation, targeting current multi/many-core Intel(R) architectures.…

Distributed, Parallel, and Cluster Computing · Computer Science · 2017-09-27 · Fabio Baruffa, Luigi Iapichino, Nicolay J. Hammer, Vasileios Karakasis

The Family of MapReduce and Large Scale Data Processing Systems

In the last two decades, the continuous increase of computational power has produced an overwhelming flow of data which has called for a paradigm shift in the computing architecture and large scale data processing mechanisms. MapReduce is a…

Databases · Computer Science · 2013-02-14 · Sherif Sakr, Anna Liu, Ayman G. Fayoumi

Comparative Performance Analysis of Intel Xeon Phi, GPU, and CPU

We investigate and characterize the performance of an important class of operations on GPUs and Many Integrated Core (MIC) architectures. Our work is motivated by applications that analyze low-dimensional spatial datasets captured by high…

Distributed, Parallel, and Cluster Computing · Computer Science · 2013-11-05 · George Teodoro, Tahsin Kurc, Jun Kong, Lee Cooper, Joel Saltz

An efficient MPI/OpenMP parallelization of the Hartree-Fock method for the second generation of Intel Xeon Phi processor

Modern OpenMP threading techniques are used to convert the MPI-only Hartree-Fock code in the GAMESS program to a hybrid MPI/OpenMP algorithm. Two separate implementations that differ by the sharing or replication of key data structures…

Distributed, Parallel, and Cluster Computing · Computer Science · 2017-08-15 · Vladimir Mironov, Yuri Alexeev, Kristopher Keipert, Michael D'mello, Alexander Moskovsky, Mark S. Gordon