Related papers: Optimizing Xeon Phi for Interactive Data Analysis

200 papers

With at least 50 cores, Intel Xeon Phi is a true many-core architecture. Featuring fairly powerful cores, two cache levels, and very fast interconnections, the Xeon Phi can reach a theoretical peak of 1000 GFLOPS and a memory bandwidth of over 240 GB/s. These…

Distributed, Parallel, and Cluster Computing · Computer Science · 2013-12-23 · Jianbin Fang, Ana Lucia Varbanescu, Henk Sips, Lilun Zhang, Yonggang Che, Chuanfu Xu

Intel Xeon Phi is a recently released high-performance coprocessor which features 61 cores, each supporting 4 hardware threads, with 512-bit wide SIMD registers, achieving a peak theoretical performance of 1 Tflop/s in double precision. Many…

Performance · Computer Science · 2013-02-06 · Erik Saule, Kamer Kaya, Umit V. Catalyurek

The paper demonstrates the optimization of the execution environment of a hybrid OpenMP+MPI computational fluid dynamics code (shallow water equation solver) on a cluster enabled with Intel Xeon Phi coprocessors. The discussion includes:…

Mathematical Software · Computer Science · 2014-08-11 · Andrey Vladimirov, Cliff Addison

We carry out a comparative performance study of multi-core CPUs, GPUs and Intel Xeon Phi (Many Integrated Core - MIC) with a microscopy image analysis application. We experimentally evaluate the performance of computing devices on core…

Distributed, Parallel, and Cluster Computing · Computer Science · 2015-05-15 · George Teodoro, Tahsin Kurc, Guilherme Andrade, Jun Kong, Renato Ferreira, Joel Saltz

With its ease of programming, flexibility, and efficiency, MapReduce has become one of the most popular frameworks for building big-data applications. MapReduce was originally designed for distributed computing, and has been extended to…

Distributed, Parallel, and Cluster Computing · Computer Science · 2013-09-03 · Mian Lu, Lei Zhang, Huynh Phung Huynh, Zhongliang Ong, Yun Liang, Bingsheng He, Rick Siow Mong Goh, Richard Huynh

Modern OpenMP threading techniques are used to convert the MPI-only Hartree-Fock code in the GAMESS program to a hybrid MPI/OpenMP algorithm. Two separate implementations that differ by the sharing or replication of key data structures…

Distributed, Parallel, and Cluster Computing · Computer Science · 2017-08-15 · Vladimir Mironov, Yuri Alexeev, Kristopher Keipert, Michael D'mello, Alexander Moskovsky, Mark S. Gordon

Many-core accelerators, as represented by the Xeon Phi coprocessors and GPGPUs, allow software to exploit spatial and temporal sharing of computing resources to improve the overall system performance. To unlock this performance potential…

Performance · Computer Science · 2018-02-09 · Peng Zhang, Jianbin Fang, Tao Tang, Canqun Yang, Zheng Wang

In 2013 Intel introduced the Xeon Phi, a new parallel co-processor board. The Xeon Phi is a cache-coherent many-core shared memory architecture claiming CPU-like versatility, programmability, high performance, and power efficiency. The…

Performance · Computer Science · 2014-11-10 · S. Ali Mirsoleimani, Aske Plaat, Jos Vermaseren, Jaap van den Herik

We investigate and characterize the performance of an important class of operations on GPUs and Many Integrated Core (MIC) architectures. Our work is motivated by applications that analyze low-dimensional spatial datasets captured by high…

Distributed, Parallel, and Cluster Computing · Computer Science · 2013-11-05 · George Teodoro, Tahsin Kurc, Jun Kong, Lee Cooper, Joel Saltz

With the rapidly growing demand for computing power, new accelerator-based architectures have entered the world of high performance computing over roughly the past 5 years. In particular, GPGPUs have recently become very popular; however, programming…

Performance · Computer Science · 2013-08-16 · Volker Weinberg, Momme Allalen

Load balancing is a widely accepted technique for performance optimization of scientific applications on parallel architectures. Indeed, balanced applications do not waste processor cycles on waiting at points of synchronization and data…

Distributed, Parallel, and Cluster Computing · Computer Science · 2015-07-07 · Alexey Lastovetsky, Lukasz Szustak, Roman Wyrzykowski

With the increasing size and complexity of data produced by large-scale numerical simulations, it is of primary importance for scientists to be able to exploit all available hardware in heterogeneous High Performance Computing environments…

Distributed, Parallel, and Cluster Computing · Computer Science · 2016-06-15 · Timothy Dykes, Claudio Gheller, Marzia Rivi, Mel Krokos

Intel Xeon Phi many-integrated-core (MIC) architectures usher in a new era of terascale integration. Among emerging killer applications, parallel graph processing has been a critical technique to analyze connected data. In this paper, we…

Distributed, Parallel, and Cluster Computing · Computer Science · 2019-02-26 · Lei Jiang, Langshi Chen, Judy Qiu

Many algorithms have been parallelized successfully on the Intel Xeon Phi coprocessor, especially those with regular, balanced, and predictable data access patterns and instruction flows. Irregular and unbalanced algorithms are harder to…

Distributed, Parallel, and Cluster Computing · Computer Science · 2015-07-17 · S. Ali Mirsoleimani, Aske Plaat, Jaap van den Herik, Jos Vermaseren

We examine the Xeon Phi, which is based on Intel's Many Integrated Cores architecture, for its suitability to run the FDK algorithm--the most commonly used algorithm to perform the 3D image reconstruction in cone-beam computed tomography.…

Distributed, Parallel, and Cluster Computing · Computer Science · 2014-01-16 · Johannes Hofmann, Jan Treibig, Georg Hager, Gerhard Wellein

Given an array $\mathcal{A}$ of $n$ elements and a value $2 \leq k \leq n$, a frequent item or $k$-majority element is an element occurring in $\mathcal{A}$ more than $n/k$ times. The $k$-majority problem requires finding all of the…

Distributed, Parallel, and Cluster Computing · Computer Science · 2017-01-12 · Massimo Cafaro, Marco Pulimeno, Italo Epicoco, Giovanni Aloisio

Knights Landing (KNL) is the code name for the second-generation Intel Xeon Phi product family. KNL has generated significant interest in the data analysis and machine learning communities because its new many-core architecture targets both…

In the push for exascale computing, energy efficiency is of utmost concern. System architectures often adopt accelerators to hasten application execution at the cost of power. The Intel Xeon Phi co-processor is a unique accelerator that…

Distributed, Parallel, and Cluster Computing · Computer Science · 2015-05-26 · Gary Lawson, Masha Sosonkina, Yuzhong Shen

In this paper we describe an autotuning tool for optimization of OpenMP applications on highly multicore and multithreaded architectures. Our work was motivated by in-depth performance analysis of scientific applications and synthetic…

Distributed, Parallel, and Cluster Computing · Computer Science · 2014-01-17 · Jakub Katarzyński, Maciej Cytowski

For a deep learning model, efficient execution of its computation graph is key to achieving high performance. Previous work has focused on improving the performance for individual nodes of the computation graph, while ignoring the…

Distributed, Parallel, and Cluster Computing · Computer Science · 2018-07-26 · Linpeng Tang, Yida Wang, Theodore L. Willke, Kai Li