English
Related papers

Related papers: Vector operations for accelerating expensive Bayes…

200 papers

Computational intensity and sequential nature of estimation techniques for Bayesian methods in statistics and machine learning, combined with their increasing applications for big data analytics, necessitate both the identification of…

Computation · Statistics 2015-03-02 Alireza S. Mahani , Mansour T. A. Sharabiani

This paper introduces a fast Central Processing Unit (CPU) implementation of geodesic morphological operations using stream processing. In contrast to the current state-of-the-art, that focuses on achieving insensitivity to the filter sizes…

Performance · Computer Science 2019-12-02 Danijel Žlaus , Domen Mongus

Parallel computing is a standard approach to achieving high-performance computing (HPC). Three commonly used methods to implement parallel computing include: 1) applying multithreading technology on single-core or multi-core CPUs; 2)…

Distributed, Parallel, and Cluster Computing · Computer Science 2024-09-18 Xinyao Yi

With high computation power and memory bandwidth, graphics processing units (GPUs) lend themselves to accelerate data-intensive analytics, especially when such applications fit the single instruction multiple data (SIMD) model. However,…

Distributed, Parallel, and Cluster Computing · Computer Science 2018-12-12 Hang Liu , H. Howie Huang

SSE (streaming SIMD extensions) and AVX (advanced vector extensions) are SIMD (single instruction multiple data streams) instruction sets supported by recent CPUs manufactured in Intel and AMD. This SIMD programming allows parallel…

High Energy Physics - Lattice · Physics 2013-11-05 Hwancheol Jeong , Sunghoon Kim , Weonjong Lee , Seok-Ho Myung

Large-number arithmetic, widely used in scientific computing and cryptography, has seen limited adoption of single instruction, multiple data (SIMD) parallelism on modern CPUs due to the inherent dependencies in traditional algorithms. We…

Distributed, Parallel, and Cluster Computing · Computer Science 2026-04-27 Subhrajit Das , Abhishek Bichhawat , Yuvraj Patel

Generalised matrix-matrix multiplication forms the kernel of many mathematical algorithms. A faster matrix-matrix multiply immediately benefits these algorithms. In this paper we implement efficient matrix multiplication for large matrices…

Performance · Computer Science 2019-12-11 Douglas Aberdeen , Jonathan Baxter

Geospatial Processing, such as queries based on point-to-polyline shortest distance and point-in-polygon test, are fundamental to many scientific and engineering applications, including post-processing large-scale environmental and climate…

Databases · Computer Science 2014-03-05 Jianting Zhang Simin You

We examine the problem of optimizing classification tree evaluation for on-line and real-time applications by using GPUs. Looking at trees with continuous attributes often used in image segmentation, we first put the existing algorithms for…

Distributed, Parallel, and Cluster Computing · Computer Science 2011-11-08 Jason Spencer

Two essential problems in Computer Algebra, namely polynomial factorization and polynomial greatest common divisor computation, can be efficiently solved thanks to multiple polynomial evaluations in two variables using modular arithmetic.…

Distributed, Parallel, and Cluster Computing · Computer Science 2020-04-27 Pierre Fortin , Ambroise Fleury , François Lemaire , Michael Monagan

Modern processors have instructions to process 16 bytes or more at once. These instructions are called SIMD, for single instruction, multiple data. Recent advances have leveraged SIMD instructions to accelerate parsing of common Internet…

Data Structures and Algorithms · Computer Science 2025-06-05 Daniel Lemire

The prefix sum operation is a useful primitive with a broad range of applications. For database systems, it is a building block of many important operators including join, sort and filter queries. In this paper, we study different methods…

Distributed, Parallel, and Cluster Computing · Computer Science 2023-12-25 Wangda Zhang , Yanbin Wang , Kenneth A. Ross

The introduction of Single Instruction Multiple Data (SIMD) instructions in mainstream CPUs has enabled modern database engines to leverage data parallelism by performing more computation with a single instruction, resulting in a reduced…

Databases · Computer Science 2023-12-27 Yeasir Rayhan , Walid G. Aref

This article describes algorithms for the hybrid parallelization and SIMD vectorization of molecular dynamics simulations with short-range forces. The parallelization method combines domain decomposition with a thread-based parallelization…

Materials Science · Physics 2017-09-13 Chris M. Mangiardi , Ralf Meyer

Matrix-multiplication units (MXUs) are now prevalent in every computing platform. The key attribute that makes MXUs so successful is the semiring structure, which allows tiling for both parallelism and data reuse. Nonetheless,…

Hardware Architecture · Computer Science 2022-09-02 Yunan Zhang , Po-An Tsai , Hung-Wei Tseng

Compression algorithms are important for data oriented tasks, especially in the era of Big Data. Modern processors equipped with powerful SIMD instruction sets, provide us an opportunity for achieving better compression performance.…

Information Retrieval · Computer Science 2015-04-15 Wayne Xin Zhao , Xudong Zhang , Daniel Lemire , Dongdong Shan , Jian-Yun Nie , Hongfei Yan , Ji-Rong Wen

Multiple-precision floating-point branch-free algorithms can significantly accelerate multi-component arithmetic implemented by combining hardware-based binary64 and binary32, particularly for triple- and quadruple-precision computations.…

Mathematical Software · Computer Science 2026-05-08 Tomonori Kouya

Single Instruction, Multiple Data (SIMD) vectorization is a major driver of performance in current architectures, and is mandatory for achieving good performance with codes that are limited by instruction throughput. We investigate the…

Distributed, Parallel, and Cluster Computing · Computer Science 2014-01-30 Johannes Hofmann , Jan Treibig , Georg Hager , Gerhard Wellein

Processing-using-DRAM has been proposed for a limited set of basic operations (i.e., logic operations, addition). However, in order to enable the full adoption of processing-using-DRAM, it is necessary to provide support for more complex…

We design and implement parallel prefix sum (scan) algorithms using Ascend AI accelerators. Ascend accelerators feature specialized computing units: the cube units for efficient matrix multiplication and the vector units for optimized…

Distributed, Parallel, and Cluster Computing · Computer Science 2026-01-05 Bartłomiej Wróblewski , Gioele Gottardo , Anastasios Zouzias
‹ Prev 1 2 3 10 Next ›