Related papers: A Non-linear GPU Thread Map for Triangular Domains

200 papers

There is a stage in the GPU computing pipeline where a grid of thread-blocks is mapped to the problem domain. Normally, this grid is a k-dimensional bounding box that covers a k-dimensional problem no matter its shape. Threads that fall…

Distributed, Parallel, and Cluster Computing · Computer Science 2015-08-27 Cristobal A. Navarro , Nancy Hitschfeld

This work proposes a new approach for mapping GPU threads onto a family of discrete embedded 2D fractals. A block-space map $\lambda: \mathbb{Z}_{\mathbb{E}}^{2} \mapsto \mathbb{Z}_{\mathbb{F}}^{2}$ is proposed, from Euclidean parallel…

Distributed, Parallel, and Cluster Computing · Computer Science 2020-04-29 Cristóbal A. Navarro , Felipe A. Quezada , Nancy Hitschfeld , Raimundo Vega , Benjamin Bustos

The study of data-parallel domain re-organization and thread-mapping techniques are relevant topics as they can increase the efficiency of GPU computations when working on spatial discrete domains with non-box-shaped geometry. In this work…

Distributed, Parallel, and Cluster Computing · Computer Science 2016-06-30 Cristóbal A. Navarro , Benjamín Bustos , Nancy Hitschfeld

The problem of parallel thread mapping is studied for the case of discrete orthogonal $m$-simplices. The possibility of a $O(1)$ time recursive block-space map $\lambda: \mathbb{Z}^m \mapsto \mathbb{Z}^m$ is analyzed from the point of view…

Distributed, Parallel, and Cluster Computing · Computer Science 2016-10-25 Cristóbal A. Navarro , Benjamín Bustos , Nancy Hitschfeld

This work proposes a new GPU thread map for $m$-simplex domains, that scales its speedup with dimension and is energy efficient compared to other state of the art approaches. The main contributions of this work are i) the formulation of the…

Distributed, Parallel, and Cluster Computing · Computer Science 2022-09-13 Cristóbal A. Navarro , Felipe A. Quezada , Benjamin Bustos , Nancy Hitschfeld , Rolando Kindelan

This work studies the problem of GPU thread mapping for a Sierpiński gasket fractal embedded in a discrete Euclidean space of $n \times n$. A block-space map $\lambda: \mathbb{Z}_{\mathbb{E}}^{2} \mapsto \mathbb{Z}_{\mathbb{F}}^{2}$ is…

Distributed, Parallel, and Cluster Computing · Computer Science 2017-06-15 Cristóbal A. Navarro , Benjamín Bustos , Raimundo Vega , Nancy Hitschfeld

This work presents a GPU thread mapping approach that allows doing fast parallel stencil-like computations on discrete fractals using their compact representation. The intuition behind is to employ two GPU tensor-core accelerated thread…

Distributed, Parallel, and Cluster Computing · Computer Science 2021-10-26 Felipe A. Quezada , Cristóbal A. Navarro

Mapping parallel threads onto non-box-shaped domains is a known challenge in GPU computing; efficient mapping prevents performance penalties from unnecessary resource allocation. Currently, achieving this requires significant analytical…

Distributed, Parallel, and Cluster Computing · Computer Science 2026-04-15 Jose Maureira , Cristóbal A. Navarro , Hector Ferrada , Luis Veas-Castillo

Spatial Branch and Bound (B&B) algorithms are widely used for solving nonconvex problems to global optimality, yet they remain computationally expensive. Though some works have been carried out to speed up B&B via CPU parallelization, GPU…

Optimization and Control · Mathematics 2025-07-29 Hongzhen Zhang , Tim Kerkenhoff , Neil Kichler , Manuel Dahmen , Alexander Mitsos , Uwe Naumann , Dominik Bongartz

GPUs have significantly accelerated first-order methods for large-scale optimization, especially in continuous optimization. However, this success has not transferred cleanly to problems with discrete variables, combinatorial structure, and…

Machine Learning · Computer Science 2026-05-22 Jiachang Liu , Andrea Lodi

Modeling data sharing in GPU programs is a challenging task because of the massive parallelism and complex data sharing patterns provided by GPU architectures. Better GPU caching efficiency can be achieved through careful task scheduling…

Distributed, Parallel, and Cluster Computing · Computer Science 2016-10-04 Lingda Li , Ari B. Hayes , Stephen A. Hackler , Eddy Z. Zhang , Mario Szegedy , Shuaiwen Leon Song

Massive multi-threading in GPU imposes tremendous pressure on memory subsystems. Due to rapid growth in thread-level parallelism of GPU and slowly improved peak memory bandwidth, the memory becomes a bottleneck of GPU's performance and…

Hardware Architecture · Computer Science 2019-06-17 Bing Li , Mengjie Mao , Xiaoxiao Liu , Tao Liu , Zihao Liu , Wujie Wen , Yiran Chen , Hai Li

Fast domain propagation of linear constraints has become a crucial component of today's best algorithms and solvers for mixed integer programming and pseudo-boolean optimization to achieve peak solving performance. Irregularities in the…

Distributed, Parallel, and Cluster Computing · Computer Science 2021-08-26 Boro Sofranac , Ambros Gleixner , Sebastian Pokutta

Bloom filters are a fundamental data structure for approximate membership queries, with applications ranging from data analytics to databases and genomics. Several variants have been proposed to accommodate parallel architectures. GPUs,…

Distributed, Parallel, and Cluster Computing · Computer Science 2025-12-18 Daniel Jünger , Kevin Kristensen , Yunsong Wang , Xiangyao Yu , Bertil Schmidt

The reduction of a banded matrix to bidiagonal form is a critical step in the calculation of Singular Values, a cornerstone of scientific computing and AI. Although inherently parallel, this step has traditionally been considered unsuitable…

Distributed, Parallel, and Cluster Computing · Computer Science 2026-01-14 Evelyne Ringoot , Rabab Alomairy , Alan Edelman

Real-time trajectory optimization for nonlinear constrained autonomous systems is critical and typically performed by CPU-based sequential solvers. Specifically, reliance on global sparse linear algebra or the serial nature of dynamic…

Robotics · Computer Science 2026-03-13 Yilin Zou , Zhong Zhang , Maxime Robic , Fanghua Jiang

The acceleration of sparse matrix computations on modern many-core processors, such as the graphics processing units (GPUs), has been recognized and studied over a decade. Significant performance enhancements have been achieved for many…

Mathematical Software · Computer Science 2017-10-16 Ruipeng Li

In this paper, we explore the limits of graphics processors (GPUs) for general purpose parallel computing by studying problems that require highly irregular data access patterns: parallel graph algorithms for list ranking and connected…

Distributed, Parallel, and Cluster Computing · Computer Science 2010-02-25 Frank Dehne , Kumanan Yogaratnam

Graphics Processing Units (GPUs) consisting of Streaming Multiprocessors (SMs) achieve high throughput by running a large number of threads and context switching among them to hide execution latencies. The number of thread blocks, and hence…

Hardware Architecture · Computer Science 2015-06-08 Vishwesh Jatala , Jayvant Anantpur , Amey Karkare

Range minimum queries are frequently used in string processing and database applications including biological sequence analysis, document retrieval, and web search. Hence, various data structures have been proposed for improving their…

Databases · Computer Science 2026-04-03 Lara Kreis , Justus Henneberg , Valentin Henkys , Felix Schuhknecht , Bertil Schmidt