Related papers: An Optimized Union-Find Algorithm for Connected Co…
This paper proposes a new parallel approach to solve connected components on a 2D binary image implemented with CUDA. We employ the following strategies to accelerate neighborhood exploration after dividing an input image into independent…
The latest generation of Timepix series hybrid pixel detectors enhances particle tracking with high spatial and temporal resolution. However, their high hit-rate capability poses challenges for data processing, particularly in multidetector…
Union-Find (or Disjoint-Set Union) is one of the fundamental problems in computer science; it has been well-studied from both theoretical and practical perspectives in the sequential case. Recently, there has been mounting interest in…
Process mapping asks to assign vertices of a task graph to processing elements of a supercomputer such that the computational workload is balanced while the communication cost is minimized. Motivated by the recent success of GPU-based graph…
Large scale-free graphs are famously difficult to process efficiently: the skewed vertex degree distribution makes it difficult to obtain balanced partitioning. Our research instead aims to turn this into an advantage by partitioning the…
We describe a method for parallelizing the lexicographic enumeration algorithm for the factorization set of an element in a numerical semigroup via bounds. This enables the use of GPU and distributed computing methods. We provide a CUDA…
Parallel algorithms on CPU and GPU are implemented for the Unified Gas-Kinetic Scheme, and their performance is investigated and compared using a two-dimensional channel flow case. The parallel CPU algorithm has a one-dimensional block…
Large scale graph processing using distributed computing frameworks is becoming pervasive and efficient in the industry. In this work, we present a highly scalable and configurable distributed algorithm for building connected components,…
Feature matching dominates the time cost in structure from motion (SfM). The primary contribution of this study is a GPU data schedule algorithm for efficient feature matching of unmanned aerial vehicle (UAV) images. The core idea is to…
Reducing the running time of graph algorithms is vital for tackling real-world problems such as shortest paths and matching in large-scale graphs, where path information plays a crucial role. To address this critical challenge, this paper…
The self-join finds all objects in a dataset that are within a search distance, epsilon, of each other; therefore, the self-join is a building block of many algorithms. We advance a GPU-accelerated self-join algorithm targeted towards high…
In this paper we present a new GPU-oriented mesh optimization method based on high-order finite elements. Our approach relies on node movement with fixed topology, through the Target-Matrix Optimization Paradigm (TMOP) and uses a global…
Cluster identification tasks occur in a multitude of contexts in physics and engineering such as, for instance, cluster algorithms for simulating spin models, percolation simulations, segmentation problems in image processing, or network…
Partition of unity methods (PUMs) on graphs are simple and highly adaptive auxiliary tools for graph signal processing. Based on a greedy-type metric clustering and augmentation scheme, we show how a partition of unity can be generated in…
We propose a new hybrid topology optimization algorithm based on a multigrid approach that combines CPU parallelization using OpenMP with the heavily multithreaded capabilities of modern Graphics Processing Units (GPUs). In…
We present a set of rules to guide the design of GPU algorithms. These rules are grounded in the principle of reducing waste in GPU utilization to achieve good speedup. In accordance with these rules, we propose GPU algorithms for 2D…
Matrix factorization (MF) is employed by many popular algorithms, e.g., collaborative filtering. The emerging GPU technology, with its massively multicore design and high intra-chip memory bandwidth but limited memory capacity, presents an opportunity…
A novel approach for the fast onboard detection of isolated markers for visual relative localisation of multiple teammates in agile UAV swarms is introduced in this paper. As the detection forms a key component of real-time localisation…
Monte Carlo Localization is a widely used approach in the field of mobile robotics. While this problem has been well studied in the 2D case, global localization in 3D maps with six degrees of freedom has so far been too computationally…
Neural embedding models are extensively employed in the table union search problem, which aims to find semantically compatible tables that can be merged with a given query table. In particular, multi-vector models, which represent a table…