Related papers: Fast Sequential Summation Algorithms Using Augment…
Sequential computation is well understood but does not scale well with current technology. Within the next decade, systems will contain large numbers of processors with potentially thousands of processors per chip. Despite this, many…
Nowadays, parallel computing is ubiquitous in several application fields, both in engineering and science. The computations rely on the floating-point arithmetic specified by the IEEE754 Standard. In this context, an elementary brick of…
Parallel batched data structures are designed to process synchronized batches of operations in a parallel computing model. In this paper, we propose parallel combining, a technique that implements a concurrent data structure from a parallel…
An emerging class of data systems partition their data and precompute approximate summaries (i.e., sketches and samples) for each segment to reduce query costs. They can then aggregate and combine the segment summaries to estimate results…
Data sketches are approximate succinct summaries of long streams. They are widely used for processing massive amounts of data and answering statistical queries about it in real-time. Existing libraries producing sketches are very fast, but…
We find a succinct expression for computing the sequence $x_t = a_t x_{t-1} + b_t$ in parallel with two prefix sums, given $t = (1, 2, \dots, n)$, $a_t \in \mathbb{R}^n$, $b_t \in \mathbb{R}^n$, and initial value $x_0 \in \mathbb{R}$. On…
We propose efficient algorithms for enumerating maximal common subsequences (MCSs) of two strings. Efficiency of the algorithms are estimated by the preprocessing-time, space, and delay-time complexities. One algorithm prepares a…
We present an acceleration method for sequences of large-scale linear systems, such as the ones arising from the numerical solution of time-dependent partial differential equations coupled with algebraic constraints. We discuss different…
Concurrent data structures serve as fundamental building blocks for concurrent computing. Many concurrent counterparts have been designed for basic sequential mechanisms; however, one notable omission is a concurrent tree that supports…
We present a novel parallelisation scheme that simplifies the adaptation of learning algorithms to growing amounts of data as well as growing needs for accurate and confident predictions in critical applications. In contrast to other…
The era of huge data necessitates highly efficient machine learning algorithms. Many common machine learning algorithms, however, rely on computationally intensive subroutines that are prohibitively expensive on large datasets. Oftentimes,…
In this report, we summarize the set partition enumeration problems and thoroughly explain the algorithms used to solve them. These algorithms iterate through the partitions in lexicographic order and are easy to understand and implement in…
Windowed recurrences are sliding window calculations where a function is applied iteratively across the window of data, and are ubiquitous throughout the natural, social, and computational sciences. In this monograph we explore the…
The aim of the paper is to introduce general techniques in order to optimize the parallel execution time of sorting on a distributed architectures with processors of various speeds. Such an application requires a partitioning step. For…
We developed a flexible parallel algorithm for graph summarization based on vertex-centric programming and parameterized message passing. The base algorithm supports infinitely many structural graph summary models defined in a formal…
Analyzing big data in a highly dynamic environment becomes more and more critical because of the increasingly need for end-to-end processing of this data. Modern data flows are quite complex and there are not efficient, cost-based,…
I present two new methods for exactly summing a set of floating-point numbers, and then correctly rounding to the nearest floating-point number. Higher accuracy than simple summation (rounding after each addition) is important in many…
Today's PCs can directly manipulate numbers not longer than 64 bits because the size of the CPU registers and the data-path are limited. Consequently, arithmetic operations such as addition, can only be performed on numbers of that length.…
Cycles are one of the fundamental subgraph patterns and being able to enumerate them in graphs enables important applications in a wide variety of fields, including finance, biology, chemistry, and network science. However, to enable cycle…
Parallel algorithms designed for simulation and performance evaluation of single-server tandem queueing systems with both infinite and finite buffers are presented. The algorithms exploit a simple computational procedure based on recursive…