Related papers: The LevelArray: A Fast, Practical Long-Lived Renam…
We consider the task of assigning unique integers to a group of processes in an asynchronous distributed system of a total of $n$ processes prone to crashes that communicate through shared read-write registers. In the Renaming problem, an…
We study the space complexity of implementing long-lived and one-shot adaptive renaming from multi-reader multi-writer registers, in an asynchronous distributed system with $n$ processes. As a result of an $f$-adaptive renaming algorithm…
Parallelization techniques have become ubiquitous for accelerating inference and training of deep neural networks. Despite this, several operations are still performed in a sequential manner. For instance, the forward and backward passes…
In plenty of data analysis tasks, a basic and time-consuming process is to produce a large number of solutions and feed them into downstream processing. Various enumeration algorithms have been developed for this purpose. An enumeration…
The aggressive application of scalar replacement to array references substantially reduces the number of memory operations at the expense of a possibly very large number of registers. In this paper we describe a register allocation…
Iterative refinement is particularly popular for numerical solution of linear systems of equations. We extend it to Low Rank Approximation of a matrix (LRA) and observe close link of the resulting algorithm to oversampling techniques,…
A matrix algorithm runs at {\em sublinear cost} if it uses much fewer memory cells and arithmetic operations than the input matrix has entries. Such algorithms are indispensable for Big Data Mining and Analysis. Quite typically in that area…
There are many challenging problems in the person re-identification (ReID) task, such as the occlusion and scale variation. Existing works usually tried to solve them by employing a one-branch network. This one-branch network needs to be…
Structured low-rank (SLR) algorithms, which exploit annihilation relations between the Fourier samples of a signal resulting from different properties, is a powerful image reconstruction framework in several applications. This scheme relies…
Serializability is a well-understood concurrency control mechanism that eases reasoning about highly-concurrent database programs. Unfortunately, enforcing serializability has a high-performance cost, especially on geographically…
In this paper we develop optimal algorithms in the binary-forking model for a variety of fundamental problems, including sorting, semisorting, list ranking, tree contraction, range minima, and ordered set union, intersection and difference.…
As neural network algorithms show high performance in many applications, their efficient inference on mobile and embedded systems are of great interests. When a single stream recurrent neural network (RNN) is executed for a personal user in…
Pipelines combining SQL-style business intelligence (BI) queries and linear algebra (LA) are becoming increasingly common in industry. As a result, there is a growing need to unify these workloads in a single framework. Unfortunately,…
Several classic problems in graph processing and computational geometry are solved via incremental algorithms, which split computation into a series of small tasks acting on shared state, which gets updated progressively. While the…
How can one quickly answer the most and top popular objects at any time, given a large log stream in a system of billions of users? It is equivalent to find the mode and top-frequent elements in a dynamic array corresponding to the log…
The primary challenge for handwriting recognition systems lies in managing long-range contextual dependencies, an issue that traditional models often struggle with. To mitigate it, attention mechanisms have recently been employed to enhance…
In computer science, sorting algorithms are crucial for data processing and machine learning. Large datasets and high efficiency requirements provide challenges for comparison-based algorithms like Quicksort and Merge sort, which achieve…
Long-range sequence modeling is a crucial aspect of natural language processing and time series analysis. However, traditional models like Recurrent Neural Networks (RNNs) and Transformers suffer from computational and memory…
A practical large language model (LLM) service may involve a long system prompt, which specifies the instructions, examples, and knowledge documents of the task and is reused across requests. However, the long system prompt causes…
Multicore architectures dominate today's processor market. Even though the number of cores and threads are pretty high and continues to grow, inherently serial algorithms do not benefit from the abundance of cores and threads. In this…