Related papers: Distributed Ranges: A Model for Distributed Data S…
We present DASH, a C++ template library that offers distributed data structures and parallel algorithms and implements a compiler-free PGAS (partitioned global address space) approach. DASH offers many productivity and performance features…
Distributed algorithms have been playing an increasingly important role in many applications such as machine learning, signal processing, and control. Significant research efforts have been devoted to developing and analyzing new algorithms…
Distributed memory machines equipped with CPUs and GPUs (hybrid computing nodes) are hard to program because of the multiple layers of memory and heterogeneous computing configurations. In this paper, we introduce a region template…
The ability to express a program as a hierarchical composition of parts is an essential tool in managing the complexity of software and a key abstraction this provides is to separate the representation of data from the computation. Many…
We have developed a new programming framework, called Sieve, to support parallel numerical PDE algorithms operating over distributed meshes. We have also developed a reference implementation of Sieve in C++ as a library of generic…
Arrival of multicore systems has enforced a new scenario in computing, the parallel and distributed algorithms are fast replacing the older sequential algorithms, with many challenges of these techniques. The distributed algorithms provide…
Distributed model fitting refers to the process of fitting a mathematical or statistical model to the data using distributed computing resources, such that computing tasks are divided among multiple interconnected computers or nodes, often…
The parallel and distributed processing are becoming de facto industry standard, and a large part of the current research is targeted on how to make computing scalable and distributed, dynamically, without allocating the resources on…
A major driver behind the success of modern machine learning algorithms has been their ability to process ever-larger amounts of data. As a result, the use of distributed systems in both research and production has become increasingly…
We present Matrix Distributed Processing, a C++ library for fast development of efficient parallel algorithms. MDP is based on MPI and consists of a collection of C++ classes and functions such as lattice, site and field. Once an algorithm…
Motivated by the need for computationally tractable spatial methods in neuroimaging studies, we develop a distributed and integrated framework for estimation and inference of Gaussian process model parameters with ultra-high-dimensional…
As the size of modern data sets exceeds the disk and memory capacities of a single computer, machine learning practitioners have resorted to parallel and distributed computing. Given that optimization is one of the pillars of machine…
This article describes a geometric partitioning software that can be used for quick computation of data partitions on many-core HPC machines. It is most suited for dynamic applications with load distributions that vary with time.…
The C/C++ memory model provides an interface and execution model for programmers of concurrent (shared-variable) code. It provides a range of mechanisms that abstract from underlying hardware memory models -- that govern how multicore…
In distributed machine learning, data is dispatched to multiple machines for processing. Motivated by the fact that similar data points often belong to the same or similar classes, and more generally, classification rules of high accuracy…
OpenGM is a C++ template library for defining discrete graphical models and performing inference on these models, using a wide range of state-of-the-art algorithms. No restrictions are imposed on the factor graph to allow for higher-order…
Data mining algorithms are originally designed by assuming the data is available at one centralized site.These algorithms also assume that the whole data is fit into main memory while running the algorithm. But in today's scenario the data…
Parallelization is needed everywhere, from laptops and mobile phones to supercomputers. Among parallel programming models, task-based programming has demonstrated a powerful potential and is widely used in high-performance scientific…
Distributed systems can be found in various applications, e.g., in robotics or autonomous driving, to achieve higher flexibility and robustness. Thereby, data flow centric applications such as Deep Neural Network (DNN) inference benefit…
Nowadays, with the widespread of smartphones and other portable gadgets equipped with a variety of sensors, data is ubiquitous available and the focus of machine learning has shifted from being able to infer from small training samples to…