Related papers: Maximal Atomic irRedundant Sets: a Usage-based Dat…
Optimization pipelines targeting polyhedral programs try to maximize the compute throughput. Traditional approaches favor reuse and temporal locality; while the communicated volume can be low, failure to optimize spatial locality may cause…
Parallel dataflow systems are a central part of most analytic pipelines for big data. The iterative nature of many analysis and machine learning algorithms, however, is still a challenge for current systems. While certain types of bulk…
Execution graphs of parallel loop programs exhibit a nested, repeating structure. We show how such graphs that are the result of nested repetition can be represented by succinct parametric structures. This parametric graph template…
We introduce an efficient combination of polyhedral analysis and predicate partitioning. Template polyhedral analysis abstracts numerical variables inside a program by one polyhedron per control location, with a priori fixed directions for…
Discrete Fracture Network models are largely used for very large scale geological flow simulations. For this reason numerical methods require an investigation of tools for efficient parallel solutions on High Performance Computing systems.…
Partitioning large matrices is an important problem in distributed linear algebra computing (used in ML among others). Briefly, our goal is to perform a sequence of matrix algebra operations in a distributed manner (whenever possible) on…
Algorithms for extracting hydrologic features and properties from digital elevation models (DEMs) are challenged by large datasets, which often cannot fit within a computer's RAM. Depression filling is an important preconditioning step to…
A rising research challenge is running costly machine learning (ML) networks locally on resource-constrained edge devices. ML networks with large convolutional layers can easily exceed available memory, increasing latency due to excessive…
We present Cyclotron, a framework and compiler for using recurrence equations to express streaming dataflow algorithms, which then get portably compiled to distributed topologies of interlinked processors. Our framework provides an input…
Spatial dataflow accelerators are a promising direction for next-generation computer systems because they can reduce the memory bottlenecks of traditional von Neumann machines such as CPUs and GPUs. They organize computation around…
The paper introduces a novel algorithm for computing the output admissible set of linear discrete-time systems subject to input saturation. The proposed method takes advantage of the piecewise-affine dynamics to propagate the output…
We present a shared-memory parallelization of flow-based refinement, which is considered the most powerful iterative improvement technique for hypergraph partitioning at the moment. Flow-based refinement works on bipartitions, so current…
Several high-throughput distributed data-processing applications require multi-hop processing of streams of data. These applications include continual processing on data streams originating from a network of sensors, composing a multimedia…
Partitioning a graph into balanced blocks such that few edges run between blocks is a key problem for large-scale distributed processing. A current trend for partitioning huge graphs are streaming algorithms, which use low computational…
Edge-AI applications demand high-throughput, low-latency inference on FPGAs under tight resource and power constraints. This survey provides a comprehensive review of two key architectural decisions for FPGA-based neural network…
In this paper, we study the problem of finding an integral multiflow which maximizes the sum of flow values between every two terminals in an undirected tree with a nonnegative integer edge capacity and a set of terminals. In general, it is…
In unsplittable network flow problems, certain nodes must satisfy a combinatorial requirement that the incoming arc flows cannot be split or merged when routed through outgoing arcs. This so-called "no-split no-merge" requirement arises in…
An algorithm for irreducible decomposition of representations of finite groups over fields of characteristic zero is described. The algorithm uses the fact that the decomposition induces a partition of the invariant inner product into a…
Data flow analysis and optimization is considered for homogeneous rectangular mesh networks. We propose a flow matrix equation which allows a closed-form characterization of the nature of the minimal time solution, speedup and a simple…
Computing maximum independent sets in graphs is an important problem in computer science. In this paper, we develop an evolutionary algorithm to tackle the problem. The core innovations of the algorithm are very natural combine operations…