Related papers: DPP-PMRF: Rethinking Optimization for a Probabilis…

Partially Conditioned Patch Parallelism for Accelerated Diffusion Model Inference

Diffusion models have exhibited exciting capabilities in generating images and are also very promising for video creation. However, the inference speed of diffusion models is limited by the slow sampling process, restricting its use cases.…

Computer Vision and Pattern Recognition · Computer Science 2024-12-05 XiuYu Zhang, Zening Luo, Michelle E. Lu

An Evaluation of Massively Parallel Algorithms for DFA Minimization

We study parallel algorithms for the minimization of Deterministic Finite Automata (DFAs). In particular, we implement four different massively parallel algorithms on Graphics Processing Units (GPUs). Our results confirm the expectations…

Distributed, Parallel, and Cluster Computing · Computer Science 2024-10-31 Jan Martens, Anton Wijs

Accelerating Exact and Approximate Inference for (Distributed) Discrete Optimization with GPUs

Discrete optimization is a central problem in artificial intelligence. The optimization of the aggregated cost of a network of cost functions arises in a variety of problems including (W)CSP, DCOP, as well as optimization in stochastic…

Artificial Intelligence · Computer Science 2018-01-12 Ferdinando Fioretto, Enrico Pontelli, William Yeoh, Rina Dechter

Partition-Merge: Distributed Inference and Modularity Optimization

This paper presents a novel meta algorithm, Partition-Merge (PM), which takes existing centralized algorithms for graph computation and makes them distributed and faster. In a nutshell, PM divides the graph into small subgraphs using our…

Data Structures and Algorithms · Computer Science 2013-09-25 Vincent Blondel, Kyomin Jung, Pushmeet Kohli, Devavrat Shah

ParaGraph: Weighted Graph Representation for Performance Optimization of HPC Kernels

GPU-based HPC clusters are attracting more scientific application developers due to their extensive parallelism and energy efficiency. In order to achieve portability among a variety of multi/many core architectures, a popular choice for an…

Distributed, Parallel, and Cluster Computing · Computer Science 2023-04-10 Ali TehraniJamsaz, Alok Mishra, Akash Dutta, Abid M. Malik, Barbara Chapman, Ali Jannesari

Dynamic Probabilistic Pruning: A general framework for hardware-constrained pruning at different granularities

Unstructured neural network pruning algorithms have achieved impressive compression rates. However, the resulting - typically irregular - sparse matrices hamper efficient hardware implementations, leading to additional memory usage and…

Machine Learning · Computer Science 2021-05-27 Lizeth Gonzalez-Carabarin, Iris A. M. Huijben, Bastiaan S. Veeling, Alexandre Schmid, Ruud J. G. van Sloun

Probabilistic Graphical Models on Multi-Core CPUs using Java 8

In this paper, we discuss software design issues related to the development of parallel computational intelligence algorithms on multi-core CPUs, using the new Java 8 functional programming features. In particular, we focus on probabilistic…

Artificial Intelligence · Computer Science 2017-07-10 Andres R. Masegosa, Ana M. Martinez, Hanen Borchani

Performance Acceleration of Kernel Polynomial Method Applying Graphics Processing Units

The Kernel Polynomial Method (KPM) is one of the fast diagonalization methods used for simulations of quantum systems in research fields of condensed matter physics and chemistry. The algorithm has a difficulty to be parallelized on a…

Computational Physics · Physics 2011-05-30 Shixun Zhang, Shinichi Yamagiwa, Masahiko Okumura, Seiji Yunoki

Parareal Neural Networks Emulating a Parallel-in-time Algorithm

As deep neural networks (DNNs) become deeper, the training time increases. In this perspective, multi-GPU parallel computing has become a key tool in accelerating the training of DNNs. In this paper, we introduce a novel methodology to…

Numerical Analysis · Mathematics 2024-07-08 Chang-Ock Lee, Youngkyu Lee, Jongho Park

Effective GPU Parallelization of Distributed and Localized Model Predictive Control

To effectively control large-scale distributed systems online, model predictive control (MPC) has to swiftly solve the underlying high-dimensional optimization. There are multiple techniques applied to accelerate the solving process in the…

Distributed, Parallel, and Cluster Computing · Computer Science 2021-03-30 Carmen Amo Alonso, Shih-Hao Tseng

Exploring the Limits of GPUs With Parallel Graph Algorithms

In this paper, we explore the limits of graphics processors (GPUs) for general purpose parallel computing by studying problems that require highly irregular data access patterns: parallel graph algorithms for list ranking and connected…

Distributed, Parallel, and Cluster Computing · Computer Science 2010-02-25 Frank Dehne, Kumanan Yogaratnam

A parallel evolutionary algorithm to optimize dynamic memory managers in embedded systems

For the last thirty years, several Dynamic Memory Managers (DMMs) have been proposed. Such DMMs include first fit, best fit, segregated fit and buddy systems. Since the performance, memory usage and energy consumption of each DMM differs,…

Neural and Evolutionary Computing · Computer Science 2024-07-16 José L. Risco-Martín, David Atienza, J. Manuel Colmenar, Oscar Garnica

On simulation of continuous determinantal point processes

We review how to simulate continuous determinantal point processes (DPPs) and improve the current simulation algorithms in several important special cases as well as detail how certain types of conditional simulation can be carried out.…

Methodology · Statistics 2023-08-23 Frédéric Lavancier, Ege Rubak

Speculative Parallel Evaluation Of Classification Trees On GPGPU Compute Engines

We examine the problem of optimizing classification tree evaluation for on-line and real-time applications by using GPUs. Looking at trees with continuous attributes often used in image segmentation, we first put the existing algorithms for…

Distributed, Parallel, and Cluster Computing · Computer Science 2011-11-08 Jason Spencer

Exploiting Tournament Selection for Efficient Parallel Genetic Programming

Genetic Programming (GP) is a computationally intensive technique which is naturally parallel in nature. Consequently, many attempts have been made to improve its run-time from exploiting highly parallel hardware such as GPUs. However, a…

Neural and Evolutionary Computing · Computer Science 2018-09-21 Darren M. Chitty

D-PDLP: Scaling PDLP to Distributed Multi-GPU Systems

We present a distributed framework of the Primal-Dual Hybrid Gradient (PDHG) algorithm for solving massive-scale linear programming (LP) problems. Although PDHG-based solvers demonstrate strong performance on single-node GPU architectures,…

Optimization and Control · Mathematics 2026-05-11 Hongpei Li, Yicheng Huang, Huikang Liu, Dongdong Ge, Yinyu Ye

Fixed-point algorithms for learning determinantal point processes

Determinantal point processes (DPPs) offer an elegant tool for encoding probabilities over subsets of a ground set. Discrete DPPs are parametrized by a positive semidefinite matrix (called the DPP kernel), and estimating this kernel is key…

Machine Learning · Computer Science 2015-10-12 Zelda Mariet, Suvrit Sra

Optimized Speculative Sampling for GPU Hardware Accelerators

In this work, we optimize speculative sampling for parallel hardware accelerators to improve sampling speed. We notice that substantial portions of the intermediate matrices necessary for speculative sampling can be computed concurrently.…

Machine Learning · Computer Science 2024-10-04 Dominik Wagner, Seanie Lee, Ilja Baumann, Philipp Seeberger, Korbinian Riedhammer, Tobias Bocklet

Fast Parallel Algorithms for Statistical Subset Selection Problems

In this paper, we propose a new framework for designing fast parallel algorithms for fundamental statistical subset selection tasks that include feature selection and experimental design. Such tasks are known to be weakly submodular and are…

Machine Learning · Computer Science 2021-04-02 Sharon Qian, Yaron Singer

FastDOG: Fast Discrete Optimization on GPU

We present a massively parallel Lagrange decomposition method for solving 0–1 integer linear programs occurring in structured prediction. We propose a new iterative update scheme for solving the Lagrangean dual and a perturbation technique…

Optimization and Control · Mathematics 2022-04-20 Ahmed Abbas, Paul Swoboda