Related papers: Efficient GPU Implementation for Single Block Orth…

Block-Diagonal Sparse Representation by Learning a Linear Combination Dictionary for Recognition

In a sparse-representation-based recognition scheme, it is critical to learn a desired dictionary, aiming for both good representational power and discriminative performance. In this paper, we propose a new dictionary learning model for…

Computer Vision and Pattern Recognition · Computer Science · 2016-11-29 · Xinglin Piao, Yongli Hu, Yanfeng Sun, Junbin Gao, Baocai Yin

A Block-Sparse Bayesian Learning Algorithm with Dictionary Parameter Estimation for Multi-Sensor Data Fusion

We propose a sparse Bayesian learning (SBL)-based method that leverages group sparsity and multiple parameterized dictionaries to detect the relevant dictionary entries and estimate their continuous parameters by combining data from…

Signal Processing · Electrical Eng. & Systems · 2025-11-05 · Jakob Möderl, Anders Malte Westerkam, Alexander Venus, Erik Leitinger

Efficient Memory Management for GPU-based Deep Learning Systems

GPUs (graphics processing units) have been used for many data-intensive applications. Among them, deep learning systems are now one of the most important consumers of GPUs. As deep learning applications impose deeper and larger…

Distributed, Parallel, and Cluster Computing · Computer Science · 2019-03-18 · Junzhe Zhang, Sai Ho Yeung, Yao Shu, Bingsheng He, Wei Wang

Exact Sparse Orthogonal Dictionary Learning

Over the past decade, learning a dictionary from input images for sparse modeling has been one of the topics that receive the most research attention in image processing and compressed sensing. Most existing dictionary learning methods…

Image and Video Processing · Electrical Eng. & Systems · 2021-04-27 · Kai Liu, Yongjian Zhao, Hua Wang

Parallel GPU-Enabled Algorithms for SpGEMM on Arbitrary Semirings with Hybrid Communication

Sparse General Matrix Multiply (SpGEMM) is key for various High-Performance Computing (HPC) applications such as genomics and graph analytics. Using the semiring abstraction, many algorithms can be formulated as SpGEMM, allowing…

Distributed, Parallel, and Cluster Computing · Computer Science · 2025-12-23 · Thomas McFarland, Julian Bellavita, Giulia Guidi

Sgap: Towards Efficient Sparse Tensor Algebra Compilation for GPU

A sparse compiler is a promising solution for sparse tensor algebra optimization. In compiler implementation, reduction in sparse-dense hybrid algebra plays a key role in performance. Though GPUs provide various reduction semantics that can…

Distributed, Parallel, and Cluster Computing · Computer Science · 2023-01-10 · Genghan Zhang, Yuetong Zhao, Yanting Tao, Zhongming Yu, Guohao Dai, Sitao Huang, Yuan Wen, Pavlos Petoumenos, Yu Wang

Adaptive Elastic Training for Sparse Deep Learning on Heterogeneous Multi-GPU Servers

Motivated by extreme multi-label classification applications, we consider training deep learning models over sparse data in multi-GPU servers. The variance in the number of non-zero features across training batches and the intrinsic GPU…

Distributed, Parallel, and Cluster Computing · Computer Science · 2021-10-15 · Yujing Ma, Florin Rusu, Kesheng Wu, Alexander Sim

A high-performance and portable implementation of the SISSO method for CPUs and GPUs

SISSO (sure-independence screening and sparsifying operator) is an artificial intelligence (AI) method based on symbolic regression and compressed sensing widely used in materials science research. SISSO++ is its C++ implementation that…

Performance · Computer Science · 2025-02-28 · Sebastian Eibl, Yi Yao, Matthias Scheffler, Markus Rampp, Luca M. Ghiringhelli, Thomas A. R. Purcell

An OpenCL implementation for the solution of TDSE on GPU and CPU architectures

Open Computing Language (OpenCL) is a parallel processing language that is ideally suited for running parallel algorithms on Graphics Processing Units (GPUs). In the present work we report on the development of a generic parallel…

Computational Physics · Physics · 2012-05-31 · Cathal Ó Broin, L. A. A. Nikolopoulos

Memory-Efficient Object-Oriented Programming on GPUs

Object-oriented programming is often regarded as too inefficient for high-performance computing (HPC), despite the fact that many important HPC problems have an inherent object structure. Our goal is to bring efficient, object-oriented…

Programming Languages · Computer Science · 2019-08-19 · Matthias Springer

A GPU-accelerated Nonlinear Branch-and-Bound Framework for Sparse Linear Models

We study exact sparse linear regression with an $\ell_0-\ell_2$ penalty and develop a branch-and-bound (BnB) algorithm explicitly designed for GPU execution. Starting from a perspective reformulation, we derive an interval relaxation that…

Optimization and Control · Mathematics · 2026-02-05 · Xiang Meng, Ryan Lucas, Rahul Mazumder

A Decomposition Framework for Certifiably Optimal Orthogonal Sparse PCA

Sparse Principal Component Analysis (SPCA) is an important technique for high-dimensional data analysis, improving interpretability by imposing sparsity on principal components. However, existing methods often fail to simultaneously…

Machine Learning · Computer Science · 2026-03-03 · Difei Cheng, Qiao Hu

Fast and Scalable Sparse Triangular Solver for Multi-GPU Based HPC Architectures

Designing efficient and scalable sparse linear algebra kernels on modern multi-GPU based HPC systems is a daunting task due to highly irregular memory references and workload imbalance across the GPUs. This is particularly the case for…

Distributed, Parallel, and Cluster Computing · Computer Science · 2020-12-15 · Chenhao Xie, Jieyang Chen, Jesun S Firoz, Jiajia Li, Shuaiwen Leon Song, Kevin Barker, Mark Raugas, Ang Li

Direct Low-Dose CT Image Reconstruction on GPU using Out-Of-Core: Precision and Quality Study

Algebraic methods applied to the reconstruction of Sparse-view Computed Tomography (CT) can provide both a high image quality and a decrease in the dose received by patients, although with an increased reconstruction time since their…

Medical Physics · Physics · 2024-12-12 · M. Chillarón, G. Quintana-Ortí, V. Vidal, G. Verdú

Global Optimization on Graph-Structured Data via Gaussian Processes with Spectral Representations

Bayesian optimization (BO) is a powerful framework for optimizing expensive black-box objectives, yet extending it to graph-structured domains remains challenging due to the discrete and combinatorial nature of graphs. Existing approaches…

Machine Learning · Computer Science · 2025-11-12 · Shu Hong, Yongsheng Mei, Mahdi Imani, Tian Lan

Hardware Counted Profile-Guided Optimization

Profile-Guided Optimization (PGO) is an excellent means to improve the performance of a compiled program. Indeed, the execution path data it provides helps the compiler to generate better code and better cacheline packing. At the time of…

Programming Languages · Computer Science · 2014-11-25 · Baptiste Wicht, Roberto A. Vitillo, Dehao Chen, David Levinthal

Dictionary Optimization for Block-Sparse Representations

Recent work has demonstrated that using a carefully designed dictionary instead of a predefined one can improve the sparsity in jointly representing a class of signals. This has motivated the derivation of learning methods for designing a…

Information Theory · Computer Science · 2010-05-04 · Kevin Rosenblum, Lihi Zelnik-Manor, Yonina C. Eldar

A new GPU implementation for lattice-Boltzmann simulations on sparse geometries

We describe a high-performance implementation of the lattice Boltzmann method (LBM) for sparse 3D geometries on graphics processors (GPUs). The main contribution of this work is a data layout that makes it possible to minimise the number of redundant…

Distributed, Parallel, and Cluster Computing · Computer Science · 2017-10-10 · Tadeusz Tomczak, Roman G. Szafran

Large-Scale Stochastic Learning using GPUs

In this work we propose an accelerated stochastic learning system for very large-scale applications. Acceleration is achieved by mapping the training algorithm onto massively parallel processors: we demonstrate a parallel, asynchronous GPU…

Machine Learning · Computer Science · 2017-02-24 · Thomas Parnell, Celestine Dünner, Kubilay Atasu, Manolis Sifalakis, Haris Pozidis

Harnessing Batched BLAS/LAPACK Kernels on GPUs for Parallel Solutions of Block Tridiagonal Systems

Block-tridiagonal systems are prevalent in state estimation and optimal control, and solving these systems is often the computational bottleneck. Improving the underlying solvers therefore has a direct impact on the real-time performance of…

Mathematical Software · Computer Science · 2025-12-05 · David Jin, Alexis Montoison, Sungho Shin