Related papers: Accelerating Machine Learning Queries with Linear …

On Optimizing Operator Fusion Plans for Large-Scale Machine Learning in SystemML

Many large-scale machine learning (ML) systems allow specifying custom ML algorithms by means of linear algebra programs, and then automatically generate efficient execution plans. In this context, optimization opportunities for fused…

Databases · Computer Science 2018-01-04 Matthias Boehm, Berthold Reinwald, Dylan Hutchison, Alexandre V. Evfimievski, Prithviraj Sen

Accelerating Matrix Multiplication: A Performance Comparison Between Multi-Core CPU and GPU

Matrix multiplication is a foundational operation in scientific computing and machine learning, yet its computational complexity makes it a significant bottleneck for large-scale applications. The shift to parallel architectures, primarily…

Distributed, Parallel, and Cluster Computing · Computer Science 2025-07-30 Mufakir Qamar Ansari, Mudabir Qamar Ansari

Data Fusion with Latent Map Gaussian Processes

Multi-fidelity modeling and calibration are data fusion tasks that ubiquitously arise in engineering design. In this paper, we introduce a novel approach based on latent-map Gaussian processes (LMGPs) that enables efficient and accurate…

Machine Learning · Statistics 2022-01-17 Nicholas Oune, Jonathan Tammer Eweis-Labolle, Ramin Bostanabad

GPU-Accelerated Primal Heuristics for Mixed Integer Programming

We introduce a fusion of GPU accelerated primal heuristics for Mixed Integer Programming. Leveraging GPU acceleration enables exploration of larger search regions and faster iterations. A GPU-accelerated PDLP serves as an approximate LP…

Optimization and Control · Mathematics 2025-10-31 Akif Çördük, Piotr Sielski, Alice Boucher, Kumar Aatish

Efficient Use of Limited-Memory Accelerators for Linear Learning on Heterogeneous Systems

We propose a generic algorithmic building block to accelerate training of machine learning models on heterogeneous compute systems. Our scheme makes it possible to efficiently employ compute accelerators such as GPUs and FPGAs for the training of…

Machine Learning · Computer Science 2017-11-08 Celestine Dünner, Thomas Parnell, Martin Jaggi

Efficient Tabular Data Preprocessing of ML Pipelines

Data preprocessing pipelines, which include data decoding, cleaning, and transforming, are a crucial component of Machine Learning (ML) training. They are computationally intensive and often become a major bottleneck, due to the increasing…

Hardware Architecture · Computer Science 2024-09-24 Yu Zhu, Wenqi Jiang, Gustavo Alonso

Benchmarking Edge AI Platforms for High-Performance ML Inference

Edge computing's growing prominence, due to its ability to reduce communication latency and enable real-time processing, is promoting the rise of high-performance, heterogeneous System-on-Chip solutions. While current approaches often…

Artificial Intelligence · Computer Science 2024-09-24 Rakshith Jayanth, Neelesh Gupta, Viktor Prasanna

Akceleracja obliczen algebry liniowej z wykorzystaniem masywnie rownoleglych, wielordzeniowych procesorow GPU (Acceleration of Linear Algebra Computations Using Massively Parallel, Multi-Core GPU Processors)

The paper discusses the use of modern graphics accelerators supporting CUDA technology for high-performance computing in the field of linear algebra. Fully programmable graphics cards have been available for several years for both…

Distributed, Parallel, and Cluster Computing · Computer Science 2013-06-27 Lukasz Swierczewski

FusionStitching: Boosting Execution Efficiency of Memory Intensive Computations for DL Workloads

Performance optimization is the art of continuously seeking a harmonious mapping between the application domain and hardware. Recent years have witnessed a surge of deep learning (DL) applications in industry. Conventional wisdom for…

Distributed, Parallel, and Cluster Computing · Computer Science 2019-11-27 Guoping Long, Jun Yang, Wei Lin

A Deep Learning Inference Scheme Based on Pipelined Matrix Multiplication Acceleration Design and Non-uniform Quantization

Matrix multiplication is the bedrock of deep learning inference applications. When it comes to hardware acceleration on edge computing devices, matrix multiplication often takes up a great majority of the time. To achieve better performance…

Machine Learning · Computer Science 2021-10-12 Yuyang Zhang, Dik Hin Leung, Min Guo, Yijia Xiao, Haoyue Liu, Yunfei Li, Jiyuan Zhang, Guan Wang, Zhen Chen

Automatic Task Parallelization of Dataflow Graphs in ML/DL models

Several methods exist today to accelerate Machine Learning (ML) or Deep Learning (DL) model performance for training and inference. However, modern techniques that rely on various graph and operator parallelism methodologies rely on search…

Machine Learning · Computer Science 2023-08-23 Srinjoy Das, Lawrence Rauchwerger

A GNN-Guided Predict-and-Search Framework for Mixed-Integer Linear Programming

Mixed-integer linear programming (MILP) is widely employed for modeling combinatorial optimization problems. In practice, similar MILP instances with only coefficient variations are routinely solved, and machine learning (ML) algorithms are…

Optimization and Control · Mathematics 2023-03-07 Qingyu Han, Linxin Yang, Qian Chen, Xiang Zhou, Dong Zhang, Akang Wang, Ruoyu Sun, Xiaodong Luo

A GPU-friendly Geometric Data Model and Algebra for Spatial Queries: Extended Version

The availability of low cost sensors has led to an unprecedented growth in the volume of spatial data. However, the time required to evaluate even simple spatial queries over large data sets greatly hampers our ability to interactively…

Databases · Computer Science 2020-04-09 Harish Doraiswamy, Juliana Freire

Accelerating Generalized Linear Models with MLWeaving: A One-Size-Fits-All System for Any-precision Learning (Technical Report)

Learning from the data stored in a database is an important function increasingly available in relational engines. Methods using lower precision input data are of special interest given their overall higher efficiency but, in databases,…

Data Structures and Algorithms · Computer Science 2019-03-29 Zeke Wang, Kaan Kara, Hantian Zhang, Gustavo Alonso, Onur Mutlu, Ce Zhang

Accelerating Sparse Linear Solvers with an Optical Laser Processing Unit

Solving large, sparse linear systems is a fundamental workload in scientific computing and engineering simulations, often dominating runtime and energy consumption in high-performance computing (HPC) applications. In this work, we explore…

Computational Engineering, Finance, and Science · Computer Science 2026-04-30 Dan Gluck, Yotam Mimran, Andrey Karenskih, Talya Vaknin, Omri Wolf, Ruti Ben-Shlomi, Johannes Gebert

Cache-aware Performance Modeling and Prediction for Dense Linear Algebra

Countless applications cast their computational core in terms of dense linear algebra operations. These operations can usually be implemented by combining the routines offered by standard linear algebra libraries such as BLAS and LAPACK,…

Performance · Computer Science 2014-10-01 Elmar Peise, Paolo Bientinesi

Benchmarking the Linear Algebra Awareness of TensorFlow and PyTorch

Linear algebra operations, which are ubiquitous in machine learning, form major performance bottlenecks. The High-Performance Computing community invests significant effort in the development of architecture-specific optimized kernels, such…

Mathematical Software · Computer Science 2022-08-09 Aravind Sankaran, Navid Akbari Alashti, Christos Psarras, Paolo Bientinesi

Multi-task fusion for improving mammography screening data classification

Machine learning and deep learning methods have become essential for computer-assisted prediction in medicine, with a growing number of applications also in the field of mammography. Typically these algorithms are trained for a specific…

Image and Video Processing · Electrical Eng. & Systems 2021-12-03 Maria Wimmer, Gert Sluiter, David Major, Dimitrios Lenis, Astrid Berg, Theresa Neubauer, Katja Bühler

Iterative Methods in GPU-Resident Linear Solvers for Nonlinear Constrained Optimization

Linear solvers are major computational bottlenecks in a wide range of decision support and optimization computations. The challenges become even more pronounced on heterogeneous hardware, where traditional sparse numerical linear algebra…

Computational Engineering, Finance, and Science · Computer Science 2024-01-26 Kasia Świrydowicz, Nicholson Koukpaizan, Maksudul Alam, Shaked Regev, Michael Saunders, Slaven Peleš

From Data to Action: Accelerating Refinery Optimization with AI

Nowadays refinery optimization utilizes vast amounts of data, which can be handled with modern Linear Programming (LP) software, but interpreting and applying the results remains challenging. Large petrochemical companies use massive…

Machine Learning · Statistics 2026-05-15 Dániel Pfeifer, Ábrahám Papp, Tibor Bernáth, Tamás Zoltán Varga, Márk Czifra, Botond Szilágyi, Edith Alice Kovács