Related papers: Boolean Matrix Logic Programming on the GPU

Simulating Petri nets with Boolean Matrix Logic Programming

Recent attention to relational knowledge bases has sparked a demand for understanding how relations change between entities. Petri nets can represent knowledge structure and dynamically simulate interactions between entities, and thus they…

Artificial Intelligence · Computer Science 2024-05-21 Lun Ai, Stephen H. Muggleton, Shi-Shun Liang, Geoff S. Baldwin

Algorithms and Hardware for Efficient Processing of Logic-based Neural Networks

Recent efforts to improve the performance of neural network (NN) accelerators that meet today's application requirements have given rise to a new trend of logic-based NN inference relying on fixed-function combinational logic (FFCL). This…

Hardware Architecture · Computer Science 2023-04-14 Jingkai Hong, Arash Fayyazi, Amirhossein Esmaili, Mahdi Nazemi, Massoud Pedram

Accelerating Bidiagonalization of Banded Matrices through Memory-Aware Bulge-Chasing on GPUs

The reduction of a banded matrix to bidiagonal form is a critical step in the calculation of Singular Values, a cornerstone of scientific computing and AI. Although inherently parallel, this step has traditionally been considered unsuitable…

Distributed, Parallel, and Cluster Computing · Computer Science 2026-01-14 Evelyne Ringoot, Rabab Alomairy, Alan Edelman

GPU-Accelerated Primal Heuristics for Mixed Integer Programming

We introduce a fusion of GPU accelerated primal heuristics for Mixed Integer Programming. Leveraging GPU acceleration enables exploration of larger search regions and faster iterations. A GPU-accelerated PDLP serves as an approximate LP…

Optimization and Control · Mathematics 2025-10-31 Akif Çördük, Piotr Sielski, Alice Boucher, Kumar Aatish

Boolean matrix logic programming for active learning of gene functions in genome-scale metabolic network models

Reasoning about hypotheses and updating knowledge through empirical observations are central to scientific discovery. In this work, we applied logic-based machine learning methods to drive biological discovery by guiding experimentation.…

Molecular Networks · Quantitative Biology 2025-06-09 Lun Ai, Stephen H. Muggleton, Shi-Shun Liang, Geoff S. Baldwin

Mind the Memory Gap: Unveiling GPU Bottlenecks in Large-Batch LLM Inference

Large language models have been widely adopted across different tasks, but their auto-regressive generation nature often leads to inefficient resource utilization during inference. While batching is commonly used to increase throughput,…

Distributed, Parallel, and Cluster Computing · Computer Science 2025-07-14 Pol G. Recasens, Ferran Agullo, Yue Zhu, Chen Wang, Eun Kyung Lee, Olivier Tardieu, Jordi Torres, Josep Ll. Berral

Accelerating Matrix Multiplication: A Performance Comparison Between Multi-Core CPU and GPU

Matrix multiplication is a foundational operation in scientific computing and machine learning, yet its computational complexity makes it a significant bottleneck for large-scale applications. The shift to parallel architectures, primarily…

Distributed, Parallel, and Cluster Computing · Computer Science 2025-07-30 Mufakir Qamar Ansari, Mudabir Qamar Ansari

GPU-Accelerated Loopy Belief Propagation for Program Analysis

Loopy Belief Propagation (LBP) is a widely used approximate inference algorithm in probabilistic graphical models, with applications in computer vision, error correction codes, protein folding, program analysis, etc. However, LBP faces…

Software Engineering · Computer Science 2025-09-29 Haoyu Feng, Xin Zhang

Lobster: A GPU-Accelerated Framework for Neurosymbolic Programming

Neurosymbolic programs combine deep learning with symbolic reasoning to achieve better data efficiency, interpretability, and generalizability compared to standalone deep learning approaches. However, existing neurosymbolic learning…

Programming Languages · Computer Science 2025-10-01 Paul Biberstein, Ziyang Li, Joseph Devietti, Mayur Naik

LUT-LLM: Efficient Large Language Model Inference with Memory-based Computations on FPGAs

The rapid development of large language models (LLM) has greatly enhanced everyday applications. While many FPGA-based accelerators, with flexibility for fine-grained data control, exhibit superior speed and energy efficiency compared to…

Hardware Architecture · Computer Science 2026-03-24 Zifan He, Shengyu Ye, Rui Ma, Yang Wang, Jason Cong

Efficient Arbitrary Precision Acceleration for Large Language Models on GPU Tensor Cores

Large language models (LLMs) have been widely applied but face challenges in efficient inference. While quantization methods reduce computational demands, ultra-low bit quantization with arbitrary precision is hindered by limited GPU Tensor…

Machine Learning · Computer Science 2025-03-14 Shaobo Ma, Chao Fang, Haikuo Shao, Zhongfeng Wang

Optimizing Datalog for the GPU

Modern Datalog engines (e.g., LogicBlox, Soufflé, ddlog) enable their users to write declarative queries which compute recursive deductions over extensional facts, leaving high-performance operationalization (query planning, semi-naïve…

Databases · Computer Science 2024-11-20 Yihao Sun, Ahmedur Rahman Shovon, Thomas Gilray, Kristopher Micinski, Sidharth Kumar

An Overview of GPU-based First-Order Methods for Linear Programming and Extensions

The rapid progress in GPU computing has revolutionized many fields, yet its potential in mathematical programming, such as linear programming (LP), has only recently begun to be realized. This survey aims to provide a comprehensive overview…

Optimization and Control · Mathematics 2025-06-04 Haihao Lu, Jinwen Yang

Active learning of digenic functions with boolean matrix logic programming

We apply logic-based machine learning techniques to facilitate cellular engineering and drive biological discovery, based on comprehensive databases of metabolic processes called genome-scale metabolic network models (GEMs). Predicted host…

Artificial Intelligence · Computer Science 2024-11-14 Lun Ai, Stephen H. Muggleton, Shi-shun Liang, Geoff S. Baldwin

LLM.int8(): 8-bit Matrix Multiplication for Transformers at Scale

Large language models have been widely adopted but require significant GPU memory for inference. We develop a procedure for Int8 matrix multiplication for feed-forward and attention projection layers in transformers, which cut the memory…

Machine Learning · Computer Science 2022-11-11 Tim Dettmers, Mike Lewis, Younes Belkada, Luke Zettlemoyer

Accelerating a Linear Programming Algorithm on AMD GPUs

Linear Programming (LP) is a foundational optimization technique with widespread applications in finance, energy trading, and supply chain logistics. However, traditional Central Processing Unit (CPU)-based LP solvers often struggle to meet…

Optimization and Control · Mathematics 2025-08-26 Xiyan Hu, Titus Parker, Connor Phillips, Yifa Yu

Cross-platform programming model for many-core lattice Boltzmann simulations

We present a novel, hardware-agnostic implementation strategy for lattice Boltzmann (LB) simulations, which yields massive performance on homogeneous and heterogeneous many-core platforms. Based solely on C++17 Parallel Algorithms, our…

Computational Physics · Physics 2021-05-11 Jonas Latt, Christophe Coreixas, Joël Beny

Understanding Inference Scaling for LLMs: Bottlenecks, Trade-offs, and Performance Principles

The transition from standard generative AI to reasoning-centric architectures, exemplified by models capable of extensive Chain-of-Thought (CoT) processing, marks a fundamental paradigm shift in system requirements. Unlike…

Distributed, Parallel, and Cluster Computing · Computer Science 2026-05-20 Moiz Arif, Avinash Maurya, Sudharshan Vazhkudai, Bogdan Nicolae

Faster LLM Inference using DBMS-Inspired Preemption and Cache Replacement Policies

LLMs are increasingly used world-wide from daily tasks to agentic systems and data analytics, requiring significant GPU resources. LLM inference systems, however, are slow compared to database systems, and inference performance and…

Performance · Computer Science 2025-10-03 Kyoungmin Kim, Jiacheng Li, Kijae Hong, Anastasia Ailamaki

DuaLip-GPU Technical Report

Large-scale linear programs (LPs) arise in many decision systems, including ranking, allocation, and matching problems that must be solved repeatedly at massive scale. Prior work such as ECLIPSE and LinkedIn's open-source DuaLip showed that…

Distributed, Parallel, and Cluster Computing · Computer Science 2026-03-06 Gregory Dexter, Aida Rahmattalabi, Sanjana Garg, Qinquan Song, Ruby Tu, Yuan Gao, Yi Zhang, Zhipeng Wang, Rahul Mazumder