Related papers: Distributed Parallel Structure-Aware Presolving fo…

A Massively Parallel Interior-Point Method for Arrowhead Linear Programs with Local Linking Structure

In practice, non-specialized interior point algorithms often cannot utilize the massively parallel compute resources offered by modern many- and multi-core compute platforms. However, efficient distributed solution techniques are required,…

Optimization and Control · Mathematics 2026-04-10 Nils-Christian Kempke, Daniel Rehfeldt, Thorsten Koch

First Experiments with Structure-Aware Presolving for a Parallel Interior-Point Method

In linear optimization, matrix structure can often be exploited algorithmically. However, beneficial presolving reductions sometimes destroy the special structure of a given problem. In this article, we discuss structure-aware…

Optimization and Control · Mathematics 2019-08-05 Ambros Gleixner, Nils-Christian Kempke, Thorsten Koch, Daniel Rehfeldt, Svenja Uslu

PaPILO: A Parallel Presolving Library for Integer and Linear Programming with Multiprecision Support

Presolving has become an essential component of modern MIP solvers both in terms of computational performance and numerical robustness. In this paper, we present PaPILO, a new C++ header-only library that provides a large set of presolving…

Optimization and Control · Mathematics 2024-03-21 Ambros Gleixner, Leona Gottwald, Alexander Hoen

Automatic Operator-level Parallelism Planning for Distributed Deep Learning -- A Mixed-Integer Programming Approach

As the artificial intelligence community advances into the era of large models with billions of parameters, distributed training and inference have become essential. While various parallelism strategies-data, model, sequence, and…

Machine Learning · Computer Science 2025-03-13 Ruifeng She, Bowen Pang, Kai Li, Zehua Liu, Tao Zhong

A distributed-memory hierarchical solver for general sparse linear systems

We present a parallel hierarchical solver for general sparse linear systems on distributed-memory machines. For large-scale problems, this fully algebraic algorithm is faster and more memory-efficient than sparse direct solvers because it…

Numerical Analysis · Mathematics 2017-12-21 Chao Chen, Hadi Pouransari, Sivasankaran Rajamanickam, Erik G. Boman, Eric Darve

Presolving for GPU-Accelerated First-Order LP Solvers

Recent research has focused on developing GPU-accelerated first-order solvers for linear programming (LP). This line of work, however, has largely overlooked the role of presolving, and thus prior results do not fully reflect the speedups…

Optimization and Control · Mathematics 2026-04-28 Daniel Cederberg, Stephen Boyd

A domain decomposing parallel sparse linear system solver

The solution of large sparse linear systems is often the most time-consuming part of many science and engineering applications. Computational fluid dynamics, circuit simulation, power network analysis, and material science are just a few…

Numerical Analysis · Computer Science 2011-09-20 Murat Manguoglu

Scalable linear solvers for sparse linear systems from large-scale numerical simulations

This paper presents our work on designing scalable linear solvers for large-scale reservoir simulations. The main objective is to support implementation of parallel reservoir simulators on distributed-memory parallel systems, where MPI…

Mathematical Software · Computer Science 2017-01-24 Hui Liu, Zhangxin Chen

Development of A Scalable Platform for Large-scale Reservoir Simulations on Parallel computers

This paper presents our work on designing a parallel platform for large-scale reservoir simulations. Detailed components, such as grid and linear solver, and data structures are introduced, which can serve as a guide to parallel reservoir…

Computational Engineering, Finance, and Science · Computer Science 2018-09-05 Hui Liu, Kun Wang, Bo Yang, Zhangxin Chen

Differentiable Initialization-Accelerated CPU-GPU Hybrid Combinatorial Scheduling

This paper presents a hybrid CPU-GPU framework for solving combinatorial scheduling problems formulated as Integer Linear Programming (ILP). While scheduling underpins many optimization tasks in computing systems, solving these problems…

Machine Learning · Computer Science 2026-04-01 Mingju Liu, Jiaqi Yin, Alvaro Velasquez, Cunxi Yu

parGeMSLR: A Parallel Multilevel Schur Complement Low-Rank Preconditioning and Solution Package for General Sparse Matrices

This paper discusses parGeMSLR, a C++/MPI software library for the solution of sparse systems of linear algebraic equations via preconditioned Krylov subspace methods in distributed-memory computing environments. The preconditioner…

Mathematical Software · Computer Science 2022-05-09 Tianshi Xu, Vassilis Kalantzis, Ruipeng Li, Yuanzhe Xi, Geoffrey Dillon, Yousef Saad

Distributed Hybrid Parallelism for Large Language Models: Comparative Study and System Design Guide

With the rapid growth of large language models (LLMs), a wide range of methods have been developed to distribute computation and memory across hardware devices for efficient training and inference. While existing surveys provide descriptive…

Machine Learning · Computer Science 2026-02-11 Hossam Amer, Rezaul Karim, Ali Pourranjbar, Weiwei Zhang, Walid Ahmed, Boxing Chen

SparsePipe: Parallel Deep Learning for 3D Point Clouds

We propose SparsePipe, an efficient and asynchronous parallelism approach for handling 3D point clouds with multi-GPU training. SparsePipe is built to support 3D sparse data such as point clouds. It achieves this by adopting generalized…

Computer Vision and Pattern Recognition · Computer Science 2020-12-29 Keke Zhai, Pan He, Tania Banerjee, Anand Rangarajan, Sanjay Ranka

Effective GPU Parallelization of Distributed and Localized Model Predictive Control

To effectively control large-scale distributed systems online, model predictive control (MPC) has to swiftly solve the underlying high-dimensional optimization. There are multiple techniques applied to accelerate the solving process in the…

Distributed, Parallel, and Cluster Computing · Computer Science 2021-03-30 Carmen Amo Alonso, Shih-Hao Tseng

RAPID-LLM: Resilience-Aware Performance analysis of Infrastructure for Distributed LLM Training and Inference

RAPID-LLM is a unified performance modeling framework for large language model (LLM) training and inference on GPU clusters. It couples a DeepFlow-based frontend that generates hardware-aware, operator-level Chakra execution traces from an…

Performance · Computer Science 2025-12-23 George Karfakis, Faraz Tahmasebi, Binglu Chen, Lime Yao, Saptarshi Mitra, Tianyue Pan, Hyoukjun Kwon, Puneet Gupta

Interferences within a certifiable design methodology for high-performance multi-core platforms

The adoption of high-performance multi-core platforms in avionics and automotive systems introduces significant challenges in ensuring predictable execution, primarily due to shared resource interferences. Many existing approaches study…

Distributed, Parallel, and Cluster Computing · Computer Science 2026-04-14 Mohamed Amine Khelassi, Felix Suchert, Abderaouf Amalou, Benjamin Lesage, Anika Christmann, Robin Hapka, Jeronimo Castrillon, Mihail Asavoae, Mathieu Jan, Claire Pagetti, Selma Saidi

Distributed Semi-Speculative Parallel Anisotropic Mesh Adaptation

This paper presents a distributed memory method for anisotropic mesh adaptation that is designed to avoid the use of collective communication and global synchronization techniques. In the presented method, meshing functionality is separated…

Distributed, Parallel, and Cluster Computing · Computer Science 2026-02-18 Kevin Garner, Polykarpos Thomadakis, Nikos Chrisochoides

ASPD: Unlocking Adaptive Serial-Parallel Decoding by Exploring Intrinsic Parallelism in LLMs

The increasing scale and complexity of large language models (LLMs) pose significant inference latency challenges, primarily due to their autoregressive decoding paradigm characterized by the sequential nature of next-token prediction. By…

Computation and Language · Computer Science 2025-08-15 Keyu Chen, Zhifeng Shen, Daohai Yu, Haoqian Wu, Wei Wen, Jianfeng He, Ruizhi Qiao, Xing Sun

Embarrassingly Parallel Independent Training of Multi-Layer Perceptrons with Heterogeneous Architectures

The definition of a Neural Network architecture is one of the most critical and challenging tasks to perform. In this paper, we propose ParallelMLPs. ParallelMLPs is a procedure to enable the training of several independent Multilayer…

Machine Learning · Computer Science 2022-06-20 Felipe Costa Farias, Teresa Bernarda Ludermir, Carmelo Jose Albanez Bastos-Filho

Scaling Structured Multigrid to 500K+ Cores through Coarse-Grid Redistribution

The efficient solution of sparse, linear systems resulting from the discretization of partial differential equations is crucial to the performance of many physics-based simulations. The algorithmic optimality of multilevel approaches for…

Mathematical Software · Computer Science 2018-03-08 Andrew Reisner, Luke N. Olson, J. David Moulton