Related papers: Accelerating Low-Rank Factorization-Based Semidefi…

cuHALLaR: A GPU Accelerated Low-Rank Augmented Lagrangian Method for Large-Scale Semidefinite Programming

This paper introduces cuHALLaR, a GPU-accelerated implementation of the HALLaR method proposed in Monteiro et al. 2024 for solving large-scale semidefinite programming (SDP) problems. We demonstrate how our Julia-based implementation…

Optimization and Control · Mathematics 2025-10-27 Jacob M. Aguirre, Diego Cifuentes, Vincent Guigues, Renato D. C. Monteiro, Victor Hugo Nascimento, Arnesh Sujanani

A Low-Rank ADMM Splitting Approach for Semidefinite Programming

We introduce a new first-order method for solving general semidefinite programming problems, based on the alternating direction method of multipliers (ADMM) and a matrix-splitting technique. Our algorithm has an advantage over the…

Optimization and Control · Mathematics 2024-07-30 Qiushi Han, Chenxi Li, Zhenwei Lin, Caihua Chen, Qi Deng, Dongdong Ge, Huikang Liu, Yinyu Ye

GPU accelerated matrix factorization of large scale data using block based approach

Matrix Factorization (MF) on large scale data takes substantial time on a Central Processing Unit (CPU). While Graphics Processing Units (GPUs) could expedite the computation of MF, the available memory on a GPU is finite. Leveraging GPUs…

Machine Learning · Computer Science 2023-04-28 Prasad Bhavana, Vineet Padmanabhan

Solving Low-Rank Semidefinite Programs via Manifold Optimization

We propose a manifold optimization approach to solve linear semidefinite programs (SDP) with low-rank solutions, with an emphasis on SDP relaxations for polynomial optimization problems. This approach incorporates the inexact augmented…

Optimization and Control · Mathematics 2025-04-30 Jie Wang, Liangbing Hu

Iterative Methods in GPU-Resident Linear Solvers for Nonlinear Constrained Optimization

Linear solvers are major computational bottlenecks in a wide range of decision support and optimization computations. The challenges become even more pronounced on heterogeneous hardware, where traditional sparse numerical linear algebra…

Computational Engineering, Finance, and Science · Computer Science 2024-01-26 Kasia Świrydowicz, Nicholson Koukpaizan, Maksudul Alam, Shaked Regev, Michael Saunders, Slaven Peleš

FastDOG: Fast Discrete Optimization on GPU

We present a massively parallel Lagrange decomposition method for solving 0–1 integer linear programs occurring in structured prediction. We propose a new iterative update scheme for solving the Lagrangean dual and a perturbation technique…

Optimization and Control · Mathematics 2022-04-20 Ahmed Abbas, Paul Swoboda

Suboptimality bounds for trace-bounded SDPs enable a faster and scalable low-rank SDP solver SDPLR+

Semidefinite programs (SDPs) and their solvers are powerful tools with many applications in machine learning and data science. Designing scalable SDP solvers is challenging because by standard the positive semidefinite decision variable is…

Optimization and Control · Mathematics 2024-08-09 Yufan Huang, David F. Gleich

Solving Linear Systems on a GPU with Hierarchically Off-Diagonal Low-Rank Approximations

We are interested in solving linear systems arising from three applications: (1) kernel methods in machine learning, (2) discretization of boundary integral equations from mathematical physics, and (3) Schur complements formed in the…

Numerical Analysis · Mathematics 2022-08-15 Chao Chen, Per-Gunnar Martinsson

Faster and Cheaper: Parallelizing Large-Scale Matrix Factorization on GPUs

Matrix factorization (MF) is employed by many popular algorithms, e.g., collaborative filtering. The emerging GPU technology, with massively multicore and high intra-chip memory bandwidth but limited memory capacity, presents an opportunity…

Distributed, Parallel, and Cluster Computing · Computer Science 2016-10-25 Wei Tan, Liangliang Cao, Liana Fong

Exploiting Low-Rank Structure in Semidefinite Programming by Approximate Operator Splitting

In contrast with many other convex optimization classes, state-of-the-art semidefinite programming solvers are yet unable to efficiently solve large scale instances. This work aims to reduce this scalability gap by proposing a novel…

Optimization and Control · Mathematics 2018-12-20 Mario Souto, Joaquim D. Garcia, Alvaro Veiga

Breaking the Blocks: Continuous Low-Rank Decomposed Scaling for Unified LLM Quantization and Adaptation

Current quantization methods for LLMs predominantly rely on block-wise structures to maintain efficiency, often at the cost of representational flexibility. In this work, we demonstrate that element-wise quantization can be made as…

Machine Learning · Computer Science 2026-02-02 Pingzhi Tang, Ruijie Zhou, Fanxu Meng, Wenjie Pei, Muhan Zhang

GPU-Accelerated Cholesky Factorization of Block Tridiagonal Matrices

This paper presents a GPU-accelerated framework for solving block tridiagonal linear systems that arise naturally in numerous real-time applications across engineering and scientific computing. Through a multi-stage permutation strategy…

Optimization and Control · Mathematics 2026-01-08 Roland Schwan, Daniel Kuhn, Colin N. Jones

GPU-accelerated factorization sets in numerical semigroups via parallel bounded lexicographic streams

We describe a method for parallelizing the lexicographic enumeration algorithm for the factorization set of an element in a numerical semigroup via bounds. This enables the use of GPU and distributed computing methods. We provide a CUDA…

Commutative Algebra · Mathematics 2024-05-14 Thomas Barron

Efficient algorithms for computing rank-revealing factorizations on a GPU

Standard rank-revealing factorizations such as the singular value decomposition and column pivoted QR factorization are challenging to implement efficiently on a GPU. A major difficulty in this regard is the inability of standard algorithms…

Numerical Analysis · Mathematics 2023-05-23 Nathan Heavner, Chao Chen, Abinand Gopal, Per-Gunnar Martinsson

Low-rank Momentum Factorization for Memory Efficient Training

Fine-tuning large foundation models presents significant memory challenges due to stateful optimizers like AdamW, often requiring several times more GPU memory than inference. While memory-efficient methods like parameter-efficient…

Machine Learning · Computer Science 2025-07-14 Pouria Mahdavinia, Mehrdad Mahdavi

Smoothed analysis for low-rank solutions to semidefinite programs in quadratic penalty form

Semidefinite programs (SDP) are important in learning and combinatorial optimization with numerous applications. In pursuit of low-rank solutions and low complexity algorithms, we consider the Burer–Monteiro factorization approach for…

Machine Learning · Statistics 2018-03-02 Srinadh Bhojanapalli, Nicolas Boumal, Prateek Jain, Praneeth Netrapalli

Efficient GPU-Centered Singular Value Decomposition Using the Divide-and-Conquer Method

Singular Value Decomposition (SVD) is a fundamental matrix factorization technique in linear algebra, widely applied in numerous matrix-related problems. However, traditional SVD approaches are hindered by slow panel factorization and…

Distributed, Parallel, and Cluster Computing · Computer Science 2025-08-18 Shifang Liu, Huiyuan Li, Hongjiao Sheng, Haoyuan Gui, Xiaoyu Zhang

Semidefinite Programming by Projective Cutting Planes

Seeking tighter relaxations of combinatorial optimization problems, semidefinite programming is a generalization of linear programming that offers better bounds and is still polynomially solvable. Yet, in practice, a semidefinite program is…

Optimization and Control · Mathematics 2023-11-17 Daniel Porumbel

CuMF_SGD: Fast and Scalable Matrix Factorization

Matrix factorization (MF) has been widely used in e.g., recommender systems, topic modeling and word embedding. Stochastic gradient descent (SGD) is popular in solving MF problems because it can deal with large data sets and is easy to do…

Machine Learning · Computer Science 2016-11-11 Xiaolong Xie, Wei Tan, Liana L. Fong, Yun Liang

Efficient Matrix Factorization on Heterogeneous CPU-GPU Systems

Matrix Factorization (MF) has been widely applied in machine learning and data mining. A large number of algorithms have been studied to factorize matrices. Among them, stochastic gradient descent (SGD) is a commonly used method.…

Distributed, Parallel, and Cluster Computing · Computer Science 2020-06-30 Yuanhang Yu, Dong Wen, Ying Zhang, Xiaoyang Wang, Wenjie Zhang, Xuemin Lin