Related papers: Ilargi: a GPU Compatible Factorized ML Model Train…

Matrix Factorization on GPUs with Memory Optimization and Approximate Computing

Matrix factorization (MF) discovers latent features from observations, which has shown great promise in the fields of collaborative filtering, data compression, feature extraction, word embedding, etc. While many problem-specific…

Distributed, Parallel, and Cluster Computing · Computer Science 2018-08-14 Wei Tan, Shiyu Chang, Liana Fong, Cheng Li, Zijun Wang, Liangliang Cao

BMF: Block matrix approach to factorization of large scale data

Matrix Factorization (MF) on large-scale matrices is a computationally and memory-intensive task. Alternative convergence techniques are needed when the size of the input matrix exceeds the available memory on a Central…

Machine Learning · Computer Science 2019-01-21 Prasad G Bhavana, Vineet C Nair

GPU accelerated matrix factorization of large scale data using block based approach

Matrix Factorization (MF) on large-scale data takes substantial time on a Central Processing Unit (CPU). While Graphics Processing Units (GPUs) could expedite the computation of MF, the available memory on a GPU is finite. Leveraging GPUs…

Machine Learning · Computer Science 2023-04-28 Prasad Bhavana, Vineet Padmanabhan

CLARGA: Multimodal Graph Representation Learning over Arbitrary Sets of Modalities

We introduce CLARGA, a general-purpose multimodal fusion architecture for multimodal representation learning that works with any number and type of modalities without changing the underlying framework. Given a supervised dataset, CLARGA can…

Computer Vision and Pattern Recognition · Computer Science 2025-12-16 Santosh Patapati

Efficient Matrix Factorization on Heterogeneous CPU-GPU Systems

Matrix Factorization (MF) has been widely applied in machine learning and data mining. A large number of algorithms have been studied to factorize matrices. Among them, stochastic gradient descent (SGD) is a commonly used method.…

Distributed, Parallel, and Cluster Computing · Computer Science 2020-06-30 Yuanhang Yu, Dong Wen, Ying Zhang, Xiaoyang Wang, Wenjie Zhang, Xuemin Lin

The Energy-Efficient Hierarchical Neural Network with Fast FPGA-Based Incremental Learning

The rising computational and energy demands of deep learning, particularly in large-scale architectures such as foundation models and large language models (LLMs), pose significant challenges to sustainability. Traditional gradient-based…

Machine Learning · Computer Science 2025-09-19 Mohammad Saleh Vahdatpour, Huaiyuan Chu, Yanqing Zhang

Towards Linear Algebra over Normalized Data

Providing machine learning (ML) over relational data is a mainstream requirement for data analytics systems. While almost all the ML tools require the input data to be presented as a single table, many datasets are multi-table, which forces…

Databases · Computer Science 2017-06-28 Lingjiao Chen, Arun Kumar, Jeffrey Naughton, Jignesh M. Patel

Faster and Cheaper: Parallelizing Large-Scale Matrix Factorization on GPUs

Matrix factorization (MF) is employed by many popular algorithms, e.g., collaborative filtering. The emerging GPU technology, with massively multicore and high intra-chip memory bandwidth but limited memory capacity, presents an opportunity…

Distributed, Parallel, and Cluster Computing · Computer Science 2016-10-25 Wei Tan, Liangliang Cao, Liana Fong

Optimizer Fusion: Efficient Training with Better Locality and Parallelism

Machine learning frameworks adopt iterative optimizers to train neural networks. Conventional eager execution separates the updating of trainable parameters from forward and backward computations. However, this approach introduces…

Machine Learning · Computer Science 2021-04-02 Zixuan Jiang, Jiaqi Gu, Mingjie Liu, Keren Zhu, David Z. Pan

InferF: Declarative Factorization of AI/ML Inferences over Joins

Real-world AI/ML workflows often apply inference computations to feature vectors joined from multiple datasets. To avoid the redundant AI/ML computations caused by repeated data records in the join's output, factorized ML has been proposed…

Databases · Computer Science 2025-11-26 Kanchan Chowdhury, Lixi Zhou, Lulu Xie, Xinwei Fu, Jia Zou

Learning Rate Scheduling with Matrix Factorization for Private Training

We study differentially private model training with stochastic gradient descent under learning rate scheduling and correlated noise. Although correlated noise, in particular via matrix factorizations, has been shown to improve accuracy,…

Machine Learning · Computer Science 2026-05-12 Nikita P. Kalinin, Joel Daniel Andersson

FELARE: Fair Scheduling of Machine Learning Tasks on Heterogeneous Edge Systems

Edge computing enables smart IoT-based systems via concurrent and continuous execution of latency-sensitive machine learning (ML) applications. These edge-based machine learning systems are often battery-powered (i.e., energy-limited). They…

Distributed, Parallel, and Cluster Computing · Computer Science 2022-07-22 Ali Mokhtari, Md Abir Hossen, Pooyan Jamshidi, Mohsen Amini Salehi

Frenzy: A Memory-Aware Serverless LLM Training System for Heterogeneous GPU Clusters

Existing work is only effective on a given number of GPUs, often neglecting the complexities involved in manually determining the specific types and quantities of GPUs needed, which can be a significant burden for developers. To address this…

Distributed, Parallel, and Cluster Computing · Computer Science 2024-12-20 Zihan Chang, Sheng Xiao, Shuibing He, Siling Yang, Zhe Pan, Dong Li

Aggregating Capacity in FL through Successive Layer Training for Computationally-Constrained Devices

Federated learning (FL) is usually performed on resource-constrained edge devices, e.g., with limited memory for the computation. If the required memory to train a model exceeds this limit, the device will be excluded from the training.…

Machine Learning · Computer Science 2023-11-28 Kilian Pfeiffer, Ramin Khalili, Jörg Henkel

Reasoning as Representation: Rethinking Visual Reinforcement Learning in Image Quality Assessment

Reasoning-based image quality assessment (IQA) models trained through reinforcement learning (RL) exhibit exceptional generalization, yet the underlying mechanisms and critical factors driving this capability remain underexplored in current…

Computer Vision and Pattern Recognition · Computer Science 2026-03-04 Shijie Zhao, Xuanyu Zhang, Weiqi Li, Junlin Li, Li Zhang, Tianfan Xue, Jian Zhang

SplitGP: Achieving Both Generalization and Personalization in Federated Learning

A fundamental challenge to providing edge-AI services is the need for a machine learning (ML) model that achieves personalization (i.e., to individual clients) and generalization (i.e., to unseen data) properties concurrently. Existing…

Machine Learning · Computer Science 2023-02-14 Dong-Jun Han, Do-Yeon Kim, Minseok Choi, Christopher G. Brinton, Jaekyun Moon

FedHC: A Scalable Federated Learning Framework for Heterogeneous and Resource-Constrained Clients

Federated Learning (FL) is a distributed learning paradigm that empowers edge devices to collaboratively learn a global model leveraging local data. Simulating FL on GPU is essential to expedite FL algorithm prototyping and evaluations.…

Distributed, Parallel, and Cluster Computing · Computer Science 2023-05-26 Min Zhang, Fuxun Yu, Yongbo Yu, Minjia Zhang, Ang Li, Xiang Chen

Accelerating Battery Material Optimization through iterative Machine Learning

The performance of battery materials is determined by their composition and the processing conditions employed during commercial-scale fabrication, where raw materials undergo complex processing steps with various additives to yield final…

Signal Processing · Electrical Eng. & Systems 2025-05-27 Seon-Hwa Lee, Insoo Ye, Changhwan Lee, Jieun Kim, Geunho Choi, Sang-Cheol Nam, Inchul Park

Indirect Learning of Interatomic Potentials for Accelerated Materials Simulations

Machine learning (ML) based interatomic potentials are emerging tools for materials simulations but require a trade-off between accuracy and speed. Here we show how one can use one ML potential model to train another: we use an existing,…

Materials Science · Physics 2022-09-20 Joe D. Morrow, Volker L. Deringer

A Case for Malleable Thread-Level Linear Algebra Libraries: The LU Factorization with Partial Pivoting

We propose two novel techniques for overcoming load-imbalance encountered when implementing so-called look-ahead mechanisms in relevant dense matrix factorizations for the solution of linear systems. Both techniques target the scenario…

Distributed, Parallel, and Cluster Computing · Computer Science 2016-11-22 Sandra Catalán, José R. Herrero, Enrique S. Quintana-Ortí, Rafael Rodríguez-Sánchez, Robert van de Geijn