Related papers: Message Passing Descent for Efficient Machine Lear…
Stein variational gradient descent (SVGD) is a recently proposed particle-based Bayesian inference method, which has attracted a lot of interest due to its remarkable approximation ability and particle efficiency compared to traditional…
Message passing neural networks (MPNNs) have emerged as go-to models for learning on graph-structured data in the past decade. Despite their effectiveness, most of such models still incur severe issues such as over-smoothing and…
Message-passing algorithms based on the Belief Propagation (BP) equations constitute a well-known distributed computational scheme. It is exact on tree-like graphical models and has also proven to be effective in many problems defined on…
We consider distributed optimization under communication constraints for training deep learning models. We propose a new algorithm, whose parameter updates rely on two forces: a regular gradient step, and a corrective direction dictated by…
Federated learning (FL) provides a communication-efficient approach to solve machine learning problems concerning distributed data, without sending raw data to a central server. However, existing works on FL only utilize first-order…
Momentum Stochastic Gradient Descent (MSGD) algorithm has been widely applied to many nonconvex optimization problems in machine learning, e.g., training deep neural networks, variational Bayesian inference, and etc. Despite its empirical…
Parametric message passing (MP) is a promising technique that provides reliable marginal probability distributions for distributed cooperative positioning (DCP) based on factor graphs (FG), while maintaining minimal computational…
When applied to large-scale learning problems, the conventional wisdom on privacy-preserving deep learning, known as Differential Private Stochastic Gradient Descent (DP-SGD), has met with limited success due to significant performance…
Federated Learning (FL) represents a growing machine learning (ML) paradigm designed for training models across numerous nodes that retain local datasets, all without directly exchanging the underlying private data with the parameter server…
Deep neural networks (DNN) are typically optimized using stochastic gradient descent (SGD). However, the estimation of the gradient using stochastic samples tends to be noisy and unreliable, resulting in large gradient variance and bad…
Stochastic Gradient Descent (SGD) and its momentum variants form the backbone of deep learning optimization, yet the underlying dynamics of their gradient behavior remain insufficiently understood. In this work, we reinterpret gradient…
Parameter-Efficient Fine-Tuning (PEFT) has emerged as a key strategy for adapting large-scale pre-trained models to downstream tasks, but existing approaches face notable limitations. Addition-based methods, such as Adapters, introduce…
Efficient learning and model compression algorithm for deep neural network (DNN) is a key workhorse behind the rise of deep learning (DL). In this work, we propose a message passing based Bayesian deep learning algorithm called EM-TDAMP to…
We introduce a new type of graphical model that we call a "memory factor network" (MFN). We show how to use MFNs to model the structure inherent in many types of data sets. We also introduce an associated message-passing style algorithm…
Label imbalance and homophily-heterophily mixture are the fundamental problems encountered when applying Graph Neural Networks (GNNs) to Graph Fraud Detection (GFD) tasks. Existing GNN-based GFD models are designed to augment graph…
Graphical models use the intuitive and well-studied methods of graph theory to implicitly represent dependencies between variables in large systems. They can model the global behaviour of a complex system by specifying only local factors.…
In distributed training of deep neural networks, people usually run Stochastic Gradient Descent (SGD) or its variants on each machine and communicate with other machines periodically. However, SGD might converge slowly in training some deep…
We study a fully decentralized federated learning algorithm, which is a novel gradient descent algorithm executed on a communication-based network. For convenience, we refer to it as a network gradient descent (NGD) method. In the NGD…
Matrix-parametrized models, including multiclass logistic regression and sparse coding, are used in machine learning (ML) applications ranging from computer vision to computational biology. When these models are applied to large-scale ML…
Matrix-parametrized models, including multiclass logistic regression and sparse coding, are used in machine learning (ML) applications ranging from computer vision to computational biology. When these models are applied to large-scale ML…