Related papers: Efficient Matrix Factorization on Heterogeneous CP…
Matrix Factorization (MF) on large scale data takes substantial time on a Central Processing Unit (CPU). While Graphical Processing Unit (GPU)s could expedite the computation of MF, the available memory on a GPU is finite. Leveraging GPUs…
Matrix factorization (MF) has been widely used in e.g., recommender systems, topic modeling and word embedding. Stochastic gradient descent (SGD) is popular in solving MF problems because it can deal with large data sets and is easy to do…
Matrix factorization (MF) is employed by many popular algorithms, e.g., collaborative filtering. The emerging GPU technology, with massively multicore and high intra-chip memory bandwidth but limited memory capacity, presents an opportunity…
Tensor decomposition has been widely used in machine learning and high-volume data analysis. However, large-scale tensor factorization often consumes huge memory and computing cost. Meanwhile, modernized computing hardware such as tensor…
Matrix factorization (MF) discovers latent features from observations, which has shown great promises in the fields of collaborative filtering, data compression, feature extraction, word embedding, etc. While many problem-specific…
The widely-adopted practice is to train deep learning models with specialized hardware accelerators, e.g., GPUs or TPUs, due to their superior performance on linear algebra operations. However, this strategy does not employ effectively the…
Matrix multiplication is a foundational operation in scientific computing and machine learning, yet its computational complexity makes it a significant bottleneck for large-scale applications. The shift to parallel architectures, primarily…
Matrix Factorization (MF) on large scale matrices is computationally as well as memory intensive task. Alternative convergence techniques are needed when the size of the input matrix is higher than the available memory on a Central…
General sparse matrix-matrix multiplication (SpGEMM) is a fundamental building block for numerous applications such as algebraic multigrid method (AMG), breadth first search and shortest path problem. Compared to other sparse BLAS routines,…
CPU-GPU heterogeneous systems are now commonly used in HPC (High-Performance Computing). However, improving the utilization and energy-efficiency of such systems is still one of the most critical issues. As one single program typically…
We propose a new hybrid topology optimization algorithm based on multigrid approach that combines the parallelization strategy of CPU using OpenMP and heavily multithreading capabilities of modern Graphics Processing Units (GPU). In…
The geometric multigrid method (GMG) is one of the most efficient solving techniques for discrete algebraic systems arising from elliptic partial differential equations. GMG utilizes a hierarchy of grids or discretizations and reduces the…
Matrix decompositions are ubiquitous in machine learning, including applications in dimensionality reduction, data compression and deep learning algorithms. Typical solutions for matrix decompositions have polynomial complexity which…
For large matrix factorisation problems, we develop a distributed Markov Chain Monte Carlo (MCMC) method based on stochastic gradient Langevin dynamics (SGLD) that we call Parallel SGLD (PSGLD). PSGLD has very favourable scaling properties…
There is an increased interest in building data analytics frameworks with advanced algebraic capabilities both in industry and academia. Many of these frameworks, e.g., TensorFlow and BIDMach, implement their compute-intensive primitives in…
In order to improve system performance efficiently, a number of systems choose to equip multi-core and many-core processors (such as GPUs). Due to their discrete memory these heterogeneous architectures comprise a distributed system within…
Sparse Matrix-Matrix Multiplication (SpMM) is a fundamental operation in graph computing and analytics. However, the irregularity of real-world graphs poses significant challenges to achieving efficient SpMM operation for graph data on…
In myriad statistical applications, data are collected from related but heterogeneous sources. These sources share some commonalities while containing idiosyncratic characteristics. One of the most fundamental challenges in such scenarios…
Matrix factorization (MF) is a widely used collaborative filtering (CF) algorithm for recommendation systems (RSs), due to its high prediction accuracy, great flexibility and high efficiency in big data processing. However, with the…
The vision of super computer at every desk can be realized by powerful and highly parallel CPUs or GPUs or APUs. Graphics processors once specialized for the graphics applications only, are now used for the highly computational intensive…