Related papers: Improved parallelization techniques for the densit…

Parallelization Strategies for Density Matrix Renormalization Group Algorithms on Shared-Memory Systems

Shared-memory parallelization (SMP) strategies for density matrix renormalization group (DMRG) algorithms enable the treatment of complex systems in solid state physics. We present two different approaches by which parallelization of the…

Strongly Correlated Electrons · Physics 2009-11-10 G. Hager, E. Jeckelmann, H. Fehske, G. Wellein

Low communication high performance ab initio density matrix renormalization group algorithms

There has been recent interest in the deployment of ab initio density matrix renormalization group computations on high performance computing platforms. Here, we introduce a reformulation of the conventional distributed memory ab initio…

Chemical Physics · Physics 2021-06-24 Huanchen Zhai, Garnet Kin-Lic Chan

An additive two-level parallel variant of the DMRG algorithm with coarse-space correction

The density matrix renormalization group (DMRG) algorithm is a popular alternating minimization scheme for solving high-dimensional optimization problems in the tensor train format. Classical DMRG, however, is based on sequential…

Numerical Analysis · Mathematics 2025-12-09 Laura Grigori, Muhammad Hassan

Optimizing Distributed Training Approaches for Scaling Neural Networks

This paper presents a comparative analysis of distributed training strategies for large-scale neural networks, focusing on data parallelism, model parallelism, and hybrid approaches. We evaluate these strategies on image classification…

Distributed, Parallel, and Cluster Computing · Computer Science 2025-04-01 Vishnu Vardhan Baligodugula, Fathi Amsaad

Real-Space Parallel Density Matrix Renormalization Group

We demonstrate how to parallelize the density matrix renormalization group (DMRG) algorithm in real space through a straightforward modification of serial DMRG. This makes it possible to apply at least an order of magnitude more…

Strongly Correlated Electrons · Physics 2013-04-25 E. M. Stoudenmire, Steven R. White

Distributed-Memory DMRG via Sparse and Dense Parallel Tensor Contractions

The Density Matrix Renormalization Group (DMRG) algorithm is a powerful tool for solving eigenvalue problems to model quantum systems. DMRG relies on tensor contractions and dense linear algebra to compute properties of condensed matter…

Distributed, Parallel, and Cluster Computing · Computer Science 2021-01-26 Ryan Levy, Edgar Solomonik, Bryan K. Clark

Parallelizing the Approximate Minimum Degree Ordering Algorithm: Strategies and Evaluation

The approximate minimum degree algorithm is widely used before numerical factorization to reduce fill-in for sparse matrices. While considerable attention has been given to the numerical factorization process, less focus has been placed on…

Distributed, Parallel, and Cluster Computing · Computer Science 2026-02-26 Yen-Hsiang Chang, Aydın Buluç, James Demmel

PaSE: Parallelization Strategies for Efficient DNN Training

Training a deep neural network (DNN) requires substantial computational and memory requirements. It is common to use multiple devices to train a DNN to reduce the overall training time. There are several choices to parallelize each layer in…

Machine Learning · Computer Science 2024-07-08 Venmugil Elango

Enhanced computation method of topological smoothing on shared memory parallel machines

To prepare images for better segmentation, we need preprocessing applications, such as smoothing, to reduce noise. In this paper, we present an enhanced computation method for smoothing 2D object in binary case. Unlike existing approaches,…

Distributed, Parallel, and Cluster Computing · Computer Science 2016-03-31 Ramzi Mahmoudi, Mohamed Akil

A 2D Parallel Triangle Counting Algorithm for Distributed-Memory Architectures

Triangle counting is a fundamental graph analytic operation that is used extensively in network science and graph mining. As the size of the graphs that needs to be analyzed continues to grow, there is a requirement in developing scalable…

Distributed, Parallel, and Cluster Computing · Computer Science 2019-07-24 Ancy Sarah Tom, George Karypis

Accelerated Parallel and Distributed Algorithm using Limited Internal Memory for Nonnegative Matrix Factorization

Nonnegative matrix factorization (NMF) is a powerful technique for dimension reduction, extracting latent factors and learning part-based representation. For large datasets, NMF performance depends on some major issues: fast algorithms,…

Optimization and Control · Mathematics 2015-07-01 Duy-Khuong Nguyen, Tu-Bao Ho

Parallel image thinning through topological operators on shared memory parallel machines

In this paper, we present a concurrent implementation of a powerful topological thinning operator. This operator is able to act directly over grayscale images without modifying their topology. We introduce an adapted parallelization…

Distributed, Parallel, and Cluster Computing · Computer Science 2016-03-31 Ramzi Mahmoudi, Mohamed Akil, Petr Matas

Parallel Algorithms for Densest Subgraph Discovery Using Shared Memory Model

The problem of finding dense components of a graph is a widely explored area in data analysis, with diverse applications in fields and branches of study including community mining, spam detection, computer security and bioinformatics. This…

Information Retrieval · Computer Science 2021-03-02 B. D. M. De Zoysa, Y. A. M. M. A. Ali, M. D. I. Maduranga, Indika Perera, Saliya Ekanayake, Anil Vullikanti

Using parallelism techniques to improve sequential and multi-core sorting performance

We propose new sequential sorting operations by adapting techniques and methods used for designing parallel sorting algorithms. Although the norm is to parallelize a sequential algorithm to improve performance, we adapt a contrarian…

Data Structures and Algorithms · Computer Science 2016-09-01 Alexandros V Gerbessiotis

Density Matrix Renormalization Group and Reaction-Diffusion Processes

The density matrix renormalization group (DMRG) is applied to some one-dimensional reaction-diffusion models in the vicinity of and at their critical point. The stochastic time evolution for these models is given in terms of a non-symmetric…

Statistical Mechanics · Physics 2011-10-11 Enrico Carlon, Malte Henkel, Ulrich Schollwoeck

Efficient Distributed Semi-Supervised Learning using Stochastic Regularization over Affinity Graphs

We describe a computationally efficient, stochastic graph-regularization technique that can be utilized for the semi-supervised training of deep neural networks in a parallel or distributed setting. We utilize a technique, first described…

Machine Learning · Statistics 2018-05-31 Sunil Thulasidasan, Jeffrey Bilmes, Garrett Kenyon

Space and Time Efficient Parallel Graph Decomposition, Clustering, and Diameter Approximation

We develop a novel parallel decomposition strategy for unweighted, undirected graphs, based on growing disjoint connected clusters from batches of centers progressively selected from yet uncovered nodes. With respect to similar previous…

Distributed, Parallel, and Cluster Computing · Computer Science 2015-02-09 Matteo Ceccarello, Andrea Pietracaprina, Geppino Pucci, Eli Upfal

Parallelizing Query Optimization on Shared-Nothing Architectures

Data processing systems offer an ever increasing degree of parallelism on the levels of cores, CPUs, and processing nodes. Query optimization must exploit high degrees of parallelism in order not to gradually become the bottleneck of query…

Databases · Computer Science 2015-11-06 Immanuel Trummer, Christoph Koch

Real-time topological image smoothing on shared memory parallel machines

Smoothing filter is the method of choice for image preprocessing and pattern recognition. We present a new concurrent method for smoothing 2D object in binary case. Proposed method provides a parallel computation while preserving the…

Distributed, Parallel, and Cluster Computing · Computer Science 2016-04-01 Ramzi Mahmoudi, Mohamed Akil

Parallel Training of Deep Networks with Local Updates

Deep learning models trained on large data sets have been widely successful in both vision and language domains. As state-of-the-art deep learning architectures have continued to grow in parameter count so have the compute budgets and times…

Machine Learning · Computer Science 2021-06-16 Michael Laskin, Luke Metz, Seth Nabarro, Mark Saroufim, Badreddine Noune, Carlo Luschi, Jascha Sohl-Dickstein, Pieter Abbeel