Related papers: Generating coupled cluster code for modern distrib…

200 papers

In this paper, we report a reimplementation of the core algorithms of relativistic coupled cluster theory aimed at modern heterogeneous high-performance computational infrastructures. The code is designed for efficient parallel execution on…

Many research works have implemented the Viterbi decoding algorithm on GPUs rather than FPGAs because the GPU platform provides considerable flexibility in addition to great performance. Recently, the recently-introduced…

Distributed, Parallel, and Cluster Computing · Computer Science 2020-11-30 Alireza Mohammadidoost, Matin Hashemi

Tensor decomposition has been widely used in machine learning and high-volume data analysis. However, large-scale tensor factorization often consumes huge memory and computing cost. Meanwhile, modernized computing hardware such as tensor…

Optimization and Control · Mathematics 2022-09-12 Zi Yang, Junnan Shan, Zheng Zhang

CP decomposition is a powerful tool for data science, especially gene analysis, deep learning, and quantum computation. However, the application of tensor decomposition is largely hindered by the exponential increment of the computational…

Machine Learning · Computer Science 2023-11-27 Zeliang Zhang, Zhuo Liu, Susan Liang, Zhiyuan Wang, Yifan Zhu, Chen Ding, Chenliang Xu

In this work, we introduce new batching algorithms to effectively handle large contractions encountered in coupled-cluster singles and doubles (CCSD) implementations in Python on the Video Random Access Memory (VRAM) of graphical processing…

Recommendation systems, social network analysis, medical imaging, and data mining often involve processing sparse high-dimensional data. Such high-dimensional data are naturally represented as tensors, and they cannot be efficiently…

Distributed, Parallel, and Cluster Computing · Computer Science 2020-10-22 Weiyun Jiang, Kaiqi Zhang, Colin Yu Lin, Feng Xing, Zheng Zhang

In this paper, we develop software for decomposing sparse tensors that is portable to and performant on a variety of multicore, manycore, and GPU computing architectures. The result is a single code whose performance matches optimized…

Mathematical Software · Computer Science 2019-07-30 Eric Phipps, Tamara G. Kolda

Experience shows that on today's high performance systems the utilization of different acceleration cards in conjunction with a high utilization of all other parts of the system is difficult. Future architectures, like exascale clusters,…

Distributed, Parallel, and Cluster Computing · Computer Science 2019-03-07 Patrick Diehl, Madhavan Seshadri, Thomas Heller, Hartmut Kaiser

During the past decade, Deep Learning (DL) algorithms, programming systems and hardware have converged with the High Performance Computing (HPC) counterparts. Nevertheless, the programming methodology of DL and HPC systems is stagnant,…

Distributed, Parallel, and Cluster Computing · Computer Science 2024-03-19 Evangelos Georganas, Dhiraj Kalamkar, Kirill Voronin, Abhisek Kundu, Antonio Noack, Hans Pabst, Alexander Breuer, Alexander Heinecke

We scrutinize how to accelerate the bottleneck operations of Pythonic coupled cluster implementations performed on an NVIDIA Tesla V100S PCIe 32GB (rev 1a) Graphics Processing Unit (GPU). The NVIDIA Compute Unified Device…

We extend an existing approach for efficient use of shared mapped memory across Chapel and C++ for graph data stored as 1-D arrays to sparse tensor data stored using a combination of 2-D and 1-D arrays. We describe the specific extensions…

Distributed, Parallel, and Cluster Computing · Computer Science 2023-10-18 S. Isaac Geronimo Anderson, Daniel M. Dunlavy

To respond to the need of efficient training and inference of deep neural networks, a plethora of domain-specific hardware architectures have been introduced, such as Google Tensor Processing Units and NVIDIA Tensor Cores. A common feature…

Data Structures and Algorithms · Computer Science 2020-07-10 Rezaul Chowdhury, Francesco Silvestri, Flavio Vella

Graphics processing units (GPUs) have evolved from specialized hardware for rendering high-quality graphics in games into commodity hardware for effectively processing blocks of data in parallel. This evolution is particularly…

Distributed, Parallel, and Cluster Computing · Computer Science 2012-03-26 Luis Cabellos

Tensor computations present significant performance challenges that impact a wide spectrum of applications ranging from machine learning, healthcare analytics, social network analysis, data mining to quantum chemistry and signal processing.…

Distributed, Parallel, and Cluster Computing · Computer Science 2020-01-06 Jiajia Li, Mahesh Lakshminarasimhan, Xiaolong Wu, Ang Li, Catherine Olschanowsky, Kevin Barker

This paper describes a parallel implementation of the Viterbi decoding algorithm. The Viterbi decoder is widely used in many state-of-the-art wireless systems. The proposed solution optimizes both throughput and memory usage by applying…

Distributed, Parallel, and Cluster Computing · Computer Science 2020-11-19 Alireza Mohammadidoost, Matin Hashemi

High-order tensor decomposition has been widely adopted to obtain compact deep neural networks for edge deployment. However, existing studies focus primarily on its algorithmic advantages such as accuracy and compression ratio, while…

Hardware Architecture · Computer Science 2025-11-26 Jinsong Zhang, Minghe Li, Jiayi Tian, Jinming Lu, Zheng Zhang

TensorX is a Python library for prototyping, design, and deployment of complex neural network models in TensorFlow. A special emphasis is put on ease of use, performance, and API consistency. It aims to make available high-level components…

Machine Learning · Computer Science 2021-01-05 Davide Nunes, Luis Antunes

Our goal is compression of massive-scale grid-structured data, such as the multi-terabyte output of a high-fidelity computational simulation. For such data sets, we have developed a new software package called TuckerMPI, a parallel C++/MPI…

Mathematical Software · Computer Science 2020-07-09 Grey Ballard, Alicia Klinvex, Tamara G. Kolda

Improving the computational efficiency of quantum many-body calculations from a hardware perspective remains a critical challenge. Although field-programmable gate arrays (FPGAs) have recently been exploited to improve the computational…

Strongly Correlated Electrons · Physics 2026-02-06 Songtai Lv, Yang Liang, Rui Zhu, Qibin Zheng, Haiyuan Zou

Driven by the insatiable need to process ever larger amounts of data with more complex models, modern computer processors and accelerators are beginning to offer half-precision floating-point arithmetic support, and extremely optimized…

Mathematical Software · Computer Science 2019-12-12 Shaoshuai Zhang, Panruo Wu