Related papers: Improving the Neural GPU Architecture for Algorith…

Extensions and Limitations of the Neural GPU

The Neural GPU is a recent model that can learn algorithms such as multi-digit binary addition and binary multiplication in a way that generalizes to inputs of arbitrary length. We show that there are two simple ways of improving the…

Neural and Evolutionary Computing · Computer Science 2016-11-08 Eric Price , Wojciech Zaremba , Ilya Sutskever

Neural GPUs Learn Algorithms

Learning an algorithm from examples is a fundamental problem that has been widely studied. Recently it has been addressed using neural networks, in particular by Neural Turing Machines (NTMs). These are fully differentiable computers that…

Machine Learning · Computer Science 2016-03-16 Łukasz Kaiser , Ilya Sutskever

Deep Learning and Machine Learning with GPGPU and CUDA: Unlocking the Power of Parallel Computing

General Purpose Graphics Processing Unit (GPGPU) computing plays a transformative role in deep learning and machine learning by leveraging the computational advantages of parallel processing. Through the power of Compute Unified Device…

Distributed, Parallel, and Cluster Computing · Computer Science 2025-11-20 Ming Li , Ziqian Bi , Tianyang Wang , Yizhu Wen , Qian Niu , Xinyuan Song , Zekun Jiang , Junyu Liu , Benji Peng , Sen Zhang , Xuanhe Pan , Jiawei Xu , Jinlang Wang , Keyu Chen , Caitlyn Heqi Yin , Pohsun Feng , Ming Liu

The State of the Art when using GPUs in Devising Image Generation Methods Using Deep Learning

Deep learning is a technique for machine learning using multi-layer neural networks. It has been used for image synthesis and image recognition, but in recent years, it has also been used for various social detection and social labeling. In…

Computer Vision and Pattern Recognition · Computer Science 2021-09-14 Yasuko Kawahata

Large Scale Artificial Neural Network Training Using Multi-GPUs

This paper describes a method for accelerating large scale Artificial Neural Networks (ANN) training using multi-GPUs by reducing the forward and backward passes to matrix multiplication. We propose an out-of-core multi-GPU matrix…

Distributed, Parallel, and Cluster Computing · Computer Science 2015-11-16 Linnan Wang , Wei Wu , Jianxiong Xiao , Yang Yi

Scalable Graph Embedding LearningOn A Single GPU

Graph embedding techniques have attracted growing interest since they convert the graph data into continuous and low-dimensional space. Effective graph analytic provides users a deeper understanding of what is behind the data and thus can…

Machine Learning · Computer Science 2022-01-21 Azita Nouri , Philip E. Davis , Pradeep Subedi , Manish Parashar

Akceleracja obliczen algebry liniowej z wykorzystaniem masywnie rownoleglych, wielordzeniowych procesorow GPU

The paper presents the aspect of use of modern graphics accelerators supporting CUDA technology for high-performance computing in the field of linear algebra. Fully programmable graphic cards have been available for several years for both…

Distributed, Parallel, and Cluster Computing · Computer Science 2013-06-27 Lukasz Swierczewski

Neural Algorithmic Reasoning

Algorithms have been fundamental to recent global technological advances and, in particular, they have been the cornerstone of technical advances in one field rapidly being applied to another. We argue that algorithms possess fundamentally…

Machine Learning · Computer Science 2021-08-09 Petar Veličković , Charles Blundell

Efficient Use of Limited-Memory Accelerators for Linear Learning on Heterogeneous Systems

We propose a generic algorithmic building block to accelerate training of machine learning models on heterogeneous compute systems. Our scheme allows to efficiently employ compute accelerators such as GPUs and FPGAs for the training of…

Machine Learning · Computer Science 2017-11-08 Celestine Dünner , Thomas Parnell , Martin Jaggi

A Survey of FPGA Based Deep Learning Accelerators: Challenges and Opportunities

With the rapid development of in-depth learning, neural network and deep learning algorithms have been widely used in various fields, e.g., image, video and voice processing. However, the neural network model is getting larger and larger,…

Distributed, Parallel, and Cluster Computing · Computer Science 2019-12-30 Teng Wang , Chao Wang , Xuehai Zhou , Huaping Chen

Deep Graph Library Optimizations for Intel(R) x86 Architecture

The Deep Graph Library (DGL) was designed as a tool to enable structure learning from graphs, by supporting a core abstraction for graphs, including the popular Graph Neural Networks (GNN). DGL contains implementations of all core graph…

Distributed, Parallel, and Cluster Computing · Computer Science 2020-07-14 Sasikanth Avancha , Vasimuddin Md , Sanchit Misra , Ramanarayan Mohanty

Making Neural Networks More Suitable for Approximate Clifford+T Circuit Synthesis

Machine Learning with deep neural networks has transformed computational approaches to scientific and engineering problems. Central to many of these advancements are precisely tuned neural architectures that are tailored to the domains in…

Quantum Physics · Physics 2025-04-23 Mathias Weiden , Justin Kalloor , John Kubiatowicz , Costin Iancu

Deep Modulation Embedding

Deep neural network has recently shown very promising applications in different research directions and attracted the industry attention as well. Although the idea was introduced in the past but just recently the main limitation of using…

Signal Processing · Electrical Eng. & Systems 2019-04-16 Amin Abbasloo , Alan Salari

Deep Learning on FPGAs: Past, Present, and Future

The rapid growth of data size and accessibility in recent years has instigated a shift of philosophy in algorithm design for artificial intelligence. Instead of engineering algorithms by hand, the ability to learn composable systems…

Distributed, Parallel, and Cluster Computing · Computer Science 2016-02-16 Griffin Lacey , Graham W. Taylor , Shawki Areibi

Learning Gradient Descent: Better Generalization and Longer Horizons

Training deep neural networks is a highly nontrivial task, involving carefully selecting appropriate training algorithms, scheduling step sizes and tuning other hyperparameters. Trying different combinations can be quite labor-intensive and…

Machine Learning · Computer Science 2017-06-13 Kaifeng Lv , Shunhua Jiang , Jian Li

Numerical integration on GPUs for higher order finite elements

The paper considers the problem of implementation on graphics processors of numerical integration routines for higher order finite element approximations. The design of suitable GPU kernels is investigated in the context of general purpose…

Mathematical Software · Computer Science 2014-03-03 Krzysztof Banaś , Przemysław Płaszewski , Paweł Macioł

Layer-Parallel Training with GPU Concurrency of Deep Residual Neural Networks via Nonlinear Multigrid

A Multigrid Full Approximation Storage algorithm for solving Deep Residual Networks is developed to enable neural network parallelized layer-wise training and concurrent computational kernel execution on GPUs. This work demonstrates a 10.2x…

Machine Learning · Computer Science 2020-09-01 Andrew C. Kirby , Siddharth Samsi , Michael Jones , Albert Reuther , Jeremy Kepner , Vijay Gadepally

Automated Architecture Design for Deep Neural Networks

Machine learning has made tremendous progress in recent years and received large amounts of public attention. Though we are still far from designing a full artificially intelligent agent, machine learning has brought us many applications in…

Machine Learning · Computer Science 2019-08-29 Steven Abreu

Parareal Neural Networks Emulating a Parallel-in-time Algorithm

As deep neural networks (DNNs) become deeper, the training time increases. In this perspective, multi-GPU parallel computing has become a key tool in accelerating the training of DNNs. In this paper, we introduce a novel methodology to…

Numerical Analysis · Mathematics 2024-07-08 Chang-Ock Lee , Youngkyu Lee , Jongho Park

Prediction of GPU Failures Under Deep Learning Workloads

Graphics processing units (GPUs) are the de facto standard for processing deep learning (DL) tasks. Meanwhile, GPU failures, which are inevitable, cause severe consequences in DL tasks: they disrupt distributed trainings, crash inference…

Machine Learning · Computer Science 2022-01-31 Heting Liu , Zhichao Li , Cheng Tan , Rongqiu Yang , Guohong Cao , Zherui Liu , Chuanxiong Guo