Related papers: MERIT: Tensor Transform for Memory-Efficient Visio…

Towards Memory-Efficient Neural Networks via Multi-Level in situ Generation

Deep neural networks (DNN) have shown superior performance in a variety of tasks. As they rapidly evolve, their escalating computation and memory demands make it challenging to deploy them on resource-constrained edge devices. Though…

Machine Learning · Computer Science · 2021-09-07 · Jiaqi Gu, Hanqing Zhu, Chenghao Feng, Mingjie Liu, Zixuan Jiang, Ray T. Chen, David Z. Pan

NeuralMatrix: Compute the Entire Neural Networks with Linear Matrix Operations for Efficient Inference

The inherent diversity of computation types within the deep neural network (DNN) models often requires a variety of specialized units in hardware processors, which limits computational efficiency, increasing both inference latency and power…

Machine Learning · Computer Science · 2024-08-21 · Ruiqi Sun, Siwei Ye, Jie Zhao, Xin He, Jianzhe Lin, Yiran Li, An Zou

Analyzing GPU Tensor Core Potential for Fast Reductions

The Nvidia GPU architecture has introduced new computing elements such as the tensor cores, which are special processing units dedicated to perform fast matrix-multiply-accumulate (MMA) operations and accelerate Deep…

Distributed, Parallel, and Cluster Computing · Computer Science · 2019-03-12 · Roberto Carrasco, Raimundo Vega, Cristóbal A. Navarro

PIM-DRAM: Accelerating Machine Learning Workloads using Processing in Commodity DRAM

Deep Neural Networks (DNNs) have transformed the field of machine learning and are widely deployed in many applications involving image, video, speech and natural language processing. The increasing compute demands of DNNs have been widely…

Machine Learning · Computer Science · 2021-08-17 · Sourjya Roy, Mustafa Ali, Anand Raghunathan

TensorDIMM: A Practical Near-Memory Processing Architecture for Embeddings and Tensor Operations in Deep Learning

Recent studies from several hyperscalers point to embedding layers as the most memory-intensive deep learning (DL) algorithm being deployed in today's datacenters. This paper addresses the memory capacity and bandwidth challenges of…

Machine Learning · Computer Science · 2019-08-27 · Youngeun Kwon, Yunjae Lee, Minsoo Rhu

Accelerating Binarized Neural Networks via Bit-Tensor-Cores in Turing GPUs

Despite foreseeing tremendous speedups over conventional deep neural networks, the performance advantage of binarized neural networks (BNNs) has merely been showcased on general-purpose processors such as CPUs and GPUs. In fact, due to…

Distributed, Parallel, and Cluster Computing · Computer Science · 2020-12-16 · Ang Li, Simon Su

Learning to Optimize Tensor Programs

We introduce a learning-based framework to optimize tensor programs for deep learning workloads. Efficient implementations of tensor operators, such as matrix multiplication and high dimensional convolution, are key enablers of effective…

Machine Learning · Computer Science · 2019-01-10 · Tianqi Chen, Lianmin Zheng, Eddie Yan, Ziheng Jiang, Thierry Moreau, Luis Ceze, Carlos Guestrin, Arvind Krishnamurthy

RAPIDNN: In-Memory Deep Neural Network Acceleration Framework

Deep neural networks (DNN) have demonstrated effectiveness for various applications such as image processing, video segmentation, and speech recognition. Running state-of-the-art DNNs on current systems mostly relies on either…

Neural and Evolutionary Computing · Computer Science · 2019-04-15 · Mohsen Imani, Mohammad Samragh, Yeseong Kim, Saransh Gupta, Farinaz Koushanfar, Tajana Rosing

Leveraging the HW/SW Optimizations and Ecosystems that Drive the AI Revolution

This paper presents a state-of-the-art overview on how to architect, design, and optimize Deep Neural Networks (DNNs) such that performance is improved and accuracy is preserved. The paper covers a set of optimizations that span the entire…

Machine Learning · Computer Science · 2022-08-05 · Humberto Carvalho, Pavel Zaykov, Asim Ukaye

NAT: Neural Architecture Transformer for Accurate and Compact Architectures

Designing effective architectures is one of the key factors behind the success of deep neural networks. Existing deep architectures are either manually designed or automatically searched by some Neural Architecture Search (NAS) methods.…

Machine Learning · Computer Science · 2020-01-14 · Yong Guo, Yin Zheng, Mingkui Tan, Qi Chen, Jian Chen, Peilin Zhao, Junzhou Huang

Accelerating Neural Network Inference with Processing-in-DRAM: From the Edge to the Cloud

Neural networks (NNs) are growing in importance and complexity. A neural network's performance (and energy efficiency) can be bound either by computation or memory resources. The processing-in-memory (PIM) paradigm, where computation is…

Hardware Architecture · Computer Science · 2023-03-28 · Geraldo F. Oliveira, Juan Gómez-Luna, Saugata Ghose, Amirali Boroumand, Onur Mutlu

Dynamic DNNs and Runtime Management for Efficient Inference on Mobile/Embedded Devices

Deep neural network (DNN) inference is increasingly being executed on mobile and embedded platforms due to several key advantages in latency, privacy and always-on availability. However, due to limited computing resources, efficient DNN…

Computer Vision and Pattern Recognition · Computer Science · 2024-01-18 · Lei Xun, Jonathon Hare, Geoff V. Merrett

GradPIM: A Practical Processing-in-DRAM Architecture for Gradient Descent

In this paper, we present GradPIM, a processing-in-memory architecture which accelerates parameter updates in deep neural network training. As one of the processing-in-memory techniques that could be realized in the near future, we propose an…

Machine Learning · Computer Science · 2021-02-16 · Heesu Kim, Hanmin Park, Taehyun Kim, Kwanheum Cho, Eojin Lee, Soojung Ryu, Hyuk-Jae Lee, Kiyoung Choi, Jinho Lee

Benchmark Analysis of Representative Deep Neural Network Architectures

This work presents an in-depth analysis of the majority of the deep neural networks (DNNs) proposed in the state of the art for image recognition. For each DNN multiple performance indices are observed, such as recognition accuracy, model…

Computer Vision and Pattern Recognition · Computer Science · 2023-09-15 · Simone Bianco, Remi Cadene, Luigi Celona, Paolo Napoletano

EN-T: Optimizing Tensor Computing Engines Performance via Encoder-Based Methodology

Tensor computations, with matrix multiplication being the primary operation, serve as the fundamental basis for data analysis, physics, machine learning, and deep learning. As the scale and complexity of data continue to grow rapidly, the…

Hardware Architecture · Computer Science · 2024-10-24 · Qizhe Wu, Yuchen Gui, Zhichen Zeng, Xiaotian Wang, Huawen Liang, Xi Jin

Exploring Hyper-Parameter Optimization for Neural Machine Translation on GPU Architectures

Neural machine translation (NMT) has been accelerated by deep learning neural networks over statistical-based approaches, due to the plethora and programmability of commodity heterogeneous computing architectures such as FPGAs and GPUs and…

Computation and Language · Computer Science · 2021-09-15 · Robert Lim, Kenneth Heafield, Hieu Hoang, Mark Briers, Allen Malony

BCIM: Efficient Implementation of Binary Neural Network Based on Computation in Memory

Applications of Binary Neural Networks (BNNs) are promising for embedded systems with hard constraints on computing power. Contrary to conventional neural networks with the floating-point datatype, BNNs use binarized weights and activations…

Emerging Technologies · Computer Science · 2022-11-14 · Mahdi Zahedi, Taha Shahroodi, Stephan Wong, Said Hamdioui

FourierPIM: High-Throughput In-Memory Fast Fourier Transform and Polynomial Multiplication

The Discrete Fourier Transform (DFT) is essential for various applications ranging from signal processing to convolution and polynomial multiplication. The groundbreaking Fast Fourier Transform (FFT) algorithm reduces DFT time complexity…

Hardware Architecture · Computer Science · 2023-04-06 · Orian Leitersdorf, Yahav Boneh, Gonen Gazit, Ronny Ronen, Shahar Kvatinsky

Efficient Mixed-Precision Matrix Factorization of the Inverse Overlap Matrix in Electronic Structure Calculations with AI-Hardware and GPUs

In recent years, a new kind of accelerated hardware has gained popularity in the Artificial Intelligence (AI) and Machine Learning (ML) communities which enables extremely high-performance tensor contractions in reduced precision for deep…

Computational Physics · Physics · 2024-05-01 · Adela Habib, Joshua Finkelstein, Anders M. N. Niklasson

Empirically Accelerating Scaled Gradient Projection Using Deep Neural Network For Inverse Problems In Image Processing

Recently, deep neural networks (DNNs) have shown advantages in accelerating optimization algorithms. One approach is to unfold a finite number of iterations of conventional optimization algorithms and to learn parameters in the algorithms.…

Machine Learning · Computer Science · 2021-04-23 · Byung Hyun Lee, Se Young Chun