Related papers: Mixed-precision deep learning based on computation…
Deep neural networks have revolutionized the field of machine learning by providing unprecedented human-like performance in solving many real-world problems such as image and speech recognition. Training of large DNNs, however, is a…
In-memory computing is a promising non-von Neumann approach for making energy-efficient deep learning inference hardware. Crossbar arrays of resistive memory devices can be used to encode the network weights and perform efficient analog…
Analog crossbar architectures for accelerating neural network training and inference have made tremendous progress over the past several years. These architectures are ideal for dense layers with fewer than roughly a thousand neurons.…
The need for deep neural network (DNN) models with higher performance and better functionality leads to the proliferation of very large models. Model training, however, requires intensive computation time and energy. Memristor-based…
Deep learning has made remarkable progress in various tasks, surpassing human performance in some cases. However, one drawback of neural networks is catastrophic forgetting, where a network trained on one task forgets the solution when…
The memory demands of large-scale deep neural networks (DNNs) require synaptic weight values to be stored and updated in off-chip memory like dynamic random-access memory, which reduces energy efficiency and increases training time.…
The large computing and memory cost of deep neural networks (DNNs) often precludes their use in resource-constrained devices. Quantizing the parameters and operations to lower bit-precision offers substantial memory and energy savings for…
The cost involved in training deep neural networks (DNNs) on von-Neumann architectures has motivated the development of novel solutions for efficient DNN training accelerators. We propose a hybrid in-memory computing (HIC) architecture for…
Deep neural networks have enabled progress in a wide variety of applications. Growing the size of the neural network typically results in improved accuracy. As model sizes grow, the memory and compute requirements for training these models…
Quantization is widely employed in both cloud and edge systems to reduce the memory occupation, latency, and energy consumption of deep neural networks. In particular, mixed-precision quantization, i.e., the use of different bit-widths for…
In this paper, we propose StruM, a novel structured mixed-precision-based deep learning inference method, co-designed with its associated hardware accelerator (DPU), to address the escalating computational and memory demands of deep…
As CMOS scaling reaches its technological limits, a radical departure from traditional von Neumann systems, which involve separate processing and memory units, is needed in order to significantly extend the performance of today's computers.…
Deep learning algorithms have shown tremendous success in many recognition tasks; however, these algorithms typically include a deep neural network (DNN) structure and a large number of parameters, which makes it challenging to implement…
Large Deep Neural Networks (DNNs) are the backbone of today's artificial intelligence due to their ability to make accurate predictions when being trained on huge datasets. With advancing technologies, such as the Internet of Things,…
Reduced precision computation for deep neural networks is one of the key areas addressing the widening compute gap driven by an exponential growth in model size. In recent years, deep learning training has largely migrated to 16-bit…
Deep Neural Networks (DNNs) have gained immense success in cognitive applications and greatly pushed today's artificial intelligence forward. The biggest challenge in executing DNNs is their extremely data-extensive computations. The…
Deep Neural Networks (DNNs) are computationally and memory intensive, which makes their hardware implementation a challenging task especially for resource constrained devices such as IoT nodes. To address this challenge, this paper…
The widespread integration of embedded systems across various industries has facilitated seamless connectivity among devices and bolstered computational capabilities. Despite their extensive applications, embedded systems encounter…
A trend towards energy-efficiency, security and privacy has led to a recent focus on deploying DNNs on microcontrollers. However, limits on compute and memory resources restrict the size and the complexity of the ML models deployable in…
Training of deep neural networks (DNNs) is a computationally intensive task and requires massive volumes of data transfer. Performing these operations with the conventional von Neumann architectures creates unmanageable time and power…