Related papers: Accelerating Deep Learning Inference via Freezing

CacheNet: A Model Caching Framework for Deep Learning Inference on the Edge

The success of deep neural networks (DNN) in machine perception applications such as image classification and speech recognition comes at the cost of high computation and storage complexity. Inference of uncompressed large scale DNN models…

Machine Learning · Computer Science 2020-07-06 Yihao Fang, Shervin Manzuri Shalmani, Rong Zheng

Accelerating Deep Learning Inference via Learned Caches

Deep Neural Networks (DNNs) are witnessing increased adoption in multiple domains owing to their high accuracy in solving real-world problems. However, this high accuracy has been achieved by building deeper networks, posing a fundamental…

Machine Learning · Computer Science 2021-01-20 Arjun Balasubramanian, Adarsh Kumar, Yuhan Liu, Han Cao, Shivaram Venkataraman, Aditya Akella

Improving the Performance of DNN-based Software Services using Automated Layer Caching

Deep Neural Networks (DNNs) have become an essential component in many application domains including web-based services. A variety of these services require high throughput and (close to) real-time features, for instance, to respond or…

Machine Learning · Computer Science 2022-09-20 Mohammadamin Abedi, Yanni Iouannou, Pooyan Jamshidi, Hadi Hemmati

Rethinking the Potential of Layer Freezing for Efficient DNN Training

With the growing size of deep neural networks and datasets, the computational costs of training have significantly increased. The layer-freezing technique has recently attracted great attention as a promising method to effectively reduce…

Machine Learning · Computer Science 2025-08-22 Chence Yang, Ci Zhang, Lei Lu, Qitao Tan, Sheng Li, Ao Li, Xulong Tang, Shaoyi Huang, Jinzhen Wang, Guoming Li, Jundong Li, Xiaoming Zhai, Jin Lu, Geng Yuan

FreezeOut: Accelerate Training by Progressively Freezing Layers

The early layers of a deep neural net have the fewest parameters, but take up the most computation. In this extended abstract, we propose to only train the hidden layers for a set portion of the training run, freezing them out one-by-one…

Machine Learning · Statistics 2017-06-20 Andrew Brock, Theodore Lim, J. M. Ritchie, Nick Weston

ReFrame: Layer Caching for Accelerated Inference in Real-Time Rendering

Graphics rendering applications increasingly leverage neural networks in tasks such as denoising, supersampling, and frame extrapolation to improve image quality while maintaining frame rates. The temporal coherence inherent in these tasks…

Graphics · Computer Science 2025-06-18 Lufei Liu, Tor M. Aamodt

The Architectural Implications of Facebook's DNN-based Personalized Recommendation

The widespread application of deep learning has changed the landscape of computation in the data center. In particular, personalized recommendation for content ranking is now largely accomplished leveraging deep neural networks. However,…

Distributed, Parallel, and Cluster Computing · Computer Science 2020-02-18 Udit Gupta, Carole-Jean Wu, Xiaodong Wang, Maxim Naumov, Brandon Reagen, David Brooks, Bradford Cottel, Kim Hazelwood, Bill Jia, Hsien-Hsin S. Lee, Andrey Malevich, Dheevatsa Mudigere, Mikhail Smelyanskiy, Liang Xiong, Xuan Zhang

A Progressive Sub-Network Searching Framework for Dynamic Inference

Many techniques, such as model compression, have been developed to make Deep Neural Network (DNN) inference more efficient. Nevertheless, DNNs still lack excellent run-time dynamic inference capability to enable users to trade off accuracy…

Computer Vision and Pattern Recognition · Computer Science 2020-09-15 Li Yang, Zhezhi He, Yu Cao, Deliang Fan

Modeling of Deep Neural Network (DNN) Placement and Inference in Edge Computing

With edge computing becoming an increasingly adopted concept in system architectures, its utilization is expected to increase further when combined with deep learning (DL) techniques. The idea behind integrating demanding…

Networking and Internet Architecture · Computer Science 2020-03-12 Mounir Bensalem, Jasenka Dizdarević, Admela Jukan

DEFER: Distributed Edge Inference for Deep Neural Networks

Modern machine learning tools such as deep neural networks (DNNs) are playing a revolutionary role in many fields such as natural language processing, computer vision, and the internet of things. Once they are trained, deep learning models…

Machine Learning · Computer Science 2022-01-19 Arjun Parthasarathy, Bhaskar Krishnamachari

Optimizing Deep Learning Inference on Embedded Systems Through Adaptive Model Selection

Deep neural networks (DNNs) are becoming a key enabling technology for many application domains. However, on-device inference on battery-powered, resource-constrained embedding systems is often infeasible due to prohibitively long…

Machine Learning · Computer Science 2019-11-13 Vicent Sanz Marco, Ben Taylor, Zheng Wang, Yehia Elkhatib

Egeria: Efficient DNN Training with Knowledge-Guided Layer Freezing

Training deep neural networks (DNNs) is time-consuming. While most existing solutions try to overlap/schedule computation and communication for efficient training, this paper goes one step further by skipping computing and communication…

Machine Learning · Computer Science 2023-03-14 Yiding Wang, Decang Sun, Kai Chen, Fan Lai, Mosharaf Chowdhury

High Utilization Energy-Aware Real-Time Inference Deep Convolutional Neural Network Accelerator

Deep convolution Neural Network (DCNN) has been widely used in computer vision tasks. However, for edge devices even inference has too large computational complexity and data access amount. The inference latency of state-of-the-art models…

Hardware Architecture · Computer Science 2025-09-09 Kuan-Ting Lin, Ching-Te Chiu, Jheng-Yi Chang, Shi-Zong Huang, Yu-Ting Li

Auto-tuning Neural Network Quantization Framework for Collaborative Inference Between the Cloud and Edge

Recently, deep neural networks (DNNs) have been widely applied in mobile intelligent applications. The inference for the DNNs is usually performed in the cloud. However, it leads to a large overhead of transmitting data via wireless…

Distributed, Parallel, and Cluster Computing · Computer Science 2018-12-19 Guangli Li, Lei Liu, Xueying Wang, Xiao Dong, Peng Zhao, Xiaobing Feng

On the Acceleration of Deep Neural Network Inference using Quantized Compressed Sensing

Accelerating deep neural network (DNN) inference on resource-limited devices is one of the most important barriers to ensuring a wider and more inclusive adoption. To alleviate this, DNN binary quantization for faster convolution and memory…

Machine Learning · Computer Science 2021-08-24 Meshia Cédric Oveneke

Edge Inference with Fully Differentiable Quantized Mixed Precision Neural Networks

The large computing and memory cost of deep neural networks (DNNs) often precludes their use in resource-constrained devices. Quantizing the parameters and operations to lower bit-precision offers substantial memory and energy savings for…

Machine Learning · Computer Science 2023-09-01 Clemens JS Schaefer, Siddharth Joshi, Shan Li, Raul Blazquez

Boosting DNN Cold Inference on Edge Devices

DNNs are ubiquitous on edge devices nowadays. With its increasing importance and use cases, it's not likely to pack all DNNs into device memory and expect that each inference has been warmed up. Therefore, cold inference, the process to…

Machine Learning · Computer Science 2023-08-29 Rongjie Yi, Ting Cao, Ao Zhou, Xiao Ma, Shangguang Wang, Mengwei Xu

A Converting Autoencoder Toward Low-latency and Energy-efficient DNN Inference at the Edge

Reducing inference time and energy usage while maintaining prediction accuracy has become a significant concern for deep neural networks (DNN) inference on resource-constrained edge devices. To address this problem, we propose a novel…

Machine Learning · Computer Science 2024-03-13 Hasanul Mahmud, Peng Kang, Kevin Desai, Palden Lama, Sushil Prasad

ResNet: Enabling Deep Convolutional Neural Networks through Residual Learning

Convolutional Neural Networks (CNNs) have revolutionized computer vision, but training very deep networks has been challenging due to the vanishing gradient problem. This paper explores Residual Networks (ResNet), introduced by He et al.…

Computer Vision and Pattern Recognition · Computer Science 2025-10-29 Xingyu Liu, Kun Ming Goh

Decentralized Low-Latency Collaborative Inference via Ensembles on the Edge

The success of deep neural networks (DNNs) is heavily dependent on computational resources. While DNNs are often employed on cloud servers, there is a growing need to operate DNNs on edge devices. Edge devices are typically limited in their…

Machine Learning · Computer Science 2022-06-08 May Malka, Erez Farhan, Hai Morgenstern, Nir Shlezinger