Related papers: Multi-user Co-inference with Batch Processing Capa…

Joint Optimization of Offloading, Batching and DVFS for Multiuser Co-Inference

With the growing integration of artificial intelligence in mobile applications, a substantial number of deep neural network (DNN) inference requests are generated daily by mobile devices. Serving these requests presents significant…

Distributed, Parallel, and Cluster Computing · Computer Science 2025-04-22 Yaodan Xu, Sheng Zhou, Zhisheng Niu

Resource Allocation for Multiuser Edge Inference with Batching and Early Exiting (Extended Version)

The deployment of inference services at the network edge, called edge inference, offloads computation-intensive inference tasks from mobile devices to edge servers, thereby enhancing the former's capabilities and battery lives. In a…

Information Theory · Computer Science 2023-01-02 Zhiyan Liu, Qiao Lan, Kaibin Huang

Benchmarking Edge AI Platforms for High-Performance ML Inference

Edge computing's growing prominence, due to its ability to reduce communication latency and enable real-time processing, is promoting the rise of high-performance, heterogeneous System-on-Chip solutions. While current approaches often…

Artificial Intelligence · Computer Science 2024-09-24 Rakshith Jayanth, Neelesh Gupta, Viktor Prasanna

Queueing Analysis of GPU-Based Inference Servers with Dynamic Batching: A Closed-Form Characterization

GPU-accelerated computing is a key technology to realize high-speed inference servers using deep neural networks (DNNs). An important characteristic of GPU-based inference is that the computational efficiency, in terms of the processing…

Performance · Computer Science 2021-01-13 Yoshiaki Inoue

Energy-Aware Multi-Server Mobile Edge Computing: A Deep Reinforcement Learning Approach

We investigate the problem of computation offloading in a mobile edge computing architecture, where multiple energy-constrained users compete to offload their computational tasks to multiple servers through a shared wireless medium. We…

Information Theory · Computer Science 2019-12-24 Navid Naderializadeh, Morteza Hashemi

Energy-Efficient Processing and Robust Wireless Cooperative Transmission for Edge Inference

Edge machine learning can deliver low-latency and private artificial intelligence (AI) services for mobile devices by leveraging computation and storage resources at the network edge. This paper presents an energy-efficient edge processing…

Information Theory · Computer Science 2020-03-03 Kai Yang, Yuanming Shi, Wei Yu, Zhi Ding

Intra-DP: A High Performance Collaborative Inference System for Mobile Edge Computing

Deploying deep neural networks (DNNs) on resource-constrained mobile devices presents significant challenges, particularly in achieving real-time performance while simultaneously coping with limited computational resources and battery life.…

Networking and Internet Architecture · Computer Science 2025-09-24 Zekai Sun, Xiuxian Guan, Zheng Lin, Zihan Fang, Xiangming Cai, Zhe Chen, Fangming Liu, Heming Cui, Jie Xiong, Wei Ni, Chau Yuen

An efficient and flexible inference system for serving heterogeneous ensembles of deep neural networks

Ensembles of Deep Neural Networks (DNNs) have achieved qualitative predictions but they are computing and memory intensive. Therefore, the demand is growing to make them answer a heavy workload of requests with available computational…

Distributed, Parallel, and Cluster Computing · Computer Science 2022-08-31 Pierrick Pochelu, Serge G. Petiton, Bruno Conche

Accelerating Exact and Approximate Inference for (Distributed) Discrete Optimization with GPUs

Discrete optimization is a central problem in artificial intelligence. The optimization of the aggregated cost of a network of cost functions arises in a variety of problems including (W)CSP, DCOP, as well as optimization in stochastic…

Artificial Intelligence · Computer Science 2018-01-12 Ferdinando Fioretto, Enrico Pontelli, William Yeoh, Rina Dechter

Profiling Concurrent Vision Inference Workloads on NVIDIA Jetson -- Extended

The proliferation of IoT devices and advancements in network technologies have intensified the demand for real-time data processing at the network edge. To address these demands, low-power AI accelerators, particularly GPUs, are…

Distributed, Parallel, and Cluster Computing · Computer Science 2025-08-13 Abhinaba Chakraborty, Wouter Tavernier, Akis Kourtis, Mario Pickavet, Andreas Oikonomakis, Didier Colle

Com-DDPG: A Multiagent Reinforcement Learning-based Offloading Strategy for Mobile Edge Computing

The development of mobile services has impacted a variety of computation-intensive and time-sensitive applications, such as recommendation systems and daily payment methods. However, computing task competition involving limited resources…

Distributed, Parallel, and Cluster Computing · Computer Science 2020-12-10 Honghao Gao, Xuejie Wang, Xiaojin Ma, Wei Wei, Shahid Mumtaz

Communication-Efficient Separable Neural Network for Distributed Inference on Edge Devices

The inference of Neural Networks is usually restricted by the resources (e.g., computing power, memory, bandwidth) on edge devices. In addition to improving the hardware design and deploying efficient models, it is possible to aggregate the…

Machine Learning · Computer Science 2021-11-05 Jun-Liang Lin, Sheng-De Wang

Decentralized Computation Offloading and Resource Allocation in Heterogeneous Networks with Mobile Edge Computing

We consider a heterogeneous network with mobile edge computing, where a user can offload its computation to one among multiple servers. In particular, we minimize the system-wide computation overhead by jointly optimizing the individual…

Networking and Internet Architecture · Computer Science 2018-03-05 Quoc-Viet Pham, Tuan LeAnh, Nguyen H. Tran, Choong Seon Hong

Decentralized Low-Latency Collaborative Inference via Ensembles on the Edge

The success of deep neural networks (DNNs) is heavily dependent on computational resources. While DNNs are often employed on cloud servers, there is a growing need to operate DNNs on edge devices. Edge devices are typically limited in their…

Machine Learning · Computer Science 2022-06-08 May Malka, Erez Farhan, Hai Morgenstern, Nir Shlezinger

Joint Scheduling of Sensing Data Offloading and Edge Inference for Multi-UAV Networks

Unmanned aerial vehicles (UAVs) often collaborate by collecting and offloading sensing streams to an edge server, where a deep neural network (DNN) model performs cross-stream alignment, fusion, and inference. However, the coupling between…

Signal Processing · Electrical Eng. & Systems 2026-05-06 Yanan Du, Sai Xu, Yinbo Yu

Sparse Optimization for Green Edge AI Inference

With the rapid upsurge of deep learning tasks at the network edge, effective edge artificial intelligence (AI) inference becomes critical to provide low-latency intelligent services for mobile users via leveraging the edge computing…

Information Theory · Computer Science 2024-10-30 Xiangyu Yang, Sheng Hua, Yuanming Shi, Hao Wang, Jun Zhang, Khaled B. Letaief

Inference Time Optimization Using BranchyNet Partitioning

Deep Neural Network (DNN) applications with edge computing present a trade-off between responsiveness and computational resources. On the one hand, edge computing can provide high responsiveness by deploying computational resources close to end…

Distributed, Parallel, and Cluster Computing · Computer Science 2021-01-29 Roberto G. Pacheco, Rodrigo S. Couto

Model-driven Cluster Resource Management for AI Workloads in Edge Clouds

Since emerging edge applications such as Internet of Things (IoT) analytics and augmented reality have tight latency constraints, hardware AI accelerators have been recently proposed to speed up deep neural network (DNN) inference run by…

Distributed, Parallel, and Cluster Computing · Computer Science 2022-01-20 Qianlin Liang, Walid A. Hanafy, Ahmed Ali-Eldin, Prashant Shenoy

Collaborative Inference for Large Models with Task Offloading and Early Exiting

In 5G smart cities, edge computing is employed to provide nearby computing services for end devices, and the large-scale models (e.g., GPT and LLaMA) can be deployed at the network edge to boost the service quality. However, due to the…

Distributed, Parallel, and Cluster Computing · Computer Science 2024-12-12 Zuan Xie, Yang Xu, Hongli Xu, Yunming Liao, Zhiyuan Yao

Edge-device Collaborative Computing for Multi-view Classification

Motivated by the proliferation of Internet-of-Things (IoT) devices and the rapid advances in the field of deep learning, there is a growing interest in pushing deep learning computations, conventionally handled by the cloud, to the edge of…

Machine Learning · Computer Science 2024-09-25 Marco Palena, Tania Cerquitelli, Carla Fabiana Chiasserini