Related papers: SMDP-Based Dynamic Batching for Efficient Inferenc…

200 papers

For servers incorporating parallel computing resources, batching is a pivotal technique for providing efficient and economical services at scale. Parallel computing resources exhibit heightened computational and energy efficiency when…

Distributed, Parallel, and Cluster Computing · Computer Science 2025-01-07 Yaodan Xu , Sheng Zhou , Zhisheng Niu

The increasing adoption of large language models (LLMs) necessitates inference serving systems that can deliver both high throughput and low latency. Deploying LLMs with hundreds of billions of parameters on memory-constrained GPUs exposes…

Distributed, Parallel, and Cluster Computing · Computer Science 2025-03-10 Bowen Pang , Kai Li , Feifan Wang

GPU-accelerated computing is a key technology to realize high-speed inference servers using deep neural networks (DNNs). An important characteristic of GPU-based inference is that the computational efficiency, in terms of the processing…

Performance · Computer Science 2021-01-13 Yoshiaki Inoue

Autonomous marine vehicles play an essential role in many ocean science and engineering applications. Planning time and energy optimal paths for these vehicles to navigate in stochastic dynamic ocean environments is essential to reduce…

Artificial Intelligence · Computer Science 2021-09-21 Rohit Chowdhury , Deepak Subramani

Graphics processing units (GPUs) can improve deep neural network inference throughput via batch processing, where multiple tasks are concurrently processed. We focus on novel scenarios in which energy-constrained mobile devices offload…

Distributed, Parallel, and Cluster Computing · Computer Science 2022-06-14 Wenqi Shi , Sheng Zhou , Zhisheng Niu , Miao Jiang , Lu Geng

Serving deep neural networks in latency-critical interactive settings often requires GPU acceleration. However, the small batch sizes typical in online inference result in poor GPU utilization, a potential performance gap which GPU…

Distributed, Parallel, and Cluster Computing · Computer Science 2019-01-03 Paras Jain , Xiangxi Mo , Ajay Jain , Harikaran Subbaraj , Rehan Sohail Durrani , Alexey Tumanov , Joseph Gonzalez , Ion Stoica

Massive multi-threading in GPU imposes tremendous pressure on memory subsystems. Due to rapid growth in thread-level parallelism of GPU and slowly improved peak memory bandwidth, the memory becomes a bottleneck of GPU's performance and…

Hardware Architecture · Computer Science 2019-06-17 Bing Li , Mengjie Mao , Xiaoxiao Liu , Tao Liu , Zihao Liu , Wujie Wen , Yiran Chen , Hai Li

Discrete optimization is a central problem in artificial intelligence. The optimization of the aggregated cost of a network of cost functions arises in a variety of problems including (W)CSP, DCOP, as well as optimization in stochastic…

Artificial Intelligence · Computer Science 2018-01-12 Ferdinando Fioretto , Enrico Pontelli , William Yeoh , Rina Dechter

Matrix Factorization (MF) on large-scale data takes substantial time on a Central Processing Unit (CPU). While Graphics Processing Units (GPUs) could expedite the computation of MF, the available memory on a GPU is finite. Leveraging GPUs…

Machine Learning · Computer Science 2023-04-28 Prasad Bhavana , Vineet Padmanabhan

Caching and multicasting at base stations are two promising approaches to support massive content delivery over wireless networks. However, existing scheduling designs do not make full use of the advantages of the two approaches. In this…

Information Theory · Computer Science 2016-02-25 Bo Zhou , Ying Cui , Meixia Tao

Mixture-of-Experts (MoE) models facilitate edge deployment by decoupling model capacity from active computation, yet their large memory footprint drives the need for GPU systems with near-data processing (NDP) capabilities that offload…

Distributed, Parallel, and Cluster Computing · Computer Science 2026-01-08 Qi Wu , Chao Fang , Jiayuan Chen , Ye Lin , Yueqi Zhang , Yichuan Bai , Yuan Du , Li Du

In mobile edge computing, local edge servers can host cloud-based services, which reduces network overhead and latency but requires service migrations as users move to new locations. It is challenging to make migration decisions optimally…

Distributed, Parallel, and Cluster Computing · Computer Science 2019-05-10 Shiqiang Wang , Rahul Urgaonkar , Murtaza Zafer , Ting He , Kevin Chan , Kin K. Leung

Battery-less Internet of Things (IoT) devices rely on ambient energy harvesting and therefore require scheduling policies that jointly account for energy intermittency and hard timing constraints. This challenge is especially acute in…

Systems and Control · Electrical Eng. & Systems 2026-05-20 Shahab Jahanbazi , Mateen Ashraf , Onel L. A. López

We study the offline data-driven sequential decision making problem in the framework of Markov decision process (MDP). In order to enhance the generalizability and adaptivity of the learned policy, we propose to evaluate each policy by a…

Statistics Theory · Mathematics 2021-11-11 Zhengling Qi , Peng Liao

Dynamic programming (DP) is a cornerstone of combinatorial optimization, yet its inherently sequential structure has long limited its scalability in scenario-based stochastic programming (SP). This paper introduces a GPU-accelerated…

Optimization and Control · Mathematics 2025-11-25 Jingyi Zhao , Linxin Yang , Haohua Zhang , Tian Ding

One efficient approach to control chip-wide thermal distribution in multi-core systems is the optimization of online assignments of tasks to processing cores. Online task assignment, however, faces several uncertainties in real-world…

Distributed, Parallel, and Cluster Computing · Computer Science 2020-09-08 Farnaz Niknia , Kiamehr Rezaee , Vesal Hakami

In this paper, we investigate the scheduling design of a mobile edge computing (MEC) system, where active mobile devices with computation tasks randomly appear in a cell. Every task can be computed at either the mobile device or the MEC…

Information Theory · Computer Science 2020-04-17 Shanfeng Huang , Bojie Lv , Rui Wang , Kaibin Huang

Wireless networks used for Internet of Things (IoT) are expected to largely involve cloud-based computing and processing. Softwarised and centralised signal processing and network switching in the cloud enables flexible network control and…

Artificial Intelligence · Computer Science 2020-10-13 Beiran Chen , Yi Zhang , George Iosifidis , Mingming Liu

Deployment of real-time ML services on warehouse-scale infrastructures is on the increase. Therefore, decreasing latency and increasing throughput of deep neural network (DNN) inference applications that empower those services have…

Distributed, Parallel, and Cluster Computing · Computer Science 2023-08-29 Seyed Morteza Nabavinejad , Masoumeh Ebrahimi , Sherief Reda

In this paper, we show how a simulated Markov decision process (MDP) built by the so-called baseline policies can be used to compute a different policy, namely the simulated optimal policy, for which the performance of this…

Optimization and Control · Mathematics 2014-10-13 Yinlam Chow , Mohammad Ghavamzadeh