
Related papers: Accelerating Mobile Inference through Fine-Grained…

200 papers

Large Language Models (LLMs) have achieved impressive results across various tasks, yet their high computational demands pose deployment challenges, especially on consumer-grade hardware. Mixture of Experts (MoE) models provide an efficient…

Distributed, Parallel, and Cluster Computing · Computer Science · 2025-12-19 · En-Ming Huang, Li-Shang Lin, Chun-Yi Lee

There is a growing demand to deploy computation-intensive deep learning (DL) models on resource-constrained mobile devices for real-time intelligent applications. Equipped with a variety of processing units such as CPUs, GPUs, and NPUs, the…

Machine Learning · Computer Science · 2024-05-06 · Sicong Liu, Wentao Zhou, Zimu Zhou, Bin Guo, Minfan Wang, Cheng Fang, Zheng Lin, Zhiwen Yu

On-device inference of machine learning models for mobile phones is desirable due to its lower latency and increased privacy. Running such a compute-intensive task solely on the mobile CPU, however, can be difficult due to limited computing…

Modern mobile devices are equipped with high-performance hardware resources such as graphics processing units (GPUs), making end-side intelligent services more feasible. More recently, specialized silicon chips such as neural engines are being…

Distributed, Parallel, and Cluster Computing · Computer Science · 2019-02-04 · Amir Erfan Eshratifar, Amirhossein Esmaili, Massoud Pedram

Discrete optimization is a central problem in artificial intelligence. The optimization of the aggregated cost of a network of cost functions arises in a variety of problems including (W)CSP, DCOP, as well as optimization in stochastic…

Artificial Intelligence · Computer Science · 2018-01-12 · Ferdinando Fioretto, Enrico Pontelli, William Yeoh, Rina Dechter

Deploying deep learning models in cloud clusters provides efficient and prompt inference services to accommodate the widespread application of deep learning. These clusters are usually equipped with host CPUs and accelerators with distinct…

Distributed, Parallel, and Cluster Computing · Computer Science · 2023-07-24 · Zinuo Cai, Hao Wang, Tao Song, Yang Hua, Ruhui Ma, Haibing Guan

Deep Learning (DL) has shown impressive performance in many mobile applications. Most existing works have focused on reducing the computational and resource overheads of running Deep Neural Networks (DNN) inference on resource-constrained…

Machine Learning · Computer Science · 2022-02-22 · Anish Das, Young D. Kwon, Jagmohan Chauhan, Cecilia Mascolo

The ever-increasing demand from mobile Machine Learning (ML) applications calls for ever more powerful on-chip computing resources. Mobile devices are empowered with heterogeneous multi-processor Systems-on-Chips (SoCs) to process ML…

Machine Learning · Computer Science · 2021-02-03 · Siqi Wang, Anuj Pathania, Tulika Mitra

The common assumption in on-device AI is that GPUs, with their superior parallel processing, always provide the best performance for large language model (LLM) inference. In this work, we challenge this notion by empirically demonstrating…

Distributed, Parallel, and Cluster Computing · Computer Science · 2025-05-13 · Haolin Zhang, Jeff Huang

In this paper, we present an OpenCL-based heterogeneous implementation of a computer vision algorithm (an image inpainting-based object removal algorithm) on mobile devices. To take advantage of the computation power of the mobile…

Distributed, Parallel, and Cluster Computing · Computer Science · 2014-03-19 · Guohui Wang, Yingen Xiong, Jay Yun, Joseph R. Cavallaro

Deploying large language models (LLMs) for online inference is often constrained by limited GPU memory, particularly due to the growing KV cache during auto-regressive decoding. Hybrid GPU-CPU execution has emerged as a promising solution…

Distributed, Parallel, and Cluster Computing · Computer Science · 2026-01-16 · Jiakun Fan, Yanglin Zhang, Xiangchen Li, Dimitrios S. Nikolopoulos

With the rapid advancement of artificial intelligence technologies such as ChatGPT, AI agents, and video generation, contemporary mobile systems have begun integrating these AI capabilities on local devices to enhance privacy and reduce…

Distributed, Parallel, and Cluster Computing · Computer Science · 2025-10-07 · Le Chen, Dahu Feng, Erhu Feng, Yingrui Wang, Rong Zhao, Yubin Xia, Pinjie Xu, Haibo Chen

Deploying deep neural networks (DNNs) on resource-constrained mobile devices presents significant challenges, particularly in achieving real-time performance while simultaneously coping with limited computational resources and battery life.…

Networking and Internet Architecture · Computer Science · 2025-09-24 · Zekai Sun, Xiuxian Guan, Zheng Lin, Zihan Fang, Xiangming Cai, Zhe Chen, Fangming Liu, Heming Cui, Jie Xiong, Wei Ni, Chau Yuen

Device-edge co-inference, which partitions a deep neural network between a resource-constrained mobile device and an edge server, has recently emerged as a promising paradigm to support intelligent mobile applications. To accelerate the…

Machine Learning · Computer Science · 2021-09-01 · Xinjie Zhang, Jiawei Shao, Yuyi Mao, Jun Zhang

Modern mobile applications are benefiting significantly from advances in deep learning, e.g., implementing real-time image recognition and conversational systems. Given a trained deep learning model, applications usually need to…

Performance · Computer Science · 2019-03-01 · Tian Guo

Large language models (LLMs) have shown exceptional performance and vast potential across diverse tasks. However, the deployment of LLMs with high performance in low-resource environments has garnered significant attention in the industry.…

Artificial Intelligence · Computer Science · 2024-07-11 · Pujiang He, Shan Zhou, Wenhuan Huang, Changqing Li, Duyi Wang, Bin Guo, Chen Meng, Sheng Gui, Weifei Yu, Yi Xie

Parallel programming models can encourage performance portability by moving the responsibility for work assignment and data distribution from the programmer to a runtime system. However, analyzing the resulting implicit memory allocations,…

Distributed, Parallel, and Cluster Computing · Computer Science · 2025-03-14 · Fabian Knorr, Philip Salzmann, Peter Thoman, Thomas Fahringer

Deep Neural Networks are allowing mobile devices to incorporate a wide range of features into user applications. However, the computational complexity of these models makes it difficult to run them effectively on resource-constrained mobile…

Performance · Computer Science · 2020-04-02 · Samuel S. Ogden, Tian Guo

Current computational systems are heterogeneous by nature, featuring a combination of CPUs and GPUs. As the latter are becoming an established platform for high-performance computing, the focus is shifting towards the seamless programming…

Distributed, Parallel, and Cluster Computing · Computer Science · 2015-10-23 · Fábio Soldado, Fernando Alexandre, Hervé Paulino

Characterizing and predicting the training performance of modern machine learning (ML) workloads on compute systems with compute and communication spread between CPUs, GPUs, and network devices is not only the key to optimization and…

Distributed, Parallel, and Cluster Computing · Computer Science · 2024-11-27 · Zhongyi Lin, Ning Sun, Pallab Bhattacharya, Xizhou Feng, Louis Feng, John D. Owens