Related papers: Composing Distributed Computations Through Task an…

IMSuite: A Benchmark Suite for Simulating Distributed Algorithms

Considering the diverse nature of real-world distributed applications that makes it hard to identify a representative subset of distributed benchmarks, we focus on their underlying distributed algorithms. We present and characterize a new…

Distributed, Parallel, and Cluster Computing · Computer Science 2013-10-11 Suyash Gupta, V. Krishna Nandivada

DFUSE: Strongly Consistent Write-Back Kernel Caching for Distributed Userspace File Systems

Cloud platforms host thousands of tenants that demand POSIX semantics, high throughput, and rapid evolution from their storage layer. Kernel-native distributed file systems supply raw speed, but their privileged code base couples every…

Operating Systems · Computer Science 2025-10-23 Haoyu Li, Jingkai Fu, Qing Li, Windsor Hsu, Asaf Cidon

Chunks and Tasks: a programming model for parallelization of dynamic algorithms

We propose Chunks and Tasks, a parallel programming model built on abstractions for both data and work. The application programmer specifies how data and work can be split into smaller pieces, chunks and tasks, respectively. The Chunks and…

Distributed, Parallel, and Cluster Computing · Computer Science 2014-07-29 Emanuel H. Rubensson, Elias Rudberg

Distributed Compilation System for High-Speed Software Build Processes

The idle time of personal computers has increased steadily due to the generalization of computer usage and cloud computing. Clustering research aims at utilizing idle computer resources for processing a variable workload on a large number…

Distributed, Parallel, and Cluster Computing · Computer Science 2021-01-25 Geunsik Lim, Minho Lee, R. J. W. E. Lahaye, Young Ik Eom

DuctTeip: An efficient programming model for distributed task based parallel computing

Current high-performance computer systems used for scientific computing typically combine shared memory computational nodes in a distributed memory environment. Extracting high performance from these complex systems requires tailored…

Distributed, Parallel, and Cluster Computing · Computer Science 2018-01-14 Afshin Zafari, Elisabeth Larsson, Martin Tillenius

Enhancing iteration performance on distributed task-based workflows

Task-based programming models have proven to be a robust and versatile way to approach development of applications for distributed environments. They provide natural programming patterns with high performance. However, execution on this…

Distributed, Parallel, and Cluster Computing · Computer Science 2023-08-08 Alex Barcelo, Anna Queralt, Toni Cortes

Distributed Computations with Layered Resolution

Modern computationally-heavy applications are often time-sensitive, demanding distributed strategies to accelerate them. On the other hand, distributed computing suffers from the bottleneck of slow workers in practice. Distributed coded…

Distributed, Parallel, and Cluster Computing · Computer Science 2022-08-03 Homa Esfahanizadeh, Alejandro Cohen, Muriel Médard, Shlomo Shamai

Composing Loop-carried Dependence with Other Loops

Sparse fusion is a compile-time loop transformation and runtime scheduling implemented as a domain-specific code generator. Sparse fusion generates efficient parallel code for the combination of two sparse matrix kernels where at least one…

Programming Languages · Computer Science 2021-11-25 Kazem Cheshmi, Michelle Mills Strout, Maryam Mehri Dehnavi

Design and Implementation of a Distributed Middleware for Parallel Execution of Legacy Enterprise Applications

A typical enterprise uses a local area network of computers to perform its business. During the off-working hours, the computational capacities of these networked computers are underused or unused. In order to utilize this computational…

Distributed, Parallel, and Cluster Computing · Computer Science 2009-08-21 Que Thu Dung Nguyen

Analysis of Distributed Algorithms for Big-data

Parallel and distributed processing is becoming the de facto industry standard, and a large part of the current research is targeted on how to make computing scalable and distributed, dynamically, without allocating the resources on…

Distributed, Parallel, and Cluster Computing · Computer Science 2024-04-10 Rajendra Purohit, K R Chowdhary, S D Purohit

FusionStitching: Boosting Memory Intensive Computations for Deep Learning Workloads

We show in this work that memory intensive computations can result in severe performance problems due to off-chip memory access and CPU-GPU context switch overheads in a wide range of deep learning models. For this problem, current…

Distributed, Parallel, and Cluster Computing · Computer Science 2021-12-20 Zhen Zheng, Pengzhan Zhao, Guoping Long, Feiwen Zhu, Kai Zhu, Wenyi Zhao, Lansong Diao, Jun Yang, Wei Lin

DistriFusion: Distributed Parallel Inference for High-Resolution Diffusion Models

Diffusion models have achieved great success in synthesizing high-quality images. However, generating high-resolution images with diffusion models is still challenging due to the enormous computational costs, resulting in a prohibitive…

Computer Vision and Pattern Recognition · Computer Science 2024-07-16 Muyang Li, Tianle Cai, Jiaxin Cao, Qinsheng Zhang, Han Cai, Junjie Bai, Yangqing Jia, Ming-Yu Liu, Kai Li, Song Han

Hyper: Distributed Cloud Processing for Large-Scale Deep Learning Tasks

Training and deploying deep learning models in real-world applications require processing large amounts of data. This is a challenging task when the amount of data grows to a hundred terabytes, or even, petabyte-scale. We introduce a hybrid…

Distributed, Parallel, and Cluster Computing · Computer Science 2019-10-17 Davit Buniatyan

PULSE: Accelerating Distributed Pointer-Traversals on Disaggregated Memory (Extended Version)

Caches at CPU nodes in disaggregated memory architectures amortize the high data access latency over the network. However, such caches are fundamentally unable to improve performance for workloads requiring pointer traversals across linked…

Distributed, Parallel, and Cluster Computing · Computer Science 2024-12-17 Yupeng Tang, Seung-seob Lee, Abhishek Bhattacharjee, Anurag Khandelwal

Resilient Distributed Diffusion for Multi-task Estimation

Distributed diffusion is a powerful algorithm for multi-task state estimation which enables networked agents to interact with neighbors to process input data and diffuse information across the network. Compared to a centralized approach,…

Multiagent Systems · Computer Science 2020-03-27 Jiani Li, Xenofon Koutsoukos

Resolvable Designs for Speeding up Distributed Computing

Distributed computing frameworks such as MapReduce are often used to process large computational jobs. They operate by partitioning each job into smaller tasks executed on different servers. The servers also need to exchange intermediate…

Distributed, Parallel, and Cluster Computing · Computer Science 2020-04-20 Konstantinos Konstantinidis, Aditya Ramamoorthy

The Dynamical Kernel Scheduler - Part 1

Emerging processor architectures such as GPUs and Intel MICs provide a huge performance potential for high performance computing. However, developing software using these hardware accelerators introduces additional challenges for the…

Computational Physics · Physics 2016-09-21 Andreas Adelmann, Uldis Locans, Andreas Suter

Curbing Task Interference using Representation Similarity-Guided Multi-Task Feature Sharing

Multi-task learning of dense prediction tasks, by sharing both the encoder and decoder, as opposed to sharing only the encoder, provides an attractive front to increase both accuracy and computational efficiency. When the tasks are similar,…

Computer Vision and Pattern Recognition · Computer Science 2022-08-22 Naresh Kumar Gurulingan, Elahe Arani, Bahram Zonooz

Stream Iterative Distributed Coded Computing for Learning Applications in Heterogeneous Systems

To improve the utility of learning applications and render machine learning solutions feasible for complex applications, a substantial amount of heavy computations is needed. Thus, it is essential to delegate the computations among several…

Distributed, Parallel, and Cluster Computing · Computer Science 2022-04-29 Homa Esfahanizadeh, Alejandro Cohen, Muriel Medard

FLUX: Fast Software-based Communication Overlap On GPUs Through Kernel Fusion

Large deep learning models have demonstrated strong ability to solve many tasks across a wide range of applications. Those large models typically require training and inference to be distributed. Tensor parallelism is a common technique…

Machine Learning · Computer Science 2024-10-25 Li-Wen Chang, Wenlei Bao, Qi Hou, Chengquan Jiang, Ningxin Zheng, Yinmin Zhong, Xuanrun Zhang, Zuquan Song, Chengji Yao, Ziheng Jiang, Haibin Lin, Xin Jin, Xin Liu