Related papers: Two-Chains: High Performance Framework for Functio…

200 papers

Network library APIs have historically been developed with the emphasis on data movement, placement, and communication semantics. Many communication semantics are available across a large variety of network libraries, such as send-receive,…

Distributed, Parallel, and Cluster Computing · Computer Science 2022-06-02 Luis E. Peña, Wenbin Lu, Pavel Shamis, Steve Poole

This work describes the design, implementation and performance analysis of a distributed two-tiered storage software. The first tier functions as a distributed software cache implemented using solid-state devices (NVMes) and the second tier…

Distributed, Parallel, and Cluster Computing · Computer Science 2025-03-13 Aparna Sasidharan, Xian-He, Jay Lofstead, Scott Klasky

The recent advancements in multicore machines highlight the need to simplify concurrent programming in order to leverage their computational power. One way to achieve this is by designing efficient concurrent data structures (e.g. stacks,…

Distributed, Parallel, and Cluster Computing · Computer Science 2021-03-31 Nikolaos D. Kallimanis

There has been a significant amount of work in the literature proposing semantic relaxation of concurrent data structures for improving scalability and performance. By relaxing the semantics of a data structure, a bigger design space, that…

Data Structures and Algorithms · Computer Science 2025-11-11 Adones Rukundo, Aras Atalar, Philippas Tsigas

In this paper, we present a framework for moving compute and data between processing elements in a distributed heterogeneous system. The implementation of the framework is based on the LLVM compiler toolchain combined with the UCX…

Distributed, Parallel, and Cluster Computing · Computer Science 2022-12-13 Wenbin Lu, Luis E. Peña, Pavel Shamis, Valentin Churavy, Barbara Chapman, Steve Poole

Binarized Neural Networks (BNNs) significantly reduce the computation and memory demands with binarized weights and activations compared to full-precision NNs. Executing a layer in a BNN on different devices of a heterogeneous…

Distributed, Parallel, and Cluster Computing · Computer Science 2023-01-13 Leonard David Bereholschi, Ching-Chi Lin, Mikail Yayla, Jian-Jia Chen

Efficient parallelization of Large Language Models (LLMs) with long sequences is essential but challenging due to their significant computational and memory demands, particularly stemming from communication bottlenecks in attention…

Distributed, Parallel, and Cluster Computing · Computer Science 2024-12-31 Zongwu Wang, Fangxin Liu, Mingshuai Li, Li Jiang

We present a systematic derivation of a data-parallel implementation of two-level, static and collision-free hash maps, by giving a functional formulation of the Fredman et al. construction, and then flattening it. We discuss the challenges…

Programming Languages · Computer Science 2025-08-18 William Henrich Due, Martin Elsman, Troels Henriksen

Motivated by the need for adaptive, secure and responsive scheduling in a great range of computing applications, including human-centered and time-critical applications, this paper proposes a scheduling framework that seamlessly adds…

Distributed, Parallel, and Cluster Computing · Computer Science 2020-01-14 Georgios C. Chasparis, Vladimir Janjic, Michael Rossbory

When some application scenarios need to use semantic segmentation technology, like automatic driving, the primary concern comes to real-time performance rather than extremely high segmentation accuracy. To achieve a good trade-off between…

Computer Vision and Pattern Recognition · Computer Science 2023-11-01 Liang Liao, Liang Wan, Mingsheng Liu, Shusheng Li

We propose Chunks and Tasks, a parallel programming model built on abstractions for both data and work. The application programmer specifies how data and work can be split into smaller pieces, chunks and tasks, respectively. The Chunks and…

Distributed, Parallel, and Cluster Computing · Computer Science 2014-07-29 Emanuel H. Rubensson, Elias Rudberg

In LLM serving, reusing the KV cache of prompts across requests is critical for reducing TTFT and serving costs. Cache-affinity scheduling, which co-locates requests with the same prompt prefix to maximize KV cache reuse, often conflicts…

Distributed, Parallel, and Cluster Computing · Computer Science 2026-02-09 Ying Yuan, Pengfei Zuo, Bo Wang, Zhangyu Chen, Zhipeng Tan, Zhou Yu

Binary2source function matching is a fundamental task for many security applications, including Software Component Analysis (SCA). The "1-to-1" mechanism has been applied in existing binary2source matching works, in which one binary…

Software Engineering · Computer Science 2022-10-28 Ang Jia, Ming Fan, Xi Xu, Wuxia Jin, Haijun Wang, Qiyi Tang, Sen Nie, Shi Wu, Ting Liu

Data compression is a well-studied (and well-solved) problem in the setup of long coding blocks. But important emerging applications need to compress data to memory words of small fixed widths. This new setup is the subject of this paper.…

Information Theory · Computer Science 2017-01-12 Ori Rottenstreich, Yuval Cassuto

High Performance Computing is notorious for its long and expensive software development cycle. To address this challenge, we present Bind: a "partitioned global workflow" parallel programming model for C++ applications that enables quick…

Distributed, Parallel, and Cluster Computing · Computer Science 2016-06-16 Alex Kosenkov, Matthias Troyer

Modern high performance computing (HPC) systems exhibit a rapid growth in size, both "horizontally" in the number of nodes, as well as "vertically" in the number of cores per node. As such, they offer additional levels of hardware…

Distributed, Parallel, and Cluster Computing · Computer Science 2018-11-06 Ahmed Eleliemy, Ali Mohammed, Florina M. Ciorba

High-performance clusters and datacenters pose increasingly demanding requirements on storage systems. If these systems do not operate at scale, applications are doomed to become I/O bound and waste compute cycles. To accelerate the data…

Networking and Internet Architecture · Computer Science 2022-06-22 Salvatore Di Girolamo, Daniele De Sensi, Konstantin Taranov, Milos Malesevic, Maciej Besta, Timo Schneider, Severin Kistler, Torsten Hoefler

This work elaborates on a High performance computing (HPC) architecture based on Simple Linux Utility for Resource Management (SLURM) [1] for deploying heterogeneous Large Language Models (LLMs) into a scalable inference engine. Dynamic…

Distributed, Parallel, and Cluster Computing · Computer Science 2025-08-26 Anderson de Lima Luiz, Shubham Vijay Kurlekar, Munir Georges

Graph neural networks (GNNs) have emerged as a promising solution to deal with unstructured data, outperforming traditional deep learning architectures. However, most of the current GNN models are designed to work with a single graph, which…

Machine Learning · Computer Science 2024-11-11 Victor M. Tenorio, Antonio G. Marques

As mathematical computing becomes more democratized in high-level languages, high-performance symbolic-numeric systems are necessary for domain scientists and engineers to get the best performance out of their machine without deep knowledge…

Computation and Language · Computer Science 2022-02-08 Shashi Gowda, Yingbo Ma, Alessandro Cheli, Maja Gwozdz, Viral B. Shah, Alan Edelman, Christopher Rackauckas