Related papers: Sequential File Programming Patterns and Performan…

Reduction in Packet Delay Through the use of Common Buffer over Distributed Buffer in the Routing Node of NOC Architecture

The latency of the routing node is a key indicator of how efficiently the buffer in the input module is designed. This work studies and quantifies the behavior of the single packet array design in relation to the…

Hardware Architecture · Computer Science 2013-02-19 Nilesh A. Mohota, Sanjay L. Badjate

Characterizing Deep-Learning I/O Workloads in TensorFlow

The performance of Deep-Learning (DL) computing frameworks relies on the performance of data ingestion and checkpointing. In fact, during training, a considerably high number of relatively small files are first loaded and pre-processed on…

Distributed, Parallel, and Cluster Computing · Computer Science 2019-04-10 Steven W. D. Chien, Stefano Markidis, Chaitanya Prasad Sishtla, Luis Santos, Pawel Herman, Sai Narasimhamurthy, Erwin Laure

Demystifying the Performance of Data Transfers in High-Performance Research Networks

High-speed research networks are built to meet the ever-increasing needs of data-intensive distributed workflows. However, data transfers in these networks often fail to attain the promised transfer rates for several reasons, including I/O…

Systems and Control · Electrical Eng. & Systems 2023-08-22 Ehsan Saeedizade, Bing Zhang, Engin Arslan

Sequentializing Parameterized Programs

We exhibit assertion-preserving (reachability preserving) transformations from parameterized concurrent shared-memory programs, under a k-round scheduling of processes, to sequential programs. The salient feature of the sequential program…

Logic in Computer Science · Computer Science 2012-07-19 Salvatore La Torre, P. Madhusudan, Gennaro Parlato

Run Time Approximation of Non-blocking Service Rates for Streaming Systems

Stream processing is a compute paradigm that promises safe and efficient parallelism. Modern big-data problems are often well suited for stream processing's throughput-oriented nature. Realization of efficient stream processing requires…

Performance · Computer Science 2015-04-14 Jonathan C. Beard, Roger D. Chamberlain

Predictive Modeling of I/O Performance for Machine Learning Training Pipelines: A Data-Driven Approach to Storage Optimization

Modern machine learning training is increasingly bottlenecked by data I/O rather than compute. GPUs often sit idle at below 50% utilization waiting for data. This paper presents a machine learning approach to predict I/O performance and…

Performance · Computer Science 2025-12-22 Karthik Prabhakar, Durgamadhab Mishra

Easy Acceleration with Distributed Arrays

High level programming languages and GPU accelerators are powerful enablers for a wide range of applications. Achieving scalable vertical (within a compute node), horizontal (across compute nodes), and temporal (over different generations…

Distributed, Parallel, and Cluster Computing · Computer Science 2025-10-21 Jeremy Kepner, Chansup Byun, LaToya Anderson, William Arcand, David Bestor, William Bergeron, Alex Bonn, Daniel Burrill, Vijay Gadepally, Ryan Haney, Michael Houle, Matthew Hubbell, Hayden Jananthan, Michael Jones, Piotr Luszczek, Lauren Milechin, Guillermo Morales, Julie Mullen, Andrew Prout, Albert Reuther, Antonio Rosa, Charles Yee, Peter Michaleas

Performance Considerations for Gigabyte per Second Transcontinental Disk-to-Disk File Transfers

Moving data from CERN to Pasadena at a gigabyte per second using the next generation Internet requires good networking and good disk IO. Ten Gbps Ethernet and OC192 links are in place, so now it is simply a matter of programming. This…

Databases · Computer Science 2007-05-23 Peter Kukol, Jim Gray

Exploiting Parallelism Opportunities with Deep Learning Frameworks

State-of-the-art machine learning frameworks support a wide variety of design features to enable a flexible machine learning programming interface and to ease the programmability burden on machine learning developers. Identifying and using…

Machine Learning · Computer Science 2020-07-01 Yu Emma Wang, Carole-Jean Wu, Xiaodong Wang, Kim Hazelwood, David Brooks

An Approach to Data Prefetching Using 2-Dimensional Selection Criteria

We propose an approach to data memory prefetching which augments the standard prefetch buffer with selection criteria based on performance and usage pattern of a given instruction. This approach is built on top of a pattern matching based…

Hardware Architecture · Computer Science 2015-05-18 Jean Sung, Sebastian Krupa, Andrew Fishberg, Josef Spjut

A High Performance Memory Database for Web Application Caches

This paper presents the architecture and characteristics of a memory database intended to be used as a cache engine for web applications. Primary goals of this database are speed and efficiency while running on SMP systems with several CPU…

Networking and Internet Architecture · Computer Science 2008-09-23 Ivan Voras, Danko Basch, Mario Zagar

Empirical study of performance of data binding in ASP.NET web applications

Most developers use default properties of ASP.NET server controls when developing web applications. ASP.NET web applications typically employ server controls to provide dynamic web pages, and data-bound server controls to display and…

Software Engineering · Computer Science 2012-01-04 Toni Stojanovski, Marko Vučković, Ivan Velinov

Experimental Analysis of Server-Side Caching for Web Performance

Performance in web applications is a key aspect of user experience and system scalability. Among the different techniques used to improve web application performance, caching has been widely used. While caching has been widely explored in…

Distributed, Parallel, and Cluster Computing · Computer Science 2026-02-09 Mohammad Umar, Bharat Tripathi

Continuous Performance Benchmarking Framework for ROOT

Foundational software libraries such as ROOT are under intense pressure to avoid software regression, including performance regressions. Continuous performance benchmarking, as a part of continuous integration and other code quality…

Software Engineering · Computer Science 2019-10-02 Oksana Shadura, Vassil Vassilev, Brian Paul Bockelman

Optimizing Streaming Parallelism on Heterogeneous Many-Core Architectures: A Machine Learning Based Approach

This article presents an automatic approach to quickly derive a good solution for hardware resource partition and task granularity for task-based parallel applications on heterogeneous many-core architectures. Our approach employs a…

Distributed, Parallel, and Cluster Computing · Computer Science 2020-03-10 Peng Zhang, Jianbin Fang, Canqun Yang, Chun Huang, Tao Tang, Zheng Wang

Queueing Subject To Action-Dependent Server Performance: Utilization Rate Reduction

We consider a discrete-time system comprising a first-come-first-served queue, a non-preemptive server, and a stationary non-work-conserving scheduler. New tasks enter the queue according to a Bernoulli process with a pre-specified arrival…

Applications · Statistics 2020-08-05 Michael Lin, Nuno C. Martins, Richard J. La

Optimal Rate-Distortion-Leakage Tradeoff for Single-Server Information Retrieval

Private information retrieval protocols guarantee that a user can privately and losslessly retrieve a single file from a database stored across multiple servers. In this work, we propose to simultaneously relax the conditions of perfect…

Information Theory · Computer Science 2022-01-07 Yauhen Yakimenka, Hsuan-Yin Lin, Eirik Rosnes, Jörg Kliewer

Characterizing Synchronous Writes in Stable Memory Devices

Distributed algorithms that operate in the fail-recovery model rely on the state stored in stable memory to guarantee the irreversibility of operations even in the presence of failures. The performance of these algorithms leans heavily on…

Operating Systems · Computer Science 2020-02-19 William B. Mingardi, Gustavo M. D. Vieira

Performance modeling of a distributed file-system

Data centers have become the center of big data processing. Most programs running in a data center process big data. The storage requirements of such programs cannot be fulfilled by a single node in the data center, and hence a distributed…

Distributed, Parallel, and Cluster Computing · Computer Science 2019-08-28 Sandeep Kumar

Memory-Efficient Performance Monitoring on Programmable Switches with Lean Algorithms

Network performance problems are notoriously difficult to diagnose. Prior profiling systems collect performance statistics by keeping information about each network flow, but maintaining per-flow state is not scalable on…

Data Structures and Algorithms · Computer Science 2019-11-19 Zaoxing Liu, Samson Zhou, Ori Rottenstreich, Vladimir Braverman, Jennifer Rexford