Related papers: Sequential File Programming Patterns and Performan…
Performance evaluation of the routing node in terms of latency is the characteristics of an efficient design of Buffer in input module. It is intended to study and quantify the behavior of the single packet array design in relation to the…
The performance of Deep-Learning (DL) computing frameworks rely on the performance of data ingestion and checkpointing. In fact, during the training, a considerable high number of relatively small files are first loaded and pre-processed on…
High-speed research networks are built to meet the ever-increasing needs of data-intensive distributed workflows. However, data transfers in these networks often fail to attain the promised transfer rates for several reasons, including I/O…
We exhibit assertion-preserving (reachability preserving) transformations from parameterized concurrent shared-memory programs, under a k-round scheduling of processes, to sequential programs. The salient feature of the sequential program…
Stream processing is a compute paradigm that promises safe and efficient parallelism. Modern big-data problems are often well suited for stream processing's throughput-oriented nature. Realization of efficient stream processing requires…
Modern machine learning training is increasingly bottlenecked by data I/O rather than compute. GPUs often sit idle at below 50% utilization waiting for data. This paper presents a machine learning approach to predict I/O performance and…
High level programming languages and GPU accelerators are powerful enablers for a wide range of applications. Achieving scalable vertical (within a compute node), horizontal (across compute nodes), and temporal (over different generations…
Moving data from CERN to Pasadena at a gigabyte per second using the next generation Internet requires good networking and good disk IO. Ten Gbps Ethernet and OC192 links are in place, so now it is simply a matter of programming. This…
State-of-the-art machine learning frameworks support a wide variety of design features to enable a flexible machine learning programming interface and to ease the programmability burden on machine learning developers. Identifying and using…
We propose an approach to data memory prefetching which augments the standard prefetch buffer with selection criteria based on performance and usage pattern of a given instruction. This approach is built on top of a pattern matching based…
This paper presents the architecture and characteristics of a memory database intended to be used as a cache engine for web applications. Primary goals of this database are speed and efficiency while running on SMP systems with several CPU…
Most developers use default properties of ASP.NET server controls when developing web applications. ASP.NET web applications typically employ server controls to provide dynamic web pages, and data-bound server controls to display and…
Performance in web applications is a key aspect of user experience and system scalability. Among the different techniques used to improve web application performance, caching has been widely used. While caching has been widely explored in…
Foundational software libraries such as ROOT are under intense pressure to avoid software regression, including performance regressions. Continuous performance benchmarking, as a part of continuous integration and other code quality…
This article presents an automatic approach to quickly derive a good solution for hardware resource partition and task granularity for task-based parallel applications on heterogeneous many-core architectures. Our approach employs a…
We consider a discrete-time system comprising a first-come-first-served queue, a non-preemptive server, and a stationary non-work-conserving scheduler. New tasks enter the queue according to a Bernoulli process with a pre-specified arrival…
Private information retrieval protocols guarantee that a user can privately and losslessly retrieve a single file from a database stored across multiple servers. In this work, we propose to simultaneously relax the conditions of perfect…
Distributed algorithms that operate in the fail-recovery model rely on the state stored in stable memory to guarantee the irreversibility of operations even in the presence of failures. The performance of these algorithms lean heavily on…
Data centers have become center of big data processing. Most programs running in a data center processes big data. The storage requirements of such programs cannot be fulfilled by a single node in the data center, and hence a distributed…
Network performance problems are notoriously difficult to diagnose. Prior profiling systems collect performance statistics by keeping information about each network flow, but maintaining per-flow state is not scalable on…