Related papers: Distributed Execution Indexing
Performance analysis of microservices can be a challenging task, as a typical request to these systems involves multiple Remote Procedure Calls (RPC) spanning across independent services and machines. Practitioners primarily rely on…
Performance in heterogeneous service-based systems shows non-determistic trends. Even for the same request type, latency may vary from one request to another. These variations can occur due to several reasons on different levels of the…
Symbolic execution is a classic technique for systematic bug finding, which has seen many applications in recent years but remains hard to scale. Recent work introduced ranged symbolic execution to distribute the symbolic execution task…
One of the factors that limits the scale, performance, and sophistication of distributed applications is the difficulty of concurrently executing them on multiple distributed computing resources. In part, this is due to a poor understanding…
Distributed data structures are key to implementing scalable applications for scientific simulations and data analysis. In this paper we look at two implementation styles for distributed data structures: remote direct memory access (RDMA)…
In this paper, we present the design of a scalable, distributed stream processing system for RFID tracking and monitoring. Since RFID data lacks containment and location information that is key to query processing, we propose to combine…
Existing techniques used for intrusion detection do not fully utilize the intrinsic properties of embedded systems. In this paper, we propose a lightweight method for detecting anomalous executions using a distribution of system call…
Distributed tracing has become an essential technique for debugging and troubleshooting modern microservice-based applications, enabling software engineers to detect performance bottlenecks, identify failures, and gain insights into system…
This paper describes a new benchmark tool, Spatter, for assessing memory system architectures in the context of a specific category of indexed accesses known as gather and scatter. These types of operations are increasingly used to express…
Fault-tolerance is critically important in highly-distributed modern cloud applications. Solutions such as Temporal, Azure Durable Functions, and Beldi hide fault-tolerance complexity from developers by persisting execution state and…
Remote Procedure Call (RPC) is a widely used abstraction for cloud computing. The programmer specifies type information for each remote procedure, and a compiler generates stub code linked into each application to marshal and unmarshal…
People have shown that in-network computation (INC) significantly boosts performance in many application scenarios include distributed training, MapReduce, agreement, and network monitoring. However, existing INC programming is unfriendly…
Microservice is a way of splitting the logic of an application into small blocks that can be run on different computing units and used by other applications. It has been successful for cloud applications and is now increasingly used for…
Microservice resilience, the ability of microservices to recover from failures and continue providing reliable and responsive services, is crucial for cloud vendors. However, the current practice relies on manually configured rules specific…
Distributed programs are hard to get right because they are required to be open, scalable, long-running, and tolerant to faults. In particular, the recent approaches to distributed software based on (micro-)services where different services…
Soft error of exascale application is a challenge problem in modern HPC. In order to quantify an application's resilience and vulnerability, the application-level fault injection method is widely adopted by HPC users. However, it is not…
When working at exascale, the various constraints imposed by the extreme scale of the system bring new challenges for application users and software/middleware developers. In that context, and to provide best performance, resiliency and…
Many tools and libraries are readily available to build and operate distributed Web applications. While the setup of operational environments is comparatively easy, practice shows that their continuous secure operation is more difficult to…
Coordination services and protocols are critical components of distributed systems and are essential for providing consistency, fault tolerance, and scalability. However, due to the lack of standard benchmarking and evaluation tools for…
We present HiCR, a model to represent the semantics of distributed heterogeneous applications and runtime systems. The model describes a minimal set of abstract operations to enable hardware topology discovery, kernel execution, memory…