Related papers: Diagnosing Distributed Systems through Log Data An…
Distributed databases, as the core infrastructure software for internet applications, play a critical role in modern cloud services. However, existing distributed databases frequently experience system failures and performance degradation,…
Distributed locking mechanisms are fundamental to ensuring data consistency and integrity in distributed systems. This paper presents a comprehensive analysis of distributed locking algorithms, focusing on their performance characteristics…
Complex systems often exhibit unexpected faults that are difficult to handle. It is desirable for such systems to be diagnosable, i.e. for faults to be automatically detected as they occur (or shortly afterwards), enabling the system to handle the…
The evolution of distributed architectures and programming paradigms for performance-oriented program development challenges the state-of-the-art technology for performance tools. The area of high performance computing is rapidly expanding…
Process mining has gained traction over the past decade and an impressive body of research has resulted in the introduction of a variety of process mining approaches measuring process performance. Having this set of techniques available,…
Traditionally, research in Business Process Management has put a strong focus on centralized and intra-organizational processes. However, today's business processes are increasingly distributed, deviating from a centralized layout, and…
Parallel and distributed processing is becoming the de facto industry standard, and a large part of current research targets how to make computing scalable and distributed, dynamically, without allocating the resources on…
The overall performance of a distributed system is highly dependent on the communication efficiency of the system. Although network resources (links, bandwidth) are becoming increasingly available, the communication performance of data…
System performance for networks composed of interconnected subsystems can be increased if the traditionally separated subsystems are jointly optimized. Recently, parallel and distributed optimization methods have emerged as a powerful tool…
In this paper we describe our work on designing a web-based, distributed data analysis system based on the popular MapReduce framework, deployed on a small cloud and developed specifically for analyzing web server logs. The log analysis system…
Various performance characteristics of distributed file systems have been well studied. However, the performance efficiency of distributed file systems on small-file problems in complex machine learning scenarios is not well…
More and more distributed software systems are being developed and deployed today. Like other software, distributed software systems also need very strong quality assurance support. Distributed software is often very large/complex, has…
This work considers the problem of finding analytical expressions for the expected values of distributed computing performance metrics when the underlying communication network has a complex structure. Through active probing tests a real…
The DGCC protocol has been shown to achieve good performance on multi-core in-memory systems. However, distributed transactions complicate the dependency resolution, and therefore, an effective transaction partitioning strategy is essential to…
Execution logs are a crucial medium as they record runtime information of software systems. Although extensive logs provide valuable details for identifying the root cause in postmortem analysis after a failure, this may…
In this paper, we address the problem of manual debugging, which nowadays remains resource-intensive and in some parts archaic. This problem is especially evident in increasingly complex and distributed software systems. Therefore, our…
To assist IT service developers and operators in managing their increasingly complex service landscapes, there is a growing effort to leverage artificial intelligence in operations. To speed up troubleshooting, log anomaly detection has…
Parallel application I/O performance often does not meet user expectations. Additionally, slight access pattern modifications may lead to significant changes in performance due to complex interactions between hardware and software. These…
Serverless applications can be particularly difficult to troubleshoot, as these applications are often composed of various managed and partly managed services. Faults are often unpredictable and can occur at multiple points, even in simple…
Many organizations routinely analyze large datasets using systems for distributed data-parallel processing and clusters of commodity resources. Yet, users need to configure adequate resources for their data processing jobs. This requires…