Related papers: A Statistical Approach to Performance Monitoring i…
In recent IoT (Internet of Things) and Web 2.0 technologies, a critical problem arises with respect to storing and processing the large amount of collected data. In this paper we develop and evaluate distributed infrastructures for storing…
In this paper, we discuss the feasibility of monitoring partially synchronous distributed systems to detect latent bugs, i.e., errors caused by concurrency and race conditions among concurrent processes. We present a monitoring framework…
For large-scale industrial processes under closed-loop control, process dynamics directly resulting from control action are typical characteristics and may show different behaviors between real faults and normal changes of operating…
Identifying root causes for unexpected or undesirable behavior in complex systems is a prevalent challenge. This issue becomes especially crucial in modern cloud applications that employ numerous microservices. Although the machine learning…
We study the problem of monitoring distributed systems where computers communicate using message passing and share an almost synchronized clock. This is a realistic scenario for networks where the speed of the monitoring is sufficiently…
Monitoring software systems at runtime is key for understanding workloads, debugging, and self-adaptation. It typically involves collecting and storing observable software data, which can be analyzed online or offline. Despite the…
High-speed research networks are built to meet the ever-increasing needs of data-intensive distributed workflows. However, data transfers in these networks often fail to attain the promised transfer rates for several reasons, including I/O…
Today's distributed systems operate in complex environments that inevitably involve faults and even adversarial behaviors. Predicting their performance under such environments directly from formal designs remains a longstanding challenge.…
In this paper, we address distributed convergence to fair allocations of CPU resources for time-sensitive applications. We propose a novel resource management framework where a centralized objective for fair allocations is decomposed into a…
We consider a network of smart sensors for an edge computing application that sample a time-varying signal and send updates to a base station for remote global monitoring. Sensors are equipped with sensing and compute, and can either send…
More and more distributed software systems are being developed and deployed today. Like other software, distributed software systems also need very strong quality assurance support. Distributed software is often very large/complex, has…
Managing the transactions in real time distributed computing system is not easy, as it has heterogeneously networked computers to solve a single problem. If a transaction runs across some different sites, it may commit at some sites and may…
Modern real-time systems require accurate characterization of task timing behavior to ensure predictable performance, particularly on complex hardware architectures. Existing methods, such as worst-case execution time analysis, often fail…
It remains a challenging problem to tightly estimate the worst case response time of an application in a distributed embedded system, especially when there are dependencies between tasks. We discovered that the state-of-the art techniques…
Consider a network in which $n$ distributed nodes are connected to a single server. Each node continuously observes a data stream consisting of one value per discrete time step. The server has to continuously monitor a given parameter…
Efficient resource allocation and scheduling algorithms are essential for various distributed applications, ranging from wireless networks and cloud computing platforms to autonomous multi-agent systems and swarm robotic networks. However,…
The log-based analysis and trouble-shooting has remained prevalent and commonly used approach for centralized and time-haring systems. However, for parallel and distributed systems where happen-before relations are not directly available…
Self-adaptive software is considered as the most advanced approach and its development attracts a lot of attention. Decentralization is an effective way to design and manage the complexity of modern self-adaptive software systems. However,…
The performance of computer networks relies on how bandwidth is shared among different flows. Fair resource allocation is a challenging problem particularly when the flows evolve over time.To address this issue, bandwidth sharing techniques…
We will present a new general framework for robust and adaptive control that allows for distributed and scalable learning and control of large systems of interconnected linear subsystems. The control method is demonstrated for a linear…