Related papers: Tracking System Behaviour from Resource Usage Data

200 papers

High performance computing (HPC) facilities consist of a large number of interconnected computing units (or nodes) that execute highly complex scientific simulations to support scientific research. Monitoring such facilities, in real-time,…

Distributed, Parallel, and Cluster Computing · Computer Science 2018-09-28 Niyazi Sorkunlu, Duc Thanh Anh Luong, Varun Chandola

High-performance computing (HPC) systems are a complex combination of software, processors, memory, networks, and storage systems characterized by frequent disruptive technological advances. Anomalous behavior has to be manually diagnosed…

Distributed, Parallel, and Cluster Computing · Computer Science 2013-02-19 Charng-Da Lu

Detecting and resolving performance anomalies in Cloud services is crucial for maintaining desired performance objectives. Scaling actions triggered by an anomaly detector help achieve target latency at the cost of extra resource…

Distributed, Parallel, and Cluster Computing · Computer Science 2025-11-24 Gabriel Job Antunes Grabher, Fumio Machida, Thomas Ropars

This paper provides an overview of three notable approaches for detecting anomalies in spatio-temporal data. The three review methods are selected from the framework of multivariate statistical process control (SPC), scan statistics, and…

Methodology · Statistics 2023-09-19 Ji Chen

Software performance modeling plays a crucial role in developing and maintaining software systems. A performance model analytically describes the relationship between the performance of a system and its runtime activities. This process…

Software Engineering · Computer Science 2024-11-27 Kaveh Shahedi, Heng Li, Maxime Lamothe, Foutse Khomh

As contemporary software-intensive systems reach increasingly large scale, it is imperative that failure detection schemes be developed to help prevent costly system downtimes. A promising direction towards the construction of such schemes…

Applications · Statistics 2016-09-27 Alexey Artemov, Evgeny Burnaev

The ability to understand how a scientific application is executed on a large HPC system is of great importance in allocating resources within the HPC data center. In this paper, we describe how we used system performance data to identify:…

Distributed, Parallel, and Cluster Computing · Computer Science 2020-09-15 David Brayford, Christoph Bernau, Wolfram Hesse, Carla Guillen

Spatiotemporal traffic time series, such as traffic speed data, collected from sensing systems are often incomplete, with considerable corruption and large amounts of missing values. A vast amount of data conceals implicit data structures,…

Optimization and Control · Mathematics 2025-04-04 Junxi Man, Yumin Lin, Xiaoyu Li

Failure detection in telecommunication networks is a vital task. So far, several supervised and unsupervised solutions have been provided for discovering failures in such networks. Among them, unsupervised approaches have attracted more…

Artificial Intelligence · Computer Science 2014-06-13 Hadi Fanaee-T, Márcia D. B. Oliveira, João Gama, Simon Malinowski, Ricardo Morla

Anomaly detection in spatiotemporal data is a challenging problem encountered in a variety of applications, including video surveillance, medical imaging data, and urban traffic monitoring. Existing anomaly detection methods focus mainly on…

Machine Learning · Computer Science 2025-10-02 Rachita Mondal, Mert Indibi, Tapabrata Maiti, Selin Aviyente

Reliability is a cumbersome problem in High Performance Computing Systems and Data Centers evolution. During operation, several types of fault conditions or anomalies can arise, ranging from malfunctioning hardware to improper…

Distributed, Parallel, and Cluster Computing · Computer Science 2020-07-30 Andrea Borghesi, Antonio Libri, Luca Benini, Andrea Bartolini

Spatiotemporal traffic data (e.g., link speed/flow) collected from sensor networks can be organized as multivariate time series with additional spatial attributes. A crucial task in analyzing such data is to identify and detect anomalous…

Machine Learning · Computer Science 2021-10-12 Xudong Wang, Luis Miranda-Moreno, Lijun Sun

While detailed resource usage monitoring is possible on the low-level using proper tools, associating such usage with higher-level abstractions in the application layer that actually cause the resource usage in the first place presents a…

Distributed, Parallel, and Cluster Computing · Computer Science 2024-08-02 Joel Witzke, Ansgar Lößer, Vasilis Bountris, Florian Schintke, Björn Scheuermann

Event detection is gaining increasing attention in smart cities research. Large-scale mobility data serves as an important tool to uncover the dynamics of urban transportation systems, and more often than not the dataset is incomplete. In…

Signal Processing · Electrical Eng. & Systems 2019-08-28 Yue Hu, Dan Work

Energy efficiency is one of the major concerns in designing advanced computing infrastructures. From single nodes to large-scale systems (data centers), monitoring the energy consumption of the computing system when applications run is a…

Understanding the behavior of software in execution is a key step in identifying and fixing performance issues. This is especially important in high performance computing contexts where even minor performance tweaks can translate into large…

Most enterprise applications use logging as a mechanism to diagnose anomalies, which could help with reducing system downtime. Anomaly detection using software execution logs has been explored in several prior studies, using both classical…

Machine Learning · Computer Science 2023-11-01 Nadun Wijesinghe, Hadi Hemmati

Tensor completion is an extension of matrix completion aimed at recovering a multiway data tensor by leveraging a given subset of its entries (observations) and the pattern of observation. The low-rank assumption is key in establishing a…

Numerical Analysis · Mathematics 2026-03-12 Shakir Showkat Sofi, Lieven De Lathauwer

Complex networks have now become integral parts of modern information infrastructures. This paper proposes a user-centric method for detecting anomalies in heterogeneous information networks, in which nodes and/or edges might be from…

Social and Information Networks · Computer Science 2018-10-22 Vahid Ranjbar, Mostafa Salehi, Pegah Jandaghi, Mahdi Jalili

The complexity and ubiquity of modern computing systems is a fertile ground for anomalies, including security and privacy breaches. In this paper, we propose a new methodology that addresses the practical challenges to implement anomaly…

Cryptography and Security · Computer Science 2020-06-17 Charles F. Gonçalves, Daniel S. Menasché, Alberto Avritzer, Nuno Antunes, Marco Vieira