Related papers: Mining Java Memory Errors using Subjective Interes…
Deriving insights from high-dimensional data is one of the core problems in data mining. The difficulty mainly stems from the fact that there are exponentially many variable combinations to potentially consider, and there are infinitely…
The task of subgroup discovery (SD) is to find interpretable descriptions of subsets of a dataset that stand out with respect to a target attribute. To address the problem of mining large numbers of redundant subgroups, subgroup set…
Active Learning is a very common yet powerful framework for iteratively and adaptively sampling subsets of the unlabeled sets with a human in the loop with the goal of achieving labeling efficiency. Most real world datasets have imbalance…
Software vulnerabilities pose significant risks to computer systems, impacting our daily lives, productivity, and even our health. Identifying and addressing security vulnerabilities in a timely manner is crucial to prevent hacking and data…
Java is the "go-to" programming language choice for developing scalable enterprise cloud applications. In such systems, even a few percent CPU time savings can offer a significant competitive advantage and cost saving. Although performance…
Data mining is the task of discovering interesting patterns from large amounts of data. There are many data mining tasks, such as classification, clustering, association rule mining, and sequential pattern mining. Sequential pattern mining…
Intention-oriented process mining is based on the belief that the fundamental nature of processes is mostly intentional (unlike activity-oriented process) and aims at discovering strategy and intentional process models from event-logs…
Pattern discovery is a machine learning technique that aims to find sets of items, subsequences, or substructures that are present in a dataset with a higher frequency value than a manually set threshold. This process helps to identify…
This thesis focuses on process mining on event data where such a normative specification is absent and, as a result, the event data is less structured. The thesis puts special emphasis on one application domain that fits this description:…
Discovering statistically significant patterns from databases is an important challenging problem. The main obstacle of this problem is in the difficulty of taking into account the selection bias, i.e., the bias arising from the fact that…
Many performance inefficiencies such as inappropriate choice of algorithms or data structures, developers' inattention to performance, and missed compiler optimizations show up as wasteful memory operations. Wasteful memory operations are…
Process discovery algorithms learn process models from executed activity sequences, describing concurrency, causality, and conflict. Concurrent activities require observing multiple permutations, increasing data requirements, especially for…
This paper develops a memory-efficient approach for Sequential Pattern Mining (SPM), a fundamental topic in knowledge discovery that faces a well-known memory bottleneck for large data sets. Our methodology involves a novel hybrid trie data…
Process mining is a field of computer science that deals with discovery and analysis of process models based on automatically generated event logs. Currently, many companies use this technology for optimization and improving their…
Differential performance debugging is a technique to find performance problems. It applies in situations where the performance of a program is (unexpectedly) different for different classes of inputs. The task is to explain the differences…
The genuine supervision of modern IT systems brings new challenges as it requires higher standards of scalability, reliability and efficiency when analysing and monitoring big data streams. Rule-based inference engines are a key component…
Now a days, data mining and knowledge discovery methods are applied to a variety of enterprise and engineering disciplines to uncover interesting patterns from databases. The study of Sequential patterns is an important data mining problem…
It is a growing direction to utilize unintended memorization in ML models to benefit real-world applications, with recent efforts like user auditing, dataset ownership inference and forgotten data measurement. Standing on the point of ML…
Context: Specification mining techniques are typically used to extract the specification of a software in the absence of (up-to-date) specification documents. This is useful for program comprehension, testing, and anomaly detection.…
Event detection in time series is a challenging task due to the prevalence of imbalanced datasets, rare events, and time interval-defined events. Traditional supervised deep learning methods primarily employ binary classification, where…