Related papers: Mining Java Memory Errors using Subjective Interes…

Subjectively Interesting Subgroup Discovery on Real-valued Targets

Deriving insights from high-dimensional data is one of the core problems in data mining. The difficulty mainly stems from the fact that there are exponentially many variable combinations to potentially consider, and there are infinitely…

Machine Learning · Statistics · 2021-11-08 · Jefrey Lijffijt, Bo Kang, Wouter Duivesteijn, Kai Puolamäki, Emilia Oikarinen, Tijl De Bie

Discovering outstanding subgroup lists for numeric targets using MDL

The task of subgroup discovery (SD) is to find interpretable descriptions of subsets of a dataset that stand out with respect to a target attribute. To address the problem of mining large numbers of redundant subgroups, subgroup set…

Machine Learning · Computer Science · 2021-03-16 · Hugo M. Proença, Peter Grünwald, Thomas Bäck, Matthijs van Leeuwen

Active Data Discovery: Mining Unknown Data using Submodular Information Measures

Active Learning is a very common yet powerful framework for iteratively and adaptively sampling subsets of the unlabeled sets with a human in the loop with the goal of achieving labeling efficiency. Most real-world datasets have imbalance…

Computer Vision and Pattern Recognition · Computer Science · 2022-06-20 · Suraj Kothawade, Shivang Chopra, Saikat Ghosh, Rishabh Iyer

JFinder: A Novel Architecture for Java Vulnerability Identification Based Quad Self-Attention and Pre-training Mechanism

Software vulnerabilities pose significant risks to computer systems, impacting our daily lives, productivity, and even our health. Identifying and addressing security vulnerabilities in a timely manner is crucial to prevent hacking and data…

Cryptography and Security · Computer Science · 2023-08-01 · Jin Wang, Zishan Huang, Hui Xiao, Yinhao Xiao

DJXPerf: Identifying Memory Inefficiencies via Object-centric Profiling for Java

Java is the "go-to" programming language choice for developing scalable enterprise cloud applications. In such systems, even a few percent CPU time savings can offer a significant competitive advantage and cost saving. Although performance…

Performance · Computer Science · 2021-04-09 · Bolun Li, Pengfei Su, Milind Chabbi, Shuyin Jiao, Xu Liu

Finding Sequential Patterns from Large Sequence Data

Data mining is the task of discovering interesting patterns from large amounts of data. There are many data mining tasks, such as classification, clustering, association rule mining, and sequential pattern mining. Sequential pattern mining…

Databases · Computer Science · 2010-02-08 · Mahdi Esmaeili, Fazekas Gabor

Intention-Oriented Process Model Discovery from Incident Management Event Logs

Intention-oriented process mining is based on the belief that the fundamental nature of processes is mostly intentional (unlike activity-oriented process) and aims at discovering strategy and intentional process models from event-logs…

Software Engineering · Computer Science · 2015-07-07 · Ashish Sureka

A new algorithm for Subgroup Set Discovery based on Information Gain

Pattern discovery is a machine learning technique that aims to find sets of items, subsequences, or substructures that are present in a dataset with a higher frequency value than a manually set threshold. This process helps to identify…

Machine Learning · Computer Science · 2023-08-01 · Daniel Gómez-Bravo, Aaron García, Guillermo Vigueras, Belén Ríos, Alejandro Rodríguez-González

Mining Insights from Weakly-Structured Event Data

This thesis focuses on process mining on event data where such a normative specification is absent and, as a result, the event data is less structured. The thesis puts special emphasis on one application domain that fits this description:…

Artificial Intelligence · Computer Science · 2019-09-05 · Niek Tax

Selective Inference Approach for Statistically Sound Predictive Pattern Mining

Discovering statistically significant patterns from databases is an important and challenging problem. The main obstacle of this problem is in the difficulty of taking into account the selection bias, i.e., the bias arising from the fact that…

Machine Learning · Statistics · 2016-03-10 · Shinya Suzumura, Kazuya Nakagawa, Mahito Sugiyama, Koji Tsuda, Ichiro Takeuchi

Pinpointing Performance Inefficiencies in Java

Many performance inefficiencies such as inappropriate choice of algorithms or data structures, developers' inattention to performance, and missed compiler optimizations show up as wasteful memory operations. Wasteful memory operations are…

Performance · Computer Science · 2019-07-01 · Pengfei Su, Qingsen Wang, Milind Chabbi, Xu Liu

Discovering High-Quality Process Models Despite Data Scarcity

Process discovery algorithms learn process models from executed activity sequences, describing concurrency, causality, and conflict. Concurrent activities require observing multiple permutations, increasing data requirements, especially for…

Databases · Computer Science · 2023-10-18 · Jan Niklas Adams, Jari Peeperkorn, Tobias Brockhoff, Isabelle Terrier, Heiko Göhner, Merih Seran Uysal, Seppe vanden Broucke, Jochen De Weerdt, Wil M. P. van der Aalst

Memory-Efficient Sequential Pattern Mining with Hybrid Tries

This paper develops a memory-efficient approach for Sequential Pattern Mining (SPM), a fundamental topic in knowledge discovery that faces a well-known memory bottleneck for large data sets. Our methodology involves a novel hybrid trie data…

Databases · Computer Science · 2024-07-30 · Amin Hosseininasab, Willem-Jan van Hoeve, Andre A. Cire

Discovering Hierarchical Process Models: an Approach Based on Events Clustering

Process mining is a field of computer science that deals with discovery and analysis of process models based on automatically generated event logs. Currently, many companies use this technology for optimization and improving their…

Artificial Intelligence · Computer Science · 2023-03-27 · Antonina K. Begicheva, Irina A. Lomazova, Roman A. Nesterov

Differential Performance Debugging with Discriminant Regression Trees

Differential performance debugging is a technique to find performance problems. It applies in situations where the performance of a program is (unexpectedly) different for different classes of inputs. The task is to explain the differences…

Artificial Intelligence · Computer Science · 2017-11-29 · Saeid Tizpaz-Niari, Pavol Cerny, Bor-Yuh Evan Chang, Ashutosh Trivedi

How Can Subgroup Discovery Help AIOps?

The genuine supervision of modern IT systems brings new challenges as it requires higher standards of scalability, reliability and efficiency when analysing and monitoring big data streams. Rule-based inference engines are a key component…

Software Engineering · Computer Science · 2021-09-13 · Youcef Remil

A Rough Sets Partitioning Model for Mining Sequential Patterns with Time Constraint

Nowadays, data mining and knowledge discovery methods are applied to a variety of enterprise and engineering disciplines to uncover interesting patterns from databases. The study of Sequential patterns is an important data mining problem…

Databases · Computer Science · 2009-06-24 · Jigyasa Bisaria, Namita Shrivastava, K. R. Pardasani

Data Origin Inference in Machine Learning

It is a growing direction to utilize unintended memorization in ML models to benefit real-world applications, with recent efforts like user auditing, dataset ownership inference and forgotten data measurement. Standing on the point of ML…

Machine Learning · Computer Science · 2023-01-31 · Mingxue Xu, Xiang-Yang Li

Interactive Semi-automated Specification Mining for Debugging: An Experience Report

Context: Specification mining techniques are typically used to extract the specification of software in the absence of (up-to-date) specification documents. This is useful for program comprehension, testing, and anomaly detection.…

Software Engineering · Computer Science · 2019-05-09 · Mohammad Jafar Mashhadi, Taha R. Siddiqui, Hadi Hemmati, Howard Loewen

Event Detection in Time Series: Universal Deep Learning Approach

Event detection in time series is a challenging task due to the prevalence of imbalanced datasets, rare events, and time interval-defined events. Traditional supervised deep learning methods primarily employ binary classification, where…

Machine Learning · Statistics · 2024-09-16 · Menouar Azib, Benjamin Renard, Philippe Garnier, Vincent Génot, Nicolas André