Related papers: Significant Subgraph Mining with Multiple Testing …

Searching for significant patterns in stratified data

Significant pattern mining, the problem of finding itemsets that are significantly enriched in one class of objects, is statistically challenging, as the large space of candidate patterns leads to an enormous multiple testing problem.…

Machine Learning · Statistics 2015-08-25 Felipe Llinares-Lopez, Laetitia Papaxanthos, Dean Bodenham, Karsten Borgwardt

An Efficient Rigorous Approach for Identifying Statistically Significant Frequent Itemsets

As advances in technology allow for the collection, storage, and analysis of vast amounts of data, the task of screening and assessing the significance of discovered patterns is becoming a major challenge in data mining applications. In…

Databases · Computer Science 2010-02-08 Adam Kirsch, Michael Mitzenmacher, Andrea Pietracaprina, Geppino Pucci, Eli Upfal, Fabio Vandin

Multi-Objective Statistical Model Checking using Lightweight Strategy Sampling (extended version)

Statistical model checking delivers quantitative verification results with statistical guarantees by applying Monte Carlo simulation to formal models. It scales to model sizes and model types that are out of reach for exhaustive, analytical…

Logic in Computer Science · Computer Science 2025-11-18 Pedro R. D'Argenio, Arnd Hartmanns, Patrick Wienhöft, Mark van Wijk

Statistically Significant Pattern Mining with Ordinal Utility

Statistically significant patterns mining (SSPM) is an essential and challenging data mining task in the field of knowledge discovery in databases (KDD), in which each pattern is evaluated via a hypothesis test. Our study aims to introduce…

Methodology · Statistics 2020-08-26 Thien Q. Tran, Kazuto Fukuchi, Youhei Akimoto, Jun Sakuma

Graph Pruning for Enumeration of Minimal Unsatisfiable Subsets

Finding Minimal Unsatisfiable Subsets (MUSes) of binary constraints is a common problem in infeasibility analysis of over-constrained systems. However, because of the exponential search space of the problem, enumerating MUSes is extremely…

Artificial Intelligence · Computer Science 2024-02-27 Panagiotis Lymperopoulos, Liping Liu

Contrast Subgraph Mining from Coherent Cores

Graph pattern mining methods can extract informative and useful patterns from large-scale graphs and capture underlying principles through the overwhelmed information. Contrast analysis serves as a keystone in various fields and has…

Social and Information Networks · Computer Science 2018-02-20 Jingbo Shang, Xiyao Shi, Meng Jiang, Liyuan Liu, Timothy Hanratty, Jiawei Han

Itemsets for Real-valued Datasets

Pattern mining is one of the most well-studied subfields in exploratory data analysis. While there is a significant amount of literature on how to discover and rank itemsets efficiently from binary data, there is surprisingly little…

Data Structures and Algorithms · Computer Science 2019-02-05 Nikolaj Tatti

Efficient Multiple Testing Adjustment for Hierarchical Inference

Hierarchical inference in (generalized) regression problems is powerful for finding significant groups or even single covariates, especially in high-dimensional settings where identifiability of the entire regression parameter vector may be…

Methodology · Statistics 2021-10-22 Claude Renaux, Peter Bühlmann

Bagging multiple comparisons from microarray data

The problem of large-scale simultaneous hypothesis testing is re-visited. Bagging and subagging procedures are put forth with the purpose of improving the discovery power of the tests. The procedures are implemented in both simulated and…

Methodology · Statistics 2007-05-23 Dimitris N. Politis

Incremental Mining of Frequent Serial Episodes Considering Multiple Occurrences

The need to analyze information from streams arises in a variety of applications. One of its fundamental research directions is to mine sequential patterns over data streams. Current studies mine series of items based on the presence of the…

Databases · Computer Science 2022-04-12 Thomas Guyet, Wenbin Zhang, Albert Bifet

More Powerful Selective Kernel Tests for Feature Selection

Refining one's hypotheses in the light of data is a common scientific practice; however, the dependency on the data introduces selection bias and can lead to specious statistical analysis. An approach for addressing this is via conditioning…

Machine Learning · Computer Science 2020-03-03 Jen Ning Lim, Makoto Yamada, Wittawat Jitkrittum, Yoshikazu Terada, Shigeyuki Matsui, Hidetoshi Shimodaira

Predictive Subsampling for Scalable Inference in Networks

Network datasets appear across a wide range of scientific fields, including biology, physics, and the social sciences. To enable data-driven discoveries from these networks, statistical inference techniques like estimation and hypothesis…

Methodology · Statistics 2026-02-19 Arpan Kumar, Minh Tang, Srijan Sengupta

Analysis of data in the form of graphs

We discuss the problem of extending data mining approaches to cases in which data points arise in the form of individual graphs. Being able to find the intrinsic low-dimensionality in ensembles of graphs can be useful in a variety of…

Data Analysis, Statistics and Probability · Physics 2013-06-18 Karthikeyan Rajendran, Ioannis G. Kevrekidis

Data-adaptive statistics for multiple hypothesis testing in high-dimensional settings

Current statistical inference problems in areas like astronomy, genomics, and marketing routinely involve the simultaneous testing of thousands -- even millions -- of null hypotheses. For high-dimensional multivariate distributions, these…

Methodology · Statistics 2017-04-25 Weixin Cai, Nima S. Hejazi, Alan E. Hubbard

Mining Frequent Itemsets from Secondary Memory

Mining frequent itemsets is at the core of mining association rules, and is by now quite well understood algorithmically. However, most algorithms for mining frequent itemsets assume that the main memory is large enough for the data…

Databases · Computer Science 2016-08-16 Gösta Grahne, Jianfei Zhu

A Survey on Mining and Analysis of Uncertain Graphs

Uncertain Graph (also known as Probabilistic Graph) is a generic model to represent many real-world networks from social to biological. In recent times analysis and mining of uncertain graphs have drawn significant…

Databases · Computer Science 2021-06-16 Suman Banerjee

Beyond Bonferroni: Hierarchical Multiple Testing in Empirical Research

Empirical research in the social and medical sciences frequently involves testing multiple hypotheses simultaneously, increasing the risk of false positives due to chance. Classical multiple testing procedures, such as the Bonferroni…

Econometrics · Economics 2025-07-29 Sebastian Calonico, Sebastian Galiani

Data mining when each data point is a network

We discuss the problem of extending data mining approaches to cases in which data points arise in the form of individual graphs. Being able to find the intrinsic low-dimensionality in ensembles of graphs can be useful in a variety of…

Social and Information Networks · Computer Science 2016-12-12 Karthikeyan Rajendran, Assimakis A. Kattis, Alexander Holiday, Risi Kondor, Ioannis G. Kevrekidis

Mining Statistically Significant Attribute Associations in Attributed Graphs

Recently, graphs have been widely used to represent many different kinds of real world data or observations such as social networks, protein-protein networks, road networks, and so on. In many cases, each node in a graph is associated with…

Social and Information Networks · Computer Science 2016-09-28 Jihwan Lee, Keehwan Park, Sunil Prabhakar

Mining Weighted Sequential Patterns in Incremental Uncertain Databases

Due to the rapid development of science and technology, the importance of imprecise, noisy, and uncertain data is increasing at an exponential rate. Thus, mining patterns in uncertain databases have drawn the attention of researchers.…

Databases · Computer Science 2024-04-02 Kashob Kumar Roy, Md Hasibul Haque Moon, Md Mahmudur Rahman, Chowdhury Farhan Ahmed, Carson Kai-Sang Leung