Related papers: A Tutorial on Statistically Sound Pattern Discover…
Discovering statistically significant patterns from databases is an important challenging problem. The main obstacle of this problem is in the difficulty of taking into account the selection bias, i.e., the bias arising from the fact that…
Discriminative pattern mining is an essential task of data mining. This task aims to discover patterns which occur more frequently in a class than other classes in a class-labeled dataset. This type of patterns is valuable in various…
Data mining is the task of discovering interesting patterns from large amounts of data. There are many data mining tasks, such as classification, clustering, association rule mining, and sequential pattern mining. Sequential pattern mining…
Statistical dependence between hypotheses poses a significant challenge to the stability of large scale multiple hypotheses testing. Ignoring it often results in an unacceptably large spread in the false positive proportion even though the…
Pattern discovery algorithms in the music domain aim to find meaningful components in musical compositions. Over the years, although many algorithms have been developed for pattern discovery in music data, it remains a challenging task. To…
The features of a logically sound approach to a theory of statistical reasoning are discussed. A particular approach that satisfies these criteria is reviewed. This is seen to involve selection of a model, model checking, elicitation of a…
Statistically significant patterns mining (SSPM) is an essential and challenging data mining task in the field of knowledge discovery in databases (KDD), in which each pattern is evaluated via a hypothesis test. Our study aims to introduce…
Conditional-independence-based discovery uses statistical tests to identify a graphical model that represents the independence structure of variables in a dataset. These tests, however, can be unreliable, and algorithms are sensitive to…
The use of patterns in predictive models is a topic that has received a lot of attention in recent years. Pattern mining can help to obtain models for structured domains, such as graphs and sequences, and has been proposed as a means to…
Physical theories that depend on many parameters or are tested against data from many different experiments pose unique challenges to statistical inference. Many models in particle physics, astrophysics and cosmology fall into one or both…
This paper formalizes a latent variable inference problem we call {\em supervised pattern discovery}, the goal of which is to find sets of observations that belong to a single ``pattern.'' We discuss two versions of the problem and prove…
Statistical analysis is an important tool to distinguish systematic from chance findings. Current statistical analyses rely on distributional assumptions reflecting the structure of some underlying model, which if not met lead to problems…
We report our ongoing work about a new deep architecture working in tandem with a statistical test procedure for jointly training texts and their label descriptions for multi-label and multi-class classification tasks. A statistical…
In this paper, we focus on the problem of statistical dependence estimation using characteristic functions. We propose a statistical dependence measure, based on the maximum-norm of the difference between joint and product-marginal…
We describe a statistical hypothesis test for the presence of a signal. The test allows the researcher to fix the signal location and/or width a priori, or perform a search to find the signal region that maximizes the signal. The background…
The robust detection of statistical dependencies between the components of a complex system is a key step in gaining a network-based understanding of the system. Because of their simplicity and low computation cost, pairwise statistics are…
Consider an experiment involving a potentially small number of subjects. Some random variables are observed on each subject: a high-dimensional one called the "observed" random variable, and a one-dimensional one called the "outcome" random…
As advances in technology allow for the collection, storage, and analysis of vast amounts of data, the task of screening and assessing the significance of discovered patterns is becoming a major challenge in data mining applications. In…
User privacy can be compromised by matching user data traces to records of their previous behavior. The matching of the statistical characteristics of traces to prior user behavior has been widely studied. However, an adversary can also…
Statistical modeling plays a fundamental role in understanding the underlying mechanism of massive data (statistical inference) and predicting the future (statistical prediction). Although all models are wrong, researchers try their best to…