Related papers: A Sampling-based Framework for Hypothesis Testing …
Motivated by gene set enrichment analysis, we investigate the problem of combined hypothesis testing on a graph. We introduce a general framework to effectively use the structural information of the underlying graph when testing…
Hypothesis testing for graphs has been an important tool in applied research fields for more than two decades, and still remains a challenging problem as one often needs to draw inference from few replicates of large graphs. Recent studies…
Topological data analysis involves the statistical characterization of the shape of data. Persistent homology is a primary tool of topological data analysis, which can be used to analyze topological features and perform statistical…
The graph based approach to multiple testing is an intuitive method that enables a study team to represent clearly, through a directed graph, its priorities for hierarchical testing of multiple hypotheses, and for propagating the available…
Estimating characteristics of large graphs via sampling is a vital part of the study of complex networks. Current sampling methods such as (independent) random vertex and random walks are useful but have drawbacks. Random vertex sampling…
We consider the multiple hypothesis testing (MHT) problem over the joint domain formed by a graph and a measure space. On each sample point of this joint domain, we assign a hypothesis test and a corresponding $p$-value. The goal is to make…
Sampling is a standard approach in big-graph analytics; the goal is to efficiently estimate the graph properties by consulting a sample of the whole population. A perfect sample is assumed to mirror every property of the whole population.…
Graph sampling is a technique to pick a subset of vertices and/or edges from the original graph. It has a wide spectrum of applications, e.g. surveying hidden populations in sociology [54], visualizing social graphs [29], scaling down the Internet AS graph…
Network datasets appear across a wide range of scientific fields, including biology, physics, and the social sciences. To enable data-driven discoveries from these networks, statistical inference techniques like estimation and hypothesis…
In clinical trials, hypotheses are frequently organized into hierarchically ordered families, requiring specialized testing strategies that account for these structured relationships. Existing gatekeeping methods, including serial, parallel,…
Graph-based tests are a class of non-parametric two-sample tests useful for analyzing high-dimensional data. The test statistics are constructed from similarity graphs (such as K-minimum spanning tree), and consequently, their performance…
High-dimensional feature selection is a central problem in a variety of application domains such as machine learning, image analysis, and genomics. In this paper, we propose graph-based tests as a useful basis for feature selection. We…
We develop a new sampling method to estimate eigenvector centrality on incomplete networks. Our goal is to estimate this global centrality measure having at disposal a limited amount of data. This is the case in many real-world scenarios…
As large graph datasets become increasingly common across many fields, sampling is often needed to reduce the graphs into manageable sizes. This procedure raises critical questions about representativeness as no sample can capture the…
A two-sample hypothesis test is a statistical procedure used to determine whether the distributions generating two samples are identical. We consider the two-sample testing problem in a new scenario where the sample measurements (or sample…
Sparse exchangeable graphs on $\mathbb{R}_+$, and the associated graphex framework for sparse graphs, generalize exchangeable graphs on $\mathbb{N}$, and the associated graphon framework for dense graphs. We develop the graphex framework as…
Two-sample tests utilizing a similarity graph on observations are useful for high-dimensional and non-Euclidean data due to their flexibility and good performance under a wide range of alternatives. Existing works mainly focused on sparse…
Rejecting the null hypothesis in two-sample testing is a fundamental tool for scientific discovery. Yet, aside from concluding that two samples do not come from the same probability distribution, it is often of interest to characterize how…
High-dimensional hypothesis testing deals with models in which the number of parameters is significantly larger than the sample size. The existing literature develops a variety of individual tests. Some of them are sensitive to the dense and small…
Graphlets are induced subgraph patterns and have been frequently applied to characterize the local topology structures of graphs across various domains, e.g., online social networks (OSNs) and biological networks. Discovering and computing…