Related papers: Hypothesis testing for general network models
It has become an increasingly common practice for scientists in modern science and engineering to collect samples of multiple network data in which a network serves as a basic data object. The increasing prevalence of multiple network data…
In this paper, we propose a new spectral-based approach to hypothesis testing for populations of networks. The primary goal is to develop a test to determine whether two given samples of networks come from the same random model or…
Community detection in multi-layer networks is a fundamental task in complex network analysis across various areas like social, biological, and computer sciences. However, most existing algorithms assume that the number of communities is…
Random geometric graphs are widely used in modeling geometry and dependence structure in networks. In a random geometric graph, nodes are independently generated from some probability distribution $F$ over a metric space, and edges link…
We consider a two-sample hypothesis testing problem, where the distributions are defined on the space of undirected graphs, and one has access to only one observation from each model. A motivating example for this problem is comparing the…
Researchers theorize that many real-world networks exhibit community structure where within-community edges are more likely than between-community edges. While numerous methods exist to cluster nodes into different communities, less work…
How can researchers test for heterogeneity in the local structure of a network? In this paper, we present a framework that utilizes random sampling to give subgraphs which are then used in a goodness of fit test to test for heterogeneity.…
Comparing two population means of network data is of paramount importance in a wide range of scientific applications. Many existing network inference solutions focus on global testing of entire networks, without comparing individual network…
Methods of performing anomaly detection on high-dimensional data sets are needed, since algorithms which are trained on data are only expected to perform well on data that is similar to the training data. There are theoretical results on…
The stochastic block model is a popular tool for studying community structures in network data. We develop a goodness-of-fit test for the stochastic block model. The test statistic is based on the largest singular value of a residual matrix…
Networks describe the, often complex, relationships between individual actors. In this work, we address the question of how to determine whether a parametric model, such as a stochastic block model or latent space model, fits a dataset well…
Community detection is a fundamental problem in complex network data analysis. Though many methods have been proposed, most existing methods require the number of communities to be the known parameter, which is not in practice. In this…
Graph (or network) is a mathematical structure that has been widely used to model relational data. As real-world systems get more complex, multilayer (or multiple) networks are employed to represent diverse patterns of relationships among…
Networks are a useful representation for data on connections between units of interests, but the observed connections are often noisy and/or include missing values. One common approach to network analysis is to treat the network as a…
The stochastic block model (SBM) has been widely used to analyze network data. Various goodness-of-fit tests have been proposed to assess the adequacy of model structures. To the best of our knowledge, however, none of the existing…
Network (graph) data analysis is a popular research topic in statistics and machine learning. In application, one is frequently confronted with graph two-sample hypothesis testing where the goal is to test the difference between two graph…
Given two networks of differing sizes, it is of interest to test whether the two networks belong to the same distribution. We formalize the notion of "equality of distribution" under the framework of the generalized random dot product…
This paper studies the problem of distributed classification with a network of heterogeneous agents. The agents seek to jointly identify the underlying target class that best describes a sequence of observations. The problem is first…
Model misspecification can create significant challenges for the implementation of probabilistic models, and this has led to development of a range of robust methods which directly account for this issue. However, whether these more…
Statistical modeling plays a fundamental role in understanding the underlying mechanism of massive data (statistical inference) and predicting the future (statistical prediction). Although all models are wrong, researchers try their best to…