Data Analysis, Statistics and Probability
When the network is reconstructed, two types of errors can occur: false positive and false negative errors about the presence or absence of links. In this paper, the influence of these two errors on the vertex degree distribution is…
Analyzing data from paleoclimate archives such as tree rings or lake sediments offers the opportunity of inferring information on past climate variability. Often, such data sets are univariate and a proper reconstruction of the system's…
The Parzen window density is a well-known technique, associating Gaussian kernels with data points. It is a very useful tool in data exploration, with particular importance for clustering schemes and image analysis. This method is presented…
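As an illustration of the technique named in the abstract above, a Parzen window estimate with Gaussian kernels can be sketched as follows. This is a minimal one-dimensional sketch and not the method of the paper itself: the function name `parzen_density`, the bandwidth `h`, and the sample values are all assumptions chosen for illustration.

```python
import math


def parzen_density(x, data, h):
    """Gaussian Parzen window estimate at point x (1-D, hypothetical helper).

    Each data point contributes one Gaussian kernel of width h; the
    estimate is the average of those kernels evaluated at x.
    """
    if not data or h <= 0:
        raise ValueError("need at least one data point and a positive bandwidth")
    norm = 1.0 / (len(data) * h * math.sqrt(2.0 * math.pi))
    return norm * sum(math.exp(-0.5 * ((x - xi) / h) ** 2) for xi in data)


# Illustrative samples forming two loose groups (made-up values).
samples = [-1.2, -0.8, -1.0, 1.9, 2.1, 2.0]

# Evaluate the estimate on a grid from -6 to 8 with step 0.01.
step = 0.01
grid = [-6.0 + step * k for k in range(1401)]
density = [parzen_density(g, samples, h=0.3) for g in grid]

# Riemann-sum check that the estimate integrates to roughly one.
area = sum(density) * step
```

The bandwidth `h` controls the smoothness of the estimate: with a small value each group of samples produces its own peak, which is what makes the estimate usable as a starting point for clustering.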
A principled approach to characterize the hidden structure of networks is to formulate generative models, and then infer their parameters from data. When the desired structure is composed of modules or "communities", a suitable choice for…
Motivated by the presence of deep connections among dynamical equations, experimental data, physical systems, and statistical modeling, we report on a series of findings uncovered by the Authors and collaborators during the last decade…
The Sobol' indices are a recognized tool in global sensitivity analysis. When the uncertain variables in a model are statistically independent, the Sobol' indices may be easily interpreted and utilized. However, their interpretation and…
A basic systems question concerns the concept of closure, meaning autonomy (closed) in the sense of describing the (sub)system as fully consistent within itself. Alternatively, the system may be nonautonomous (open) meaning it receives…
AMORPH utilizes a new Bayesian statistical approach to interpreting X-ray diffraction results of samples with both crystalline and amorphous components. AMORPH fits X-ray diffraction patterns with a mixture of narrow and wide components,…
We study the popular centrality measure known as effective conductance or in some circles as information centrality. This is an important notion of centrality for undirected networks, with many applications, e.g., for random walks,…
We introduce a nonparametric approach for estimating drift and diffusion functions in systems of stochastic differential equations from observations of the state vector. Gaussian processes are used as flexible models for these functions and…
We describe a method to construct directed networks from multivariate time series which has several advantages over the widely accepted methods. This method is based on an information theoretic reduction of linear (auto-regressive) models.…
We compare and contrast the statistical physics and quantum physics inspired approaches for unsupervised generative modeling of classical data. The two approaches represent probabilities of observed data using energy-based models and…
Correcting measured detector-level distributions to particle-level is essential to make data usable outside the experimental collaborations. The term unfolding is used to describe this procedure. A new method of unfolding data using a…
Reliable data quality monitoring is a key asset in delivering collision data suitable for physics analysis in any modern large-scale High Energy Physics experiment. This paper focuses on the use of artificial neural networks for supervised…
In high-energy physics, with the search for ever smaller signals in ever larger data sets, it has become essential to extract a maximum of the available information from the data. Multivariate classification methods based on machine…
The Fourier phase information plays a key role in the quantified description of nonlinear data. We present a novel tool for time series analysis that identifies nonlinearities by sensitively detecting correlations among the Fourier phases.…
In our previous paper (I) we derived information geometric objects from the two parameter generalized entropy of Hanel and Thurner (2011), using the c,d parameters as labels of the corresponding manifolds. Here we follow a completely…
Big data has become a critically enabling component of emerging mathematical methods aimed at the automated discovery of dynamical systems, where first principles modeling may be intractable. However, in many engineering systems, abrupt…
The space-time representation of high-dimensional dynamical systems that have a well defined characteristic time scale has proven to be very useful to deepen the understanding of such systems and to uncover hidden features in their output…
Non-symmetric rectangular correlation matrices occur in many problems in economics. We test the method of extracting statistically meaningful correlations between input and output variables of large dimensionality and build a toy model for…