Data Analysis, Statistics and Probability
This paper proposes a new method for converting a time-series into a weighted graph (complex network), which builds on the electrostatic conceptualization originating from physics. The proposed method conceptualizes a time-series as a…
Gaussian process tomography (GPT) is a method used for obtaining real-time tomographic reconstructions of the plasma emissivity profile in a tokamak, given some model for the underlying physical processes involved. GPT can also be used,…
We propose methods to reconstruct particle distributions with and without considering initial volume fluctuations. This approach enables us to correct for detector efficiencies and initial volume fluctuations simultaneously. Our study…
This study presents an approach to analyzing the evolution of an arbitrary complex system whose behavior is characterized by a set of different time-dependent factors. The only requirement for these factors is that they must contain an…
Analysis of x-ray absorption spectroscopy (XAS) data often involves the removal of artifacts or glitches from the acquired signal, a process commonly known as deglitching. Glitches result either from specific orientations of monochromator…
The frequentist definition of sensitivity of a search for new phenomena proposed in arXiv:0308063 has been utilized in a number of published experimental searches. In most cases, the simple approximate formula for the common problem of…
HEP event selection is traditionally considered a binary classification problem, involving the dichotomous categories of signal and background. In distribution fits for particle masses or couplings, however, signal events are not all…
We propose a new method to find modes based on active information. We develop an algorithm that, when applied to the whole space, will say whether there are any modes present and where they are; this algorithm will reduce the…
Stochastic nonlinear dynamical systems can undergo rapid transitions relative to the change in their forcing, for example due to the occurrence of multiple equilibrium solutions for a specific interval of parameters. In this paper, we…
In the era of the Big Data revolution, methods for the automatic discovery of regularities in large datasets are becoming essential tools in applied sciences. This article presents an open software package, named MODULO (MODal mULtiscale…
Transfer learning refers to the use of knowledge gained while solving a machine learning task and applying it to the solution of a closely related problem. Such an approach has enabled scientific breakthroughs in computer vision and natural…
We propose a curvature-based approach for choosing good values for the time-delay parameter $\tau$ in delay reconstructions. The idea is based on the effects of the delay on the geometry of the reconstructions. If the delay is chosen too…
It is well accepted that, at the global scale, the Gutenberg-Richter (GR) law describing the distribution of earthquake magnitude or seismic moment has to be modified at the tail to properly account for the most extreme events. It is…
We describe a method to obtain point and dispersion estimates for the energies of jets arising from b quarks produced in proton-proton collisions at an energy of $\sqrt{s} = 13$ TeV at the CERN LHC. The algorithm is trained on a large…
This paper is a first draft of the introduction to the special issue on volunteered geographic information published in Computers, Environment and Urban Systems (2015, 53, 1-122). In this short paper, I put georeferenced big data…
Serial electron diffraction (SerialED) is an emerging technique, which applies the snapshot data-collection mode of serial X-ray crystallography to three-dimensional electron diffraction (3D ED), forgoing the conventional rotation method.…
We study the accuracy and precision for estimating the fraction of observed levels $\varphi$ in quantum chaotic spectra through long-range correlations. We focus on the main statistics where theoretical formulas for the fraction of missing…
The Feynman-alpha method is a neutron noise technique that is used to estimate the prompt neutron period of fissile assemblies. The method and quantity are of widespread interest, including in applications such as nuclear criticality safety,…
Chaos is ubiquitous in physical systems. The associated sensitivity to initial conditions is a significant obstacle in forecasting the weather and other geophysical fluid flows. Data assimilation is the process whereby the uncertainty in…
The Maximum Entropy Method (MEM) is a popular data analysis technique based on Bayesian inference, which has found various applications in the research literature. While the MEM itself is well-grounded in statistics, I argue that its…