Data Analysis, Statistics and Probability
I give a simple analysis of the game that I previously published in Scientific American, which shows the paradoxical behavior whereby two losing games, randomly combined, form a winning game. The game, modeled on a random walk, requires only…
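A minimal Python sketch of the effect follows. It assumes the standard capital-dependent Parrondo games. In game A, one coin is tossed with win probability 1/2 - ε. In game B, the coin has win probability 1/10 - ε when the capital is a multiple of 3, and 3/4 - ε otherwise. These probabilities, ε = 0.005, and the names `play_a`, `play_b` and `mean_final_capital` are illustrative assumptions. They are not necessarily the game analysed in the abstract.

```python
import random

EPS = 0.005  # assumed small bias that makes each game losing on its own


def play_a(capital: int, rng: random.Random) -> int:
    """Game A: one slightly unfavorable coin."""
    return capital + (1 if rng.random() < 0.5 - EPS else -1)


def play_b(capital: int, rng: random.Random) -> int:
    """Game B: a bad coin when capital is a multiple of 3, a good coin otherwise."""
    # Python's % is non-negative for negative capital, matching the mod-3 rule.
    p_win = 0.1 - EPS if capital % 3 == 0 else 0.75 - EPS
    return capital + (1 if rng.random() < p_win else -1)


def mean_final_capital(strategy: str, steps: int = 200,
                       trials: int = 2000, seed: int = 1) -> float:
    """Average capital after `steps` rounds, starting from 0, over `trials` runs."""
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        capital = 0
        for _ in range(steps):
            if strategy == "A":
                capital = play_a(capital, rng)
            elif strategy == "B":
                capital = play_b(capital, rng)
            else:  # "random": choose A or B with equal probability each round
                game = play_a if rng.random() < 0.5 else play_b
                capital = game(capital, rng)
        total += capital
    return total / trials


for name in ("A", "B", "random"):
    print(name, mean_final_capital(name))
```

With these settings, playing A alone or B alone should drift downward. Choosing between them at random each round should drift upward. The sign of the result is the point here, not its exact value.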
The efficacy of particle identification is compared using artificial neural networks and boosted decision trees. The comparison is performed in the context of MiniBooNE, an experiment at Fermilab searching for neutrino oscillations.…
We discuss some aspects of Astumian's suggestion that a combination of biased games (Parrondo's paradox) can explain the performance of molecular motors. Unfortunately, the model is flawed by an explicit asymmetry overlooked by the author. In…
A Monte Carlo algorithm for calculating the single-particle spin-echo small-angle neutron scattering (SESANS) correlation function is presented. It is argued that the algorithm provides a general and efficient way of calculating SESANS data…
Many data-based statistical algorithms require that one find near or nearest neighbors to a given vector among a set of points in that vector space, usually with Euclidean topology. The k-d data structure and search algorithms are…
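To illustrate the k-d approach, here is a minimal sketch of a k-d tree with exact nearest-neighbor search under Euclidean distance. It splits at the median along cycling axes. It prunes a subtree whenever the splitting plane is farther from the query than the best distance found so far. `build` and `nearest` are hypothetical names, and this is not the specific implementation described in the abstract.

```python
import math
from typing import Optional, Sequence, Tuple

Point = Tuple[float, ...]


class Node:
    """One k-d tree node: a stored point and the axis it splits on."""

    def __init__(self, point: Point, axis: int,
                 left: Optional["Node"], right: Optional["Node"]) -> None:
        self.point = point
        self.axis = axis
        self.left = left    # points with coordinate <= point[axis]
        self.right = right  # points with coordinate >= point[axis]


def build(points: Sequence[Point], depth: int = 0) -> Optional[Node]:
    """Build a balanced k-d tree by splitting at the median, cycling through axes."""
    if not points:
        return None
    axis = depth % len(points[0])
    ordered = sorted(points, key=lambda p: p[axis])
    mid = len(ordered) // 2
    return Node(ordered[mid], axis,
                build(ordered[:mid], depth + 1),
                build(ordered[mid + 1:], depth + 1))


def nearest(node: Optional[Node], query: Point,
            best: Optional[Tuple[float, Point]] = None) -> Optional[Tuple[float, Point]]:
    """Return (distance, point) for the stored point closest to `query`."""
    if node is None:
        return best
    d = math.dist(node.point, query)
    if best is None or d < best[0]:
        best = (d, node.point)
    diff = query[node.axis] - node.point[node.axis]
    near, far = (node.left, node.right) if diff < 0 else (node.right, node.left)
    best = nearest(near, query, best)
    # The far side can only hold a closer point if the splitting plane
    # is nearer to the query than the best match found so far.
    if abs(diff) < best[0]:
        best = nearest(far, query, best)
    return best


pts = [(2.0, 3.0), (5.0, 4.0), (9.0, 6.0), (4.0, 7.0), (8.0, 1.0), (7.0, 2.0)]
tree = build(pts)
print(nearest(tree, (9.0, 2.0)))  # the nearest stored point is (8.0, 1.0)
```

Splitting at the median keeps the tree balanced for a static point set. The pruning test is what makes queries cheaper than a brute-force scan in low dimensions.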
Asymmetric statistical errors arise for experimental results obtained by Maximum Likelihood estimation, in cases where the number of results is finite and the log likelihood function is not a symmetric parabola. This note discusses how…
We present a methodology for detecting non-linearities in data sets based on the characterization of the structural features of the Fourier phase maps. A Fourier phase map is a 2D set of points $M= \{(\phi_{\vec{k}}, \phi_{\vec{k} +…
Stress time series from the Portevin–Le Chatelier (PLC) effect typically exhibit stick-slips of upload and download type. These data contain strong short-term correlations of a nonlinear type. We investigate whether there are also long-term correlations, i.e. the…
We consider an extension of Granger causality to nonlinear bivariate time series. In this framework, if the prediction error of the first time series is reduced by including measurements from the second time series, then the second time series is…
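For orientation, here is a sketch of the ordinary linear Granger test that the abstract extends. It regresses x on its own p past values, then adds p past values of y, and compares the two residual variances. This is the linear baseline, not the nonlinear extension studied in the paper. The lag order p = 2, the log-variance-ratio index, the synthetic data and the function names are assumptions chosen for illustration.

```python
import numpy as np


def lagged(series: np.ndarray, p: int) -> np.ndarray:
    """Matrix whose row for time t (t = p..n-1) is series[t-1], ..., series[t-p]."""
    n = len(series)
    return np.column_stack([series[p - k:n - k] for k in range(1, p + 1)])


def residual_variance(design: np.ndarray, target: np.ndarray) -> float:
    """Variance of the least-squares residuals of target regressed on design."""
    coef, *_ = np.linalg.lstsq(design, target, rcond=None)
    return float(np.var(target - design @ coef))


def granger_index(x, y, p: int = 2) -> float:
    """ln(restricted / full residual variance) for predicting x; > 0 if past y helps."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    target = x[p:]
    intercept = np.ones((len(target), 1))
    restricted = np.hstack([intercept, lagged(x, p)])  # past of x only
    full = np.hstack([restricted, lagged(y, p)])       # past of x and of y
    return float(np.log(residual_variance(restricted, target)
                        / residual_variance(full, target)))


rng = np.random.default_rng(0)
y = rng.standard_normal(2000)
x = np.zeros(2000)
x[1:] = 0.8 * y[:-1] + 0.2 * rng.standard_normal(1999)  # y drives x with a one-step lag
print(granger_index(x, y), granger_index(y, x))  # first large, second near zero
```

Because the full model contains the restricted one, the in-sample index is never negative. In practice, a significance test (for example an F-test on the added coefficients) decides whether a positive value matters.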
The main result of this paper is the derivation of exact analytical expressions for the information and covariance matrices of multivariate Burr III and logistic distributions. These distributions arise as tractable parametric models in price and…
A number of signal processing and statistical methods can be used in analyzing either pieces of text or DNA sequences. These techniques can be used in a number of ways, such as determining authorship of documents, finding genes in DNA, and…
A randomization test was developed to determine the statistical significance of QCD intermittency in single-event distributions. A total of 96 simulated intermittent distributions based on standard normal Gaussian distributions of size…
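As a generic illustration of a randomization test, the sketch below estimates a p-value for a difference in sample means. It repeatedly shuffles the group labels and counts how often the shuffled statistic is at least as extreme as the observed one. This is not the intermittency-specific procedure of the abstract. The choice of statistic, the number of permutations and the function name are assumptions.

```python
import random
from statistics import mean
from typing import Sequence


def permutation_p_value(a: Sequence[float], b: Sequence[float],
                        n_perm: int = 5000, seed: int = 0) -> float:
    """Two-sided randomization p-value for the difference in means of a and b."""
    rng = random.Random(seed)
    observed = abs(mean(a) - mean(b))
    pooled = list(a) + list(b)
    n_a = len(a)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # random relabelling under the null hypothesis
        if abs(mean(pooled[:n_a]) - mean(pooled[n_a:])) >= observed:
            hits += 1
    # Adding 1 to both counts includes the observed labelling itself,
    # so the estimate is never exactly zero.
    return (hits + 1) / (n_perm + 1)


sample = [0.1 * i for i in range(20)]
shifted = [v + 5.0 for v in sample]
print(permutation_p_value(sample, shifted))  # tiny: the groups do not overlap
print(permutation_p_value(sample, sample))   # 1.0: identical groups
```

The smallest p-value this estimator can report is 1/(n_perm + 1). Resolving very small significance levels therefore requires more permutations.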
This paper shows in detail the application of a new stochastic approach for the characterization of surface height profiles, which is based on the theory of Markov processes. With this analysis we achieve a characterization of the scale…
In this work we propose a Bayesian framework for fully automated image fusion and joint segmentation. More specifically, we consider the case where we have observed images of the same object through different imaging processes or…
In this work we propose a Bayesian framework for data fusion of multivariate signals which arise in imaging systems. More specifically, we consider the case where we have observed two images of the same object through two different imaging…
In this work we consider time series with a finite number of discrete change points. We assume that the data in each segment follow a different probability density function (pdf). We focus on the case where the data in all segments are…
The issue of asymmetric uncertainties resulting from fits, nonlinear propagation and systematic effects is reviewed. It is shown that, in all cases, whenever a published result is given with asymmetric uncertainties, the value of the…
The points at which the log likelihood falls by 1/2 from its maximum value are often used to give the 'errors' on a result, i.e. the 68% central confidence interval. The validity of this is examined for two simple cases: a lifetime…
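The sketch below shows how the Δln L = 1/2 points are found in the lifetime case. It assumes i.i.d. exponential decay times with $\ln L(\tau) = -n\ln\tau - \sum_i t_i/\tau$, whose maximum lies at $\hat\tau = \frac{1}{n}\sum_i t_i$. The two crossings are located by bisection, one on each side of $\hat\tau$. This illustrates only the construction, not the paper's study of its validity. The data and function names are hypothetical.

```python
import math
from typing import Callable, Sequence, Tuple


def log_likelihood(tau: float, times: Sequence[float]) -> float:
    """ln L for i.i.d. exponential decay times with mean lifetime tau (constants dropped)."""
    return -len(times) * math.log(tau) - sum(times) / tau


def bisect(f: Callable[[float], float], lo: float, hi: float, iters: int = 200) -> float:
    """Root of f on [lo, hi]; assumes f(lo) and f(hi) have opposite signs."""
    lo_positive = f(lo) > 0
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if (f(mid) > 0) == lo_positive:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def delta_lnl_interval(times: Sequence[float]) -> Tuple[float, float, float]:
    """Return (tau_hat, lower, upper), where ln L has fallen by 1/2 at lower and upper."""
    tau_hat = sum(times) / len(times)  # maximum-likelihood estimate
    target = log_likelihood(tau_hat, times) - 0.5

    def g(tau: float) -> float:
        # Positive inside the interval, negative outside it.
        return log_likelihood(tau, times) - target

    lower = bisect(g, 1e-9 * tau_hat, tau_hat)
    upper = bisect(g, tau_hat, 1e3 * tau_hat)
    return tau_hat, lower, upper


times = [0.3, 1.2, 0.7, 2.5, 0.1, 0.9, 1.6, 0.4, 0.8, 1.5]  # illustrative data, mean 1.0
tau_hat, lower, upper = delta_lnl_interval(times)
print(f"tau = {tau_hat:.3f} +{upper - tau_hat:.3f} -{tau_hat - lower:.3f}")
```

For a small sample like this one, the upper error comes out larger than the lower one, so the quoted interval is asymmetric.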
This article describes a robust algorithm to estimate a conditional probability density f(t|x) as a non-parametric smooth regression function. It is based on a neural network and the Bayesian interpretation of the network output as a…
A two-dimensional generalization of the original peak-finding algorithm suggested earlier is given. The idea behind the algorithm comes from the well-known quantum-mechanical tunneling property, which enables small bodies to penetrate…