Data Analysis, Statistics and Probability
Applying any strategy requires some knowledge about the past state of the system. Unfortunately, in the case of the economy, collecting information is a difficult, expensive and time-consuming process. Therefore the information about the system…
A kernel-based procedure for correcting experimental data for distortions due to the finite resolution and limited detector acceptance is presented. The unfolding problem is known to be an ill-posed problem that cannot be solved without…
When reading peer-reviewed scientific literature describing any analysis of empirical data, it is natural and correct to proceed with the underlying assumption that experiments have made good faith efforts to ensure that their analyses…
The publication by Gagunashvili [arXiv:1011.0662] suffers from several caveats: i) The method is based upon the false assumption that the median of chi square distributed random variables is chi square distributed. ii) The information…
A nonlinear dynamics approach can be used to quantify complexity in written texts. As a first step, a one-dimensional system is examined: two written texts by one author (Lewis Carroll) are considered, together with one…
The Visual Physics Analysis (VISPA) project integrates different aspects of physics analyses into a graphical development environment. It addresses the typical development cycle of (re-)designing, executing and verifying an analysis. The…
The increasing interest in renewable energy, particularly in wind, has given rise to the necessity of accurate models for the generation of good synthetic wind speed data. Markov chains are often used for this purpose but better models are…
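As a point of reference for the Markov-chain baseline mentioned above, the following is a minimal sketch of a first-order chain fitted to binned wind speeds. It is not the improved model the abstract goes on to describe (that part is truncated). The function names, the 1 m/s bin width, and the sample speeds are hypothetical choices made for illustration.

```python
import random
from collections import defaultdict

def to_states(speeds, bin_width=1.0):
    """Map each wind speed to an integer bin index (assumed discretization)."""
    return [int(s // bin_width) for s in speeds]

def fit_transitions(states):
    """Count observed transitions state -> next state."""
    counts = defaultdict(lambda: defaultdict(int))
    for current, nxt in zip(states, states[1:]):
        counts[current][nxt] += 1
    return counts

def simulate(counts, start, length, rng):
    """Generate a synthetic state sequence of the given length (length >= 1)."""
    fitted = list(counts)  # states that have at least one observed successor
    state = start
    path = [state]
    while len(path) < length:
        if state not in counts:
            # No observed successor: restart from a random fitted state
            # (an ad hoc fallback, chosen only to keep the sketch simple).
            state = rng.choice(fitted)
        else:
            successors = counts[state]
            state = rng.choices(list(successors),
                                weights=list(successors.values()))[0]
        path.append(state)
    return path

# Hypothetical measured speeds in m/s, for illustration only.
observed = [3.2, 4.1, 4.8, 5.5, 4.9, 3.7, 2.9, 3.4, 4.4, 5.1]
states = to_states(observed)
synthetic = simulate(fit_transitions(states), states[0], 20, random.Random(0))
```

A first-order chain of this kind reproduces the one-step transition frequencies of the data but carries no longer memory, which is one reason richer models are sought.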
Using 55 years of daily average temperatures from a local weather station, I made a least-absolute-deviations (LAD) regression model that accounts for three effects: seasonal variations, the 11-year solar cycle, and a linear trend. The…
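One way to write down a least-absolute-deviations fit with the three effects named above is sketched here. The abstract does not give the functional form, so the single sinusoid per cycle and the symbols $a$, $b$, $A_k$, $\phi_k$, $P_k$ are assumptions made for illustration, with $P_1 = 1$ year and $P_2 = 11$ years.

```latex
\hat{\theta} = \arg\min_{\theta} \sum_{i=1}^{N} \left| T_i - f(t_i;\theta) \right|,
\qquad
f(t;\theta) = a + b\,t
  + A_1 \sin\!\left(\frac{2\pi t}{P_1} + \phi_1\right)
  + A_2 \sin\!\left(\frac{2\pi t}{P_2} + \phi_2\right)
```

Here $T_i$ is the daily average temperature at time $t_i$ and $\theta = (a, b, A_1, \phi_1, A_2, \phi_2)$. Minimizing the sum of absolute residuals rather than squared residuals makes the fit less sensitive to outlying days.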
Given a set of several inputs into a system (e.g., independent variables characterizing stimuli) and a set of several stochastically non-independent outputs (e.g., random variables describing different aspects of responses), how can one…
The problem of measuring economic globalization is discussed. Four macroeconomic indices of twenty among the "richest" countries are examined. Four types of "distances" are calculated. Two types of networks are next constructed for each distance…
The Theil index is widely used in economics and finance; it looks like the Shannon entropy, but pertains to event values rather than to their probabilities. Any time series can be remapped through the Theil index. Correlation coefficients can…
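To make the comparison with the Shannon entropy concrete, here is a minimal sketch using the standard Theil T definition, $T = \frac{1}{N}\sum_i \frac{x_i}{\mu}\ln\frac{x_i}{\mu}$. The abstract is truncated, so whether the paper uses exactly this variant is an assumption, and the function names are hypothetical.

```python
import math

def theil_index(values):
    """Theil T index of strictly positive values (standard definition)."""
    values = list(values)
    if not values or any(v <= 0 for v in values):
        raise ValueError("values must be non-empty and strictly positive")
    mean = sum(values) / len(values)
    return sum((v / mean) * math.log(v / mean) for v in values) / len(values)

def shannon_entropy_of_shares(values):
    """Shannon entropy (natural log) of the shares x_i / sum(x)."""
    total = sum(values)
    return -sum((v / total) * math.log(v / total) for v in values)

# Hypothetical event values, for illustration only.
data = [1.0, 2.0, 3.0, 4.0]
t = theil_index(data)
h = shannon_entropy_of_shares(data)
# For this definition, T = ln(N) - H, where H is the entropy of the shares.
gap = abs(t - (math.log(len(data)) - h))
```

The identity $T = \ln N - H$ shows the sense in which the index resembles an entropy: it is zero when all values are equal and grows as the values become more unequal.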
Starting from the idea of Tsallis on non-extensive statistical mechanics and the $q$-entropy notion, we recall the Theil index $Th$ and transform it into the $Th_q$ index. Both indices can be used to map onto themselves any time series…
The problem of assigning probabilities when little is known is analyzed in the case where the quantities of interest are physical observables, i.e. can be measured and their values expressed by numbers. It is pointed out that the assignment…
The measurement of the efficiency of an event selection is always an important part of the analysis of experimental data. The statistical techniques which are needed to determine the efficiency and its uncertainty are reviewed. Frequentist…
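As a small illustration of an efficiency estimate with an uncertainty, the sketch below computes the binomial point estimate together with a Wilson score interval. This is one common frequentist choice and is not necessarily among the techniques the review covers, since the abstract is truncated. The function name and the default `z=1.0` (roughly a 68% interval) are assumptions.

```python
import math

def efficiency_wilson(passed, total, z=1.0):
    """Return (estimate, lower, upper) for a selection efficiency.

    passed / total is the binomial point estimate; the bounds are the
    Wilson score interval for the given normal quantile z.
    """
    if total <= 0 or not 0 <= passed <= total:
        raise ValueError("need total > 0 and 0 <= passed <= total")
    eff = passed / total
    denom = 1.0 + z * z / total
    centre = (eff + z * z / (2.0 * total)) / denom
    half = z * math.sqrt(eff * (1.0 - eff) / total
                         + z * z / (4.0 * total * total)) / denom
    # Clamp to [0, 1] to absorb floating-point rounding at the edges.
    return eff, max(0.0, centre - half), min(1.0, centre + half)

# Hypothetical counts: 50 events pass the selection out of 100.
eff, lower, upper = efficiency_wilson(50, 100)
```

Unlike the naive $\sqrt{\varepsilon(1-\varepsilon)/n}$ error, this interval stays inside $[0, 1]$ and does not collapse to zero width when no events, or all events, pass the selection.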
Traditionally, the Method of (Shannon-Kullback's) Relative Entropy Maximization (REM) is considered with linear moment constraints. In this work, the method is studied under frequency moment constraints which are non-linear in…
(Jaynes') Method of (Shannon-Kullback's) Relative Entropy Maximization (REM or MaxEnt) can, at least in the discrete case, be viewed according to the Maximum Probability Theorem (MPT) as an asymptotic instance of the Maximum Probability…
Many real-world networks interact with and depend on other networks. We develop an analytical framework for studying interacting networks and present an exact percolation law for a network of $n$ interdependent networks (NON).…
The analysis of the modular structure of networks is a major challenge in complex networks theory. The validity of the modular structure obtained is essential to confront the problem of the topology-functionality relationship. Recently,…
Unsupervised clustering, also known as natural clustering, stands for the classification of data according to their similarities. Here we study this problem from the perspective of complex networks. Mapping the description of data…
We review possible measures of complexity which might in particular be applicable to situations where the complexity seems to arise spontaneously. We point out that not all of them correspond to the intuitive (or "naive") notion, and that…