Data Analysis, Statistics and Probability
Modular structure is ubiquitous in real-world complex networks, and its detection is important because it gives insights into the structure-functionality…
The paper comments on properties of the so-called "Unified approach to the construction of classical confidence intervals", in which confidence intervals are computed in a Neyman construction using the likelihood ratio as ordering quantity.…
Recent daily data of the Southern Oscillation Index have been analyzed. The power spectrum indicates major intrinsic geophysical short periods. We find interesting "high frequency" oscillations at 24, 27, 37, 76, 100 and 365 days. In…
A common situation in experimental physics is to have a signal which cannot be separated from a non-interfering background through the use of any cut. In this paper, we describe a procedure for determining, on an event-by-event basis, a…
The theory of probability, based on very general rules referred to as the Cox-Polya-Jaynes Desiderata, can be used both as a theory of random mass phenomena and as a quantitative theory of plausible inference about the parameters of…
This paper aims to measure the efficiency of urban street networks (a kind of complex networks) from the perspective of the multidimensional chain of connectivity (or flow). More specifically, we define two quantities: flow dimension and…
The discovery of community structure is a common challenge in the analysis of network data. Many methods have been proposed for finding community structure, but few have been proposed for determining whether the structure found is…
We study the crowding of near-extreme events in the time gaps between successive finishers in major international marathons. Naively, one might expect these gaps to become progressively larger for better-placing finishers. While such an…
We study a family of opinion formation models in one dimension where the propensity for a voter to align with its local environment depends non-linearly on the fraction of disagreeing neighbors. Depending on this non-linearity in the voting…
In this article we discuss six degrees of separation, as proposed by Milgram, from a theoretical point of view. Simply put, if one has $k$ friends, the number $N$ of indirect friends goes up to $\sim k^d$ in $d$ degrees of…
We study the scaling behavior of the fluctuations, as extracted through wavelet coefficients based on discrete wavelets. The analysis is carried out on a variety of physical data sets, as well as Gaussian white noise and binomial…
Although the inference of global community structure in networks has recently become a topic of great interest in the physics community, all such algorithms require that the graph be completely known. Here, we define both a measure of local…
Data series generated by complex systems exhibit fluctuations on many time scales and/or broad distributions of the values. In both equilibrium and non-equilibrium situations, the natural fluctuations are often found to follow a scaling…
The sample median is often used in statistical analyses of physical or astronomical data wherein a central value must be found from samples polluted by elements which do not belong to the population of interest or when the underlying…
Several approaches to testing the hypothesis that two histograms are drawn from the same distribution are investigated. We note that single-sample continuous distribution tests may be adapted to this two-sample grouped data situation. The…
Evolutionary Computation is a branch of computer science with which High Energy Physics has traditionally had few connections. Its methods were investigated in this field, mainly for data analysis tasks. These methods and studies are,…
This paper discusses some problems possibly arising when approximating via Monte-Carlo simulations the distributions of goodness-of-fit test statistics based on the empirical distribution function. We argue that failing to re-estimate…
We consider the problem of finding communities or modules in directed networks. The most common approach to this problem in the previous literature has been simply to ignore edge direction and apply methods developed for community discovery…
This lecture will introduce the Support Vector algorithms for classification and regression. They are an application of the so-called kernel trick, which allows the extension of a certain class of linear algorithms to the nonlinear case.…
This document introduces the basics of data preparation, feature selection and learning for high energy physics tasks. The emphasis is on feature selection by principal component analysis, information gain and significance measures for…