Related papers: Fibonacci Binning
The histogram is an analysis tool in widespread use within many sciences, with high energy physics as a prime example. However, there exists an inherent bias in the choice of binning for the histogram, with different choices potentially…
Interaction nets are a graphical model of computation, which has been used to define efficient evaluators for functional calculi, and specifically lambda calculi with patterns. However, the flat structure of interaction nets forces pattern…
A collection of articles on the statistical modelling and inference of social networks is analysed in a network fashion. The references of these articles are used to construct a citation network data set, which is almost a directed acyclic…
Many predictions are probabilistic in nature; for example, a prediction could be for precipitation tomorrow, but with only a 30 percent chance. Given both the predictions and the actual outcomes, "reliability diagrams" (also known as…
Just as semantic hashing can accelerate information retrieval, binary valued embeddings can significantly reduce latency in the retrieval of graphical data. We introduce a simple but effective model for learning such binary vectors for…
Random graphs are increasingly used for modeling real-world networks such as evolutionary networks of proteins. For this purpose we look at two different models and analyze how properties like connectedness and degree distributions are…
This work lists and describes the main recent strategies for building fixed-length, dense and distributed representations for words, based on the distributional hypothesis. These representations are now commonly called word embeddings and,…
Recent successes in word embedding and document embedding have motivated researchers to explore similar representations for networks and to use such representations for tasks such as edge prediction, node label prediction, and community…
When reading peer-reviewed scientific literature describing any analysis of empirical data, it is natural and correct to proceed with the underlying assumption that experiments have made good faith efforts to ensure that their analyses…
The aim of this note is to survey the factorizations of the Fibonacci infinite word that make use of the Fibonacci words and other related words, and to show that all these factorizations can be easily derived in sequence starting from…
The degree distributions of complex networks are usually considered to follow a power law. However, this is not the case for a large number of them. We thus propose a new model able to build random growing networks with (almost) any wanted degree…
The beta distribution is a basic distribution serving several purposes. It is used to model data, and also, as a more flexible version of the uniform distribution, it serves as a prior distribution for a binomial probability. The bivariate…
In this paper we study some properties of Fibonacci-sum set-graphs. The aforesaid graphs are an extension of the notion of Fibonacci-sum graphs to the notion of set-graphs. The colouring of Fibonacci-sum graphs is also discussed. A number…
In many applications, the curvature of the space supporting the data makes the statistical modelling challenging. In this paper we discuss the construction and use of probability distributions wrapped around manifolds using exponential…
Histograms are convenient non-parametric density estimators, which continue to be used ubiquitously. Summary quantities estimated from histogram-based probability density models depend on the choice of the number of bins. We introduce a…
Binscatter is a popular method for visualizing bivariate relationships and conducting informal specification testing. We study the properties of this method formally and develop enhanced visualization and econometric binscatter tools. These…
We study the convergence of distributions on finite paths of weighted digraphs, namely the family of Boltzmann distributions and the sequence of uniform distributions. Targeting applications to the convergence of distributions on paths, we…
A new distribution is introduced, which we call the twin-t distribution. This distribution is heavy-tailed like the t distribution, but closer to normality in the central part of the curve. Its properties are described, e.g. the pdf, the…
An important task of community discovery in networks is assessing the significance of the results and robustly ranking the generated candidate groups. Often in practice, numerous candidate communities are discovered, and focusing the analyst's…
We suggest partial logarithmic binning as the method of choice for uncovering the nature of many distributions encountered in information science (IS). Logarithmic binning retrieves information and trends "not visible" in noisy power-law…