Related papers: On central tendency and dispersion measures for in…
Symbolic Data Analysis works with variables for which each unit or class of units takes a finite set of values/categories, an interval or a distribution (a histogram, for instance). When to each observation corresponds an empirical…
Exploiting the geometric nature of statistical divergences, we devise a way to define associated induced uncertainty measures for discrete and finite probability distributions. We also report new uncertainty measures and discuss their…
In data mining, it is usual to describe a set of individuals using some summaries (means, standard deviations, histograms, confidence intervals) that generalize individual descriptions into a typology description. In this case, data can…
The paper presents a construction of a quantitative measure of variability for parameter estimates in the data fitting problem under interval uncertainty. It shows the degree of variability and ambiguity of the estimate, and the need for…
The study of associations and their causal explanations is a central research activity whose methodology varies tremendously across fields. Even within specialized subfields, comparisons across textbooks and journals reveal that the basics…
Confidence intervals are a popular way to visualize and analyze data distributions. Unlike p-values, they can convey information about both statistical significance and effect size. However, very little work exists on applying…
We analyze different data on the variation of the fine structure constant obtained with different methods to check their consistency. We test consistency using the modified Student test and confidence intervals. We split the data sets in…
We prove a central limit theorem for a sequence of random variables whose means are ambiguous and vary in an unstructured way. Their joint distribution is described by a set of measures. The limit is (not the normal distribution and is)…
This article provides an overview on the statistical modeling of complex data as increasingly encountered in modern data analysis. It is argued that such data can often be described as elements of a metric space that satisfies certain…
We study the probabilistic behavior of persistence-based statistics and propose a novel nonparametric framework for detecting structural changes in high-dimensional random point clouds. We establish moment bounds and tightness results for…
The coefficient of variation (CV) is commonly used to measure relative dispersion. However, since it is based on the sample mean and standard deviation, outliers can adversely affect the CV. Additionally, for skewed distributions the mean…
Measurement system analysis aims to quantify the variability in data attributable to the measurement system and evaluate its contribution to overall data variability. This paper conducts a rigorous theoretical investigation of the…
This paper generalizes the traditional statistical concept of prediction intervals for arbitrary probability density functions in high-dimensional feature spaces by introducing significance level distributions, which provides…
Measures of uncertainty and divergence are introduced for interval-valued probability distributions and are shown to have desirable mathematical properties. A maximum uncertainty inference procedure for marginal interval distributions is…
Three aspects of time series are uncertainty (dispersion at a given time scale), scaling (time-scale dependence), and intermittency (inclination to change dynamics). Simple measures of dispersion are the mean absolute deviation and the…
In this paper we first provide a method to compute confidence intervals for the center of a piecewise normal distribution given a sample from this distribution, under certain assumptions. We then extend this method to an asymptotic setting,…
Researchers have developed ways to generalize the mean and variance to situations in which a data metric is available. We apply the tools developed in Pennec (2006) to categorical data, and show the generality of this approach by…
Introductory statistical inference texts and courses treat the point estimation, hypothesis testing, and interval estimation problems separately, with primary emphasis on large-sample approximations. Here I present an alternative approach…
This paper gives a review of concentration inequalities which are widely employed in non-asymptotic analyses of mathematical statistics in a wide range of settings, from distribution-free to distribution-dependent, from sub-Gaussian to…
Interval analysis, when applied to the so-called problem of experimental data fitting, appears to be still in its infancy. Sometimes, partly because of the unrivaled reliability of interval methods, we do not obtain any results at all.…