Related papers: sPlot: A Quick Introduction
The paper advocates the use of a statistical tool dedicated to the exploration of data samples populated by several sources of events. This new technique, called sPlot, is able to unfold the contributions of the different sources to the…
This paper presents a statistical method to subtract background in maximum likelihood fit, without relying on any separate sideband or simulation for background modeling. The method, called sFit, is an extension to the sPlot technique…
A new approach of obtaining stratified random samples from statistically dependent random variables is described. The proposed method can be used to obtain samples from the input space of a computer forward model in estimating expectations…
The Statistical Toolkit is an open source system specialized in the statistical comparison of distributions. It addresses requirements common to different experimental domains, such as simulation validation (e.g. comparison of experimental…
In this article we propose an optimal method referred to as SPlit for splitting a dataset into training and testing sets. SPlit is based on the method of Support Points (SP), which was initially developed for finding the optimal…
Non-probability sampling, for example in the form of online panels, has become a fast and cheap method to collect data. While reliable inference tools are available for classical probability samples, non-probability samples can yield…
Sampling distribution, a foundational concept in statistics, is difficult to understand, since we usually have only one realization of the estimator of interest. In this work, we present an innovative method for helping university students…
Data analysis in high energy physics has to deal with data samples produced from different sources. One of the most widely used ways to unfold their contributions is the sPlot technique. It uses the results of a maximum likelihood fit to…
Scatterplots are a common tool for exploring multidimensional datasets, especially in the form of scatterplot matrices (SPLOMs). However, scatterplots suffer from overplotting when categorical variables are mapped to one or two axes, or the…
In this paper, the author presents a new tool, called The Convergence Plane, that allows to study the real dynamics of iterative methods whose iterations depends on one parameter in an easy and compact way. This tool can be used, inter…
Statistical matching aims to integrate two statistical sources. These sources can be two samples or a sample and the entire population. If two samples have been selected from the same population and information has been collected on…
Nonprobability (convenience) samples are increasingly sought to reduce the estimation variance for one or more population variables of interest that are estimated using a randomized survey (reference) sample by increasing the effective…
We present a technique for constructing suitable posterior probability distributions in situations for which the sampling distribution of the data is not known. This is very useful for modern scientific data analysis in the era of "big…
Project portfolio management is an essential process for organizations aiming to optimize the value of their R&D investments. In this article, we introduce a new tool designed to support the prioritization of projects within project…
Conventional statistics begins with a model, and assigns a likelihood of obtaining any particular set of data. The opposite approach, beginning with the data and assigning a likelihood to any particular model, is explored here for the case…
The utilization of statistical methods an their applications within the new field of study known as Topological Data Analysis has has tremendous potential for broadening our exploration and understanding of complex, high-dimensional data…
Data visualization is essential for developing an understanding of a complex system. The power grid is one of the most complex systems in the world and effective power grid research visualization software must 1) be easy to use, 2) support…
Sampling is often a necessary evil to reduce the processing and storage costs of distributed tracing. In this work, we describe a scalable and adaptive sampling approach that can preserve events of interest better than the widely used…
Many real-world applications of flow-based generative models desire a diverse set of samples that cover multiple modes of the target distribution. However, the predominant approach for obtaining diverse sets is not sample-efficient, as it…
The use of machine learning approaches continues to have many benefits in experimental nuclear and particle physics. One common issue is generating training data which is sufficiently realistic to give reliable results. Here we advocate…