Related papers: A Note on Automatic Data Transformation

200 papers

Many real data sets contain numerical features (variables) whose distribution is far from normal (Gaussian). Instead, their distribution is often skewed. In order to handle such data it is customary to preprocess the variables to make them…

Machine Learning · Statistics 2024-07-08 Jakob Raymaekers, Peter J. Rousseeuw

We present a technique for constructing suitable posterior probability distributions in situations for which the sampling distribution of the data is not known. This is very useful for modern scientific data analysis in the era of "big…

Instrumentation and Methods for Astrophysics · Physics 2017-08-30 Steven Gratton

Few-shot image classification has recently witnessed the rise of representation learning being utilised for models to adapt to new classes using only a few training examples. Therefore, the properties of the representations, such as their…

Computer Vision and Pattern Recognition · Computer Science 2023-09-29 Vaibhav Ganatra

We introduce a novel approach based on stochastic optimization to find the optimal sampling distribution for the data-driven stability analysis of switched linear systems. Our goal is to address limitations of existing approaches, in…

Optimization and Control · Mathematics 2025-09-01 Alexis Vuille, Guillaume O. Berger, Raphaël M. Jungers

Recently, several studies proposed non-linear transformations, such as a logarithmic or Gaussianization transformation, as efficient tools to recapture information about the (Gaussian) initial conditions. During non-linear evolution, part…

Cosmology and Nongalactic Astrophysics · Physics 2015-06-16 Julien Carron, Istvan Szapudi

Anomaly detection is a field of intense research. Identifying low probability events in data/images is a challenging problem given the high-dimensionality of the data, especially when no (or little) information about the anomaly is…

Machine Learning · Computer Science 2022-04-13 José A. Padrón-Hidalgo, Valero Laparra, Gustau Camps-Valls

Many variables in the social, physical, and biosciences, including neuroscience, are non-normally distributed. To improve the statistical properties of such data, or to allow parametric testing, logarithmic or logit transformations are…

Methodology · Statistics 2018-01-08 Sacha Jennifer van Albada, Peter A. Robinson

We propose data thinning, an approach for splitting an observation into two or more independent parts that sum to the original observation, and that follow the same distribution as the original observation, up to a (known) scaling of a…

Methodology · Statistics 2023-11-22 Anna Neufeld, Ameer Dharamshi, Lucy L. Gao, Daniela Witten

Several distributions and families of distributions are proposed to model skewed data, think, e.g., of skew-normal and related distributions. Lambert W random variables offer an alternative approach where, instead of constructing a new…

Methodology · Statistics 2023-10-17 Meelis Käärik, Anne Selart, Tuuli Puhkim, Liivika Tee

In 2023, the U.S. Food and Drug Administration issued guidance for adjustment of covariates in randomized clinical trials, emphasizing its role in enhancing precision and power through prognostic baseline variables. Despite its potential,…

Methodology · Statistics 2026-05-28 Kelly Van Lancker, Iván Díaz, Stijn Vansteelandt

Probabilistic programming is perfectly suited to reliable and transparent data science, as it allows the user to specify their models in a high-level language without worrying about the complexities of how to fit the models. Static analysis…

Artificial Intelligence · Computer Science 2020-08-31 Ryan Bernstein, Matthijs Vákár, Jeannette Wing

Logarithmic transformation of the data has been recommended by the literature in the case of highly skewed distributions such as those commonly found in information science. The purpose of the transformation is to make the data conform to…

Information Retrieval · Computer Science 2009-11-19 Loet Leydesdorff, Stephen Bensman

Symbolic data analysis (SDA) aggregates large individual-level datasets into a small number of distributional summaries, such as random rectangles or random histograms. The inference is carried out using these summaries in place of the…

Methodology · Statistics 2026-04-02 Yu Yang, Matias Quiroz, Boris Beranger, Robert Kohn, Scott A. Sisson

Finite mixtures of Gaussian distributions provide a flexible semi-parametric methodology for density estimation when the variables under investigation have no boundaries. However, in practical applications variables may be partially bounded…

Methodology · Statistics 2019-12-30 Luca Scrucca

Data compression has become one of the cornerstones of modern astronomical data analysis, with the vast majority of analyses compressing large raw datasets down to a manageable number of informative summaries. In this paper we provide a…

Cosmology and Nongalactic Astrophysics · Physics 2018-04-04 Justin Alsing, Benjamin Wandelt

Randomized smoothing is a recent technique that achieves state-of-the-art performance in training certifiably robust deep neural networks. While the smoothing family of distributions is often connected to the choice of the norm used for…

Machine Learning · Computer Science 2022-07-06 Motasem Alfarra, Adel Bibi, Philip H. S. Torr, Bernard Ghanem

The superposition of data sets with internal parametric self-similarity is a longstanding and widespread technique for the analysis of many types of experimental data across the physical sciences. Typically, this superposition is performed…

Data Analysis, Statistics and Probability · Physics 2022-06-01 Kyle R. Lennon, Gareth H. McKinley, James W. Swan

This article provides an original understanding of the behavior of a class of graph-oriented semi-supervised learning algorithms in the limit of large and numerous data. It is demonstrated that the intuition at the root of these methods…

Machine Learning · Computer Science 2017-11-10 Xiaoyi Mai, Romain Couillet

Modern data workflows are inherently adaptive, repeatedly querying the same dataset to refine and validate sequential decisions, but such adaptivity can lead to overfitting and invalid statistical inference. Adaptive Data Analysis (ADA)…

Machine Learning · Computer Science 2026-02-10 Joon Suk Huh

Parameter estimation is one of the most important tasks in statistics, and is key to helping people understand the distribution behind a sample of observations. Traditionally parameter estimation is done either by closed-form solutions…

Machine Learning · Computer Science 2024-03-04 Xiaoxin Yin, David S. Yin