Related papers: Computationally efficient univariate filtering for…

Reconciling common source, specific source, feature based and score based likelihood ratios

We show that the incorporation of any new piece of information allows for improved decision making in the sense that the expected costs of an optimal decision decrease (or, in boundary cases where no or not enough new information is…

Statistics Theory · Mathematics 2025-11-20 Aafko Boonstra, Ronald Meester, Klaas Slooten

Efficient computation with a linear mixed model on large-scale data sets with applications to genetic studies

Motivated by genome-wide association studies, we consider a standard linear model with one additional random effect in situations where many predictors have been collected on the same subjects and each predictor is analyzed separately.…

Applications · Statistics 2013-04-24 Matti Pirinen, Peter Donnelly, Chris C. A. Spencer

Computationally efficient permutation tests for the multivariate two-sample problem based on energy distance or maximum mean discrepancy statistics

Non-parametric two-sample tests based on energy distance or maximum mean discrepancy are widely used statistical tests for comparing multivariate data from two populations. While these tests enjoy desirable statistical properties, their…

Computation · Statistics 2024-06-11 Elias Chaibub Neto

LiRa: A New Likelihood-Based Similarity Score for Collaborative Filtering

Recommender system data presents unique challenges to the data mining, machine learning, and algorithms communities. The high missing data rate, in combination with the large scale and high dimensionality that is typical of recommender…

Information Retrieval · Computer Science 2017-03-22 Veronika Strnadova-Neeley, Aydin Buluc, John R. Gilbert, Leonid Oliker, Weimin Ouyang

Fast computation of p-values for the permutation test based on Pearson's correlation coefficient and other statistical tests

Permutation tests are among the simplest and most widely used statistical tools. Their p-values can be computed by a straightforward sampling of permutations. However, this way of computing p-values is often so slow that it is replaced by…

Computation · Statistics 2018-07-27 Jean-Marie Droz
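The Droz abstract mentions computing permutation p-values by "a straightforward sampling of permutations". For Pearson's correlation, that approach can be sketched as below. This is a minimal illustration of the slow baseline the paper sets out to replace, not its fast method. The function names and the `n_perm` and `seed` parameters are hypothetical.

```python
import math
import random


def pearson_r(x, y):
    # Sample Pearson correlation coefficient of two equal-length sequences.
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def permutation_pvalue(x, y, n_perm=10_000, seed=0):
    # Two-sided Monte Carlo permutation p-value for H0: no association between x and y.
    # Shuffling y destroys any pairing with x, so the shuffled statistics sample the null.
    rng = random.Random(seed)
    observed = abs(pearson_r(x, y))
    y_perm = list(y)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(y_perm)
        if abs(pearson_r(x, y_perm)) >= observed:
            hits += 1
    # The +1 terms count the observed arrangement itself, keeping the p-value above zero.
    return (hits + 1) / (n_perm + 1)
```

Each p-value costs `n_perm` full recomputations of the statistic. Resolving small p-values needs many permutations, because the smallest attainable value is `1 / (n_perm + 1)`. This cost is what motivates faster approximations.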

A Robust Score-Driven Filter for Multivariate Time Series

A multivariate score-driven filter is developed to extract signals from noisy vector processes. By assuming that the conditional location vector from a multivariate Student's t distribution changes over time, we construct a robust filter…

Econometrics · Economics 2022-08-31 Enzo D'Innocenzo, Alessandra Luati, Mario Mazzocchi

Provable benefits of score matching

Score matching is an alternative to maximum likelihood (ML) for estimating a probability distribution parametrized up to a constant of proportionality. By fitting the "score" of the distribution, it sidesteps the need to compute this…

Machine Learning · Computer Science 2023-06-06 Chirag Pabbaraju, Dhruv Rohatgi, Anish Sevekari, Holden Lee, Ankur Moitra, Andrej Risteski
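Why fitting the score avoids the normalizing constant can be seen in a textbook case: an unnormalized one-dimensional Gaussian fitted with Hyvärinen's score-matching objective, the average of 0.5 * psi(x)^2 + psi'(x), where psi(x) = d/dx log q(x). This is a minimal sketch, not the analysis of the paper above. It assumes the usual tail conditions under which that objective is valid, and the function names are hypothetical.

```python
def score(x, mu, var):
    # psi(x) = d/dx log q(x) for the unnormalized Gaussian q(x) = exp(-(x - mu)**2 / (2 * var)).
    # The normalizing constant 1 / sqrt(2 * pi * var) does not depend on x, so it never appears.
    return -(x - mu) / var


def score_slope(x, mu, var):
    # d/dx psi(x); constant in x for a Gaussian model.
    return -1.0 / var


def sm_objective(data, mu, var):
    # Empirical score-matching objective: average of 0.5 * psi(x)**2 + psi'(x) over the data.
    return sum(0.5 * score(x, mu, var) ** 2 + score_slope(x, mu, var) for x in data) / len(data)


def fit_gaussian_sm(data):
    # Setting the partial derivatives of sm_objective to zero gives the sample mean and the
    # mean squared deviation, the same estimates maximum likelihood gives for this model.
    n = len(data)
    mu = sum(data) / n
    var = sum((x - mu) ** 2 for x in data) / n
    return mu, var


data = [1.0, 2.0, 4.0, 7.0]
mu_hat, var_hat = fit_gaussian_sm(data)  # (3.5, 5.25)
```

Writing S for the mean squared deviation from mu, the objective reduces to S / (2 var^2) - 1 / var. Its derivative in var is (var - S) / var^3, so the minimum sits at var = S.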

Measuring the accuracy of likelihood-free inference

Complex scientific models where the likelihood cannot be evaluated present a challenge for statistical inference. Over the past two decades, a wide range of algorithms have been proposed for learning parameters in computationally feasible…

Computation · Statistics 2021-12-16 Aden Forrow, Ruth E. Baker

Optimal Cross-Validation for Sparse Linear Regression

Given a high-dimensional covariate matrix and a response vector, ridge-regularized sparse linear regression selects a subset of features that explains the relationship between covariates and the response in an interpretable manner. To…

Optimization and Control · Mathematics 2026-02-13 Ryan Cory-Wright, Andrés Gómez

Extremely efficient permutation and bootstrap hypothesis tests using R

Re-sampling based statistical tests are known to be computationally heavy, but reliable when small sample sizes are available. Despite their nice theoretical properties, not much effort has been put into making them efficient. In this paper we…

Methodology · Statistics 2018-06-29 Christina Chatzipantsiou, Marios Dimitriadis, Manos Papadakis, Michail Tsagris

Cross-Leverage Scores for Selecting Subsets of Explanatory Variables

In a standard regression problem, we have a set of explanatory variables whose effect on some response vector is modeled. For wide binary data, such as genetic marker data, we often have two limitations. First, we have more parameters than…

Methodology · Statistics 2021-09-20 Katharina Parry, Leo N. Geppert, Alexander Munteanu, Katja Ickstadt

Optimal P-value Weighting with Independent Information

The large-scale multiple testing inherent to high throughput biological data necessitates very high statistical stringency and thus true effects in data are difficult to detect unless they have high effect sizes. One solution to this…

Methodology · Statistics 2017-12-21 Mohamad S. Hasan

Significance-Based Categorical Data Clustering

Although numerous algorithms have been proposed to solve the categorical data clustering problem, how to assess the statistical significance of a set of categorical clusters remains unaddressed. To fill this void, we employ the…

Machine Learning · Computer Science 2022-11-09 Lianyu Hu, Mudi Jiang, Yan Liu, Zengyou He

Fit Like You Sample: Sample-Efficient Generalized Score Matching from Fast Mixing Diffusions

Score matching is an approach to learning probability distributions parametrized up to a constant of proportionality (e.g. Energy-Based Models). The idea is to fit the score of the distribution, rather than the likelihood, thus avoiding the…

Machine Learning · Computer Science 2024-01-31 Yilong Qin, Andrej Risteski

Multivariate Comparison of Classification Algorithms

Statistical tests that compare classification algorithms are univariate and use a single performance measure, e.g., misclassification error, F-measure, AUC, and so on. In multivariate tests, comparison is done using multiple measures…

Machine Learning · Statistics 2014-09-17 Olcay Taner Yildiz, Ethem Alpaydin

Precise Error Rates for Computationally Efficient Testing

We revisit the fundamental question of simple-versus-simple hypothesis testing with an eye towards computational complexity, as the statistically optimal likelihood ratio test is often computationally intractable in high-dimensional…

Statistics Theory · Mathematics 2025-05-05 Ankur Moitra, Alexander S. Wein

Computing log-likelihood and its derivatives for restricted maximum likelihood methods

Recent large-scale genome-wide association analyses involve large-scale linear mixed models. Quantifying (co)variance parameters in the mixed models with a restricted maximum likelihood method results in a score function which is the…

Numerical Analysis · Mathematics 2016-08-26 Shengxin Zhu

An Efficient Data Analysis Method for Big Data using Multiple-Model Linear Regression

This paper introduces a new data analysis method for big data using a newly defined regression model named multiple model linear regression (MMLR), which separates input datasets into subsets and constructs local linear regression models of…

Machine Learning · Computer Science 2023-08-25 Bohan Lyu, Jianzhong Li

Predictive Data Calibration for Linear Correlation Significance Testing

Inferring linear relationships lies at the heart of many empirical investigations. A measure of linear dependence should correctly evaluate the strength of the relationship as well as qualify whether it is meaningful for the population.…

Methodology · Statistics 2022-08-16 Kaustubh R. Patil, Simon B. Eickhoff, Robert Langner

Calculation of forensic likelihood ratios: Use of Monte Carlo simulations to compare the output of score-based approaches with true likelihood-ratio values

A group of approaches for calculating forensic likelihood ratios first calculates scores which quantify the degree of difference or the degree of similarity between pairs of samples, then converts those scores to likelihood ratios. In order…

Applications · Statistics 2016-12-28 Geoffrey Stewart Morrison
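The two-step procedure in the Morrison abstract first computes a score for each pair of samples, then converts scores to likelihood ratios. The sketch below assumes one common conversion: estimate the score density separately for training pairs known to be same-source and known to be different-source, then take the ratio of the two densities at the observed score. The Gaussian kernel density estimate, the fixed `bandwidth`, and the training scores are illustrative assumptions, not the method of that paper.

```python
import math


def gaussian_kde(samples, bandwidth):
    # Return a function evaluating a Gaussian kernel density estimate built from `samples`.
    norm = 1.0 / (len(samples) * bandwidth * math.sqrt(2 * math.pi))

    def density(s):
        return norm * sum(math.exp(-0.5 * ((s - x) / bandwidth) ** 2) for x in samples)

    return density


def score_to_lr(same_source_scores, diff_source_scores, bandwidth=0.5):
    # Map a score s to LR(s) = f(s | same source) / f(s | different sources).
    f_same = gaussian_kde(same_source_scores, bandwidth)
    f_diff = gaussian_kde(diff_source_scores, bandwidth)
    return lambda s: f_same(s) / f_diff(s)


# Hypothetical similarity scores: higher means the two samples look more alike.
same = [4.1, 4.5, 5.0, 5.2, 5.8]
diff = [0.9, 1.5, 2.0, 2.2, 3.0]
lr = score_to_lr(same, diff)
```

Here `lr(5.0)` exceeds 1, supporting the same-source hypothesis, and `lr(1.5)` falls below 1. Far outside the training scores, both estimated densities become tiny and the ratio becomes unstable. That instability is one reason to compare score-based outputs against true likelihood-ratio values, as the abstract describes.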