Related papers: A Provably Accurate Randomized Sampling Algorithm …

200 papers

One popular method for dealing with large-scale data sets is sampling. For example, by using the empirical statistical leverage scores as an importance sampling distribution, the method of algorithmic leveraging samples and rescales…

Methodology · Statistics 2013-06-25 Ping Ma , Michael W. Mahoney , Bin Yu

For massive data, the family of subsampling algorithms is popular to downsize the data volume and reduce computational burden. Existing studies focus on approximating the ordinary least squares estimate in linear regression, where…

Computation · Statistics 2019-06-27 HaiYing Wang , Rong Zhu , Ping Ma

Logistic regression models are a popular and effective method to predict the probability of categorical response data. However, inference for these models can become computationally prohibitive for large datasets. Here we adapt ideas from…

Methodology · Statistics 2020-08-25 Tom Whitaker , Boris Beranger , Scott A. Sisson

Random sampling has become a critical tool in solving massive matrix problems. For linear regression, a small, manageable set of data rows can be randomly selected to approximate a tall, skinny data matrix, improving processing time…

Data Structures and Algorithms · Computer Science 2014-08-22 Michael B. Cohen , Yin Tat Lee , Cameron Musco , Christopher Musco , Richard Peng , Aaron Sidford

Logistic regression is a fundamental and widely used statistical method for modeling binary outcomes based on covariates. However, the presence of missing data, particularly in settings involving hybrid covariates (a mix of discrete and…

Methodology · Statistics 2025-06-05 Mohamed Cherifi , Xujia Zhu , Mohammed Nabil El Korso , Ammar Mesloub

In complex survey data, each sampled observation is assigned a sampling weight, indicating the number of units that it represents in the population. Whether sampling weights should or should not be considered in the estimation process of model…

Methodology · Statistics 2024-09-20 Amaia Iparragirre , Irantzu Barrio , Jorge Aramendi , Inmaculada Arostegui

Logistic regression is a well-known statistical model which is commonly used in the situation where the output is a binary random variable. It has a wide range of applications including machine learning, public health, social sciences,…

Statistics Theory · Mathematics 2019-04-18 Bernard Bercu , Antoine Godichon-Baggioni , Bruno Portier

Subsampling algorithms for various parametric regression models with massive data have been extensively investigated in recent years. However, all existing studies on subsampling heavily rely on clean massive data. In practical…

Statistics Theory · Mathematics 2025-06-11 Jiangshan Ju , Mingqiu Wang , Shengli Zhao

We introduce a statistical physics inspired supervised machine learning algorithm for classification and regression problems. The method is based on the invariances or stability of predicted results when known data is represented as…

Machine Learning · Statistics 2018-11-19 Patrick Chao , Tahereh Mazaheri , Bo Sun , Nicholas B. Weingartner , Zohar Nussinov

A major challenge for building statistical models in the big data era is that the available data volume far exceeds the computational capability. A common approach for solving this problem is to employ a subsampled dataset that can be…

Computation · Statistics 2018-09-14 Lei Han , Kean Ming Tan , Ting Yang , Tong Zhang

Randomized algorithms for very large matrix problems have received a great deal of attention in recent years. Much of this work was motivated by problems in large-scale data analysis, and this work was performed by individuals from many…

Data Structures and Algorithms · Computer Science 2011-11-16 Michael W. Mahoney

$L_1$ regularized logistic regression has now become a workhorse of data mining and bioinformatics: it is widely used for many classification problems, particularly ones with many features. However, $L_1$ regularization typically selects…

Machine Learning · Statistics 2015-02-12 Zhe Liu

In this paper, we propose an improved estimation method for logistic regression based on subsamples taken according to the optimal subsampling probabilities developed in Wang et al. (2018). Both asymptotic results and numerical results show that the…

Methodology · Statistics 2021-06-24 HaiYing Wang

A significant hurdle for analyzing large sample data is the lack of effective statistical computing and inference methods. An emerging powerful approach for analyzing large sample data is subsampling, by which one takes a random subsample…

Methodology · Statistics 2015-11-24 Rong Zhu , Ping Ma , Michael W. Mahoney , Bin Yu

We outline how modern likelihood theory, which provides essentially exact inferences in a variety of parametric statistical problems, may routinely be applied in practice. Although the likelihood procedures are based on analytical…

Methodology · Statistics 2009-06-23 Alessandra R. Brazzale , Anthony C. Davison

Leverage score sampling provides an appealing way to perform approximate computations for large matrices. Indeed, it allows one to derive faithful approximations with a complexity adapted to the problem at hand. Yet, performing leverage scores…

Machine Learning · Statistics 2019-01-25 Alessandro Rudi , Daniele Calandriello , Luigi Carratino , Lorenzo Rosasco

Logistic models are studied as a tool to convert output from numerical weather forecasting systems (deterministic and ensemble) into probability forecasts for binary events. A logistic model obtains by putting the logarithmic odds ratio…

Atmospheric and Oceanic Physics · Physics 2009-01-29 Jochen Bröcker

We explain theoretically a curious empirical phenomenon: "Approximating a matrix by deterministically selecting a subset of its columns with the corresponding largest leverage scores results in a good low-rank matrix surrogate". To obtain…

Data Structures and Algorithms · Computer Science 2014-06-04 Dimitris Papailiopoulos , Anastasios Kyrillidis , Christos Boutsidis

Logistic regression is the most commonly used method for constructing predictive models for binary responses. One significant drawback to this approach, however, is that the asymptotes of the logistic response function are fixed at 0 and 1,…

Methodology · Statistics 2026-02-09 Anthony Almudevar , Jacob Almudevar

Case-control sampling is a commonly used retrospective sampling design to alleviate imbalanced structure of binary data. When fitting the logistic regression model with case-control data, although the slope parameter of the model can be…

Methodology · Statistics 2024-06-03 Hengchao Shi , Xinyi Liu , Ming Zheng , Wen Yu