Related papers: Computationally Efficient Estimators for Dimension…

Efficient $l_{\alpha}$ Distance Approximation for High Dimensional Data Using $\alpha$-Stable Projection

In recent years, large high-dimensional data sets have become commonplace in a wide range of applications in science and commerce. Techniques for dimension reduction are of primary concern in statistical analysis. Projection methods play an…

Computation · Statistics 2008-01-24 Peter Clifford, Ioana A. Cosma

The Optimal Quantile Estimator for Compressed Counting

Compressed Counting (CC) was recently proposed for very efficiently computing the (approximate) $\alpha$th frequency moments of data streams, where $0<\alpha\leq 2$. Several estimators were reported including the geometric mean estimator,…

Data Structures and Algorithms · Computer Science 2008-08-14 Ping Li

Binary and Multi-Bit Coding for Stable Random Projections

We develop efficient binary (i.e., 1-bit) and multi-bit coding schemes for estimating the scale parameter of $\alpha$-stable distributions. The work is motivated by the recent work on one scan 1-bit compressed sensing (sparse signal…

Methodology · Statistics 2016-02-02 Ping Li

Very Sparse Stable Random Projections, Estimators and Tail Bounds for Stable Random Projections

This paper will focus on three different aspects in improving the current practice of stable random projections. Firstly, we propose very sparse stable random projections to significantly reduce the processing and storage cost, by…

Data Structures and Algorithms · Computer Science 2007-07-13 Ping Li

Computationally Efficient Learning of Statistical Manifolds

Analyzing high-dimensional data with manifold learning algorithms often requires searching for the nearest neighbors of all observations. This presents a computational bottleneck in statistical manifold learning when observations of…

Machine Learning · Computer Science 2022-03-11 Fan Cheng, Anastasios Panagiotelis, Rob J Hyndman

Fast Computation of Robust Subspace Estimators

Dimension reduction is often an important step in the analysis of high-dimensional data. PCA is a popular technique to find the best low-dimensional approximation of high-dimensional data. However, classical PCA is very sensitive to…

Computation · Statistics 2019-01-14 Holger Cevallos-Valdiviezo, Stefan Van Aelst

Distributed estimation through parallel approximants

Designing scalable estimation algorithms is a core challenge in modern statistics. Here we introduce a framework to address this challenge based on parallel approximants, which yields estimators with provable properties that operate on the…

Methodology · Statistics 2023-08-04 Aritra Chakravorty, William S. Cleveland, Patrick J. Wolfe

Approximating Higher-Order Distances Using Random Projections

We provide a simple method and relevant theoretical analysis for efficiently estimating higher-order lp distances. While the analysis mainly focuses on l4, our methodology extends naturally to p = 6,8,10..., (i.e., when p is even).…

Machine Learning · Computer Science 2012-03-19 Ping Li, Michael W. Mahoney, Yiyuan She

Fast Estimation Method for the Stability of Ensemble Feature Selectors

It is preferred that feature selectors be stable for better interpretability and robust prediction. Ensembling is known to be effective for improving the stability of feature selectors. Since ensembling is time-consuming, it is…

Machine Learning · Computer Science 2021-08-04 Rina Onda, Zhengyan Gao, Masaaki Kotera, Kenta Oono

A Computationally Efficient Method for Learning Exponential Family Distributions

We consider the question of learning the natural parameters of a $k$ parameter minimal exponential family from i.i.d. samples in a computationally and statistically efficient manner. We focus on the setting where the support as well as the…

Machine Learning · Computer Science 2021-11-01 Abhin Shah, Devavrat Shah, Gregory W. Wornell

Sign Stable Random Projections for Large-Scale Learning

We study the use of "sign $\alpha$-stable random projections" (where $0<\alpha\leq 2$) for building basic data processing tools in the context of large-scale machine learning applications (e.g., classification, regression, clustering, and…

Machine Learning · Statistics 2015-04-29 Ping Li

L1-optimal linear programming estimator for periodic frontier functions with Hölder continuous derivative

We propose a new estimator based on a linear programming method for smooth frontiers of sample points. The derivative of the frontier function is supposed to be Hölder continuous. The estimator is defined as a linear combination of kernel…

Statistics Theory · Mathematics 2014-09-23 Alexander Nazin, Stephane Girard

Random Projection Estimation of Discrete-Choice Models with Large Choice Sets

We introduce sparse random projection, an important dimension-reduction tool from machine learning, for the estimation of discrete-choice models with high-dimensional choice sets. Initially, high-dimensional data are compressed into a…

Machine Learning · Statistics 2016-04-21 Khai X. Chiong, Matthew Shum

An efficient estimator for locally stationary Gaussian long-memory processes

This paper addresses the estimation of locally stationary long-range dependent processes, a methodology that allows the statistical analysis of time series data exhibiting both nonstationarity and strong dependency. A time-varying…

Statistics Theory · Mathematics 2010-11-12 Wilfredo Palma, Ricardo Olea

Optimal low-rank stochastic gradient estimation for LLM training

Large language model (LLM) training is often bottlenecked by memory constraints and stochastic gradient noise in extremely high-dimensional parameter spaces. Motivated by empirical evidence that many LLM gradient matrices are effectively…

Machine Learning · Computer Science 2026-03-24 Zehao Li, Tao Ren, Zishi Zhang, Xi Chen, Yijie Peng

Batch mode active learning for efficient parameter estimation

For many tasks of data analysis, we may only have the information of the explanatory variable, and the evaluation of the response values is quite expensive. While it is impractical or too costly to obtain the responses of all units, a…

Computation · Statistics 2023-04-07 Wei Zheng, Ting Tian, Xueqin Wang

Stability and Accuracy Trade-offs in Statistical Estimation

Algorithmic stability is a central concept in statistics and learning theory that measures how sensitive an algorithm's output is to small changes in the training data. Stability plays a crucial role in understanding generalization,…

Statistics Theory · Mathematics 2026-01-21 Abhinav Chakraborty, Yuetian Luo, Rina Foygel Barber

Distance Queries from Sampled Data: Accurate and Efficient

Distance queries are a basic tool in data analysis. They are used for detection and localization of change for the purpose of anomaly detection, monitoring, or planning. Distance queries are particularly useful when data sets such as…

Data Structures and Algorithms · Computer Science 2015-03-20 Edith Cohen

Optimal subsampling for functional quantile regression

Subsampling is an efficient method to deal with massive data. In this paper, we investigate the optimal subsampling for linear quantile regression when the covariates are functions. The asymptotic distribution of the subsampling estimator…

Numerical Analysis · Mathematics 2022-05-06 Qian Yan , Hanyu Li , Chengmei Niu

Proximal Projection Method for Stable Linearly Constrained Optimization

Many applications using large datasets require efficient methods for minimizing a proximable convex function subject to satisfying a set of linear constraints within a specified tolerance. For this task, we present a proximal projection…

Optimization and Control · Mathematics 2024-12-10 Howard Heaton