Related papers: Learning-based Support Estimation in Sublinear Tim…

On Fine-Grained Distinct Element Estimation

We study the problem of distributed distinct element estimation, where $\alpha$ servers each receive a subset of a universe $[n]$ and aim to compute a $(1+\varepsilon)$-approximation to the number of distinct elements using minimal…

Data Structures and Algorithms · Computer Science 2025-07-01 Ilias Diakonikolas, Daniel M. Kane, Jasper C. H. Lee, Thanasis Pittas, David P. Woodruff, Samson Zhou

Improved Sublinear-time Moment Estimation using Weighted Sampling

In this work we study the moment estimation problem using weighted sampling. Given sample access to a set $A$ with $n$ weighted elements, and a parameter $t>0$, we estimate the $t$-th moment of $A$ given as $S_t=\sum_{a\in A} w(a)^t$.…

Data Structures and Algorithms · Computer Science 2025-02-24 Anup Bhattacharya, Pinki Pradhan

The Stochastic Replica Approach to Machine Learning: Stability and Parameter Optimization

We introduce a statistical physics inspired supervised machine learning algorithm for classification and regression problems. The method is based on the invariances or stability of predicted results when known data is represented as…

Machine Learning · Statistics 2018-11-19 Patrick Chao, Tahereh Mazaheri, Bo Sun, Nicholas B. Weingartner, Zohar Nussinov

Computationally efficient change point detection for high-dimensional regression

Large-scale sequential data is often exposed to some degree of inhomogeneity in the form of sudden changes in the parameters of the data-generating process. We consider the problem of detecting such structural changes in a high-dimensional…

Methodology · Statistics 2016-01-15 Florencia Leonardi, Peter Bühlmann

Improved Frequency Estimation Algorithms with and without Predictions

Estimating frequencies of elements appearing in a data stream is a key task in large-scale data analysis. Popular sketching approaches to this problem (e.g., CountMin and CountSketch) come with worst-case guarantees that probabilistically…

Data Structures and Algorithms · Computer Science 2023-12-13 Anders Aamand, Justin Y. Chen, Huy Lê Nguyen, Sandeep Silwal, Ali Vakilian

Subsampling for Big Data Linear Models with Measurement Errors

Subsampling algorithms for various parametric regression models with massive data have been extensively investigated in recent years. However, all existing studies on subsampling heavily rely on clean massive data. In practical…

Statistics Theory · Mathematics 2025-06-11 Jiangshan Ju, Mingqiu Wang, Shengli Zhao

Sublinear Time Approximate Sum via Uniform Random Sampling

We investigate the approximation for computing the sum $a_1+\cdots+a_n$ with an input of a list of nonnegative elements $a_1,\ldots,a_n$. If all elements are in the range $[0,1]$, there is a randomized algorithm that can compute an…

Data Structures and Algorithms · Computer Science 2012-03-01 Bin Fu, Wenfeng Li, Zhiyong Peng

Distributed estimation of principal support vector machines for sufficient dimension reduction

The principal support vector machines method (Li et al., 2011) is a powerful tool for sufficient dimension reduction that replaces original predictors with their low-dimensional linear combinations without loss of information. However, the…

Machine Learning · Statistics 2019-12-02 Jun Jin, Chao Ying, Zhou Yu

Support Testing in the Huge Object Model

The Huge Object model is a distribution testing model in which we are given access to independent samples from an unknown distribution over the set of strings $\{0,1\}^n$, but are only allowed to query a few bits from the samples. We…

Data Structures and Algorithms · Computer Science 2024-09-18 Tomer Adar, Eldar Fischer, Amit Levi

Fast Multilevel Support Vector Machines

Solving different types of optimization models (including parameters fitting) for support vector machines on large-scale training data is often an expensive computational task. This paper proposes a multilevel algorithmic framework that…

Machine Learning · Statistics 2014-10-14 Talayeh Razzaghi, Ilya Safro

Resampling: an improvement of Importance Sampling in varying population size models

Sequential importance sampling algorithms have been defined to estimate likelihoods in models of ancestral population processes. However, these algorithms are based on features of the models with constant population size, and become…

Statistics Theory · Mathematics 2016-03-24 Coralie Merle, Raphaël Leblois, François Rousset, Pierre Pudlo

Accelerating Machine Learning Algorithms with Adaptive Sampling

The era of huge data necessitates highly efficient machine learning algorithms. Many common machine learning algorithms, however, rely on computationally intensive subroutines that are prohibitively expensive on large datasets. Oftentimes,…

Machine Learning · Computer Science 2023-09-26 Mo Tiwari

Nearly-Linear Time Private Hypothesis Selection with the Optimal Approximation Factor

Estimating the density of a distribution from its samples is a fundamental problem in statistics. Hypothesis selection addresses the setting where, in addition to a sample set, we are given $n$ candidate distributions -- referred to as…

Data Structures and Algorithms · Computer Science 2025-10-23 Maryam Aliakbarpour, Zhan Shi, Ria Stevens, Vincent X. Wang

Optimal Subsampling Approaches for Large Sample Linear Regression

A significant hurdle for analyzing large sample data is the lack of effective statistical computing and inference methods. An emerging powerful approach for analyzing large sample data is subsampling, by which one takes a random subsample…

Methodology · Statistics 2015-11-24 Rong Zhu, Ping Ma, Michael W. Mahoney, Bin Yu

Learning-Augmented Moment Estimation on Time-Decay Models

Motivated by the prevalence and success of machine learning, a line of recent work has studied learning-augmented algorithms in the streaming model. These results have shown that for natural and practical oracles implemented with machine…

Data Structures and Algorithms · Computer Science 2026-03-04 Soham Nagawanshi, Shalini Panthangi, Chen Wang, David P. Woodruff, Samson Zhou

Auxiliary Learning and its Statistical Understanding

Modern statistical analysis often encounters high-dimensional problems but with a limited sample size. It poses great challenges to traditional statistical estimation methods. In this work, we adopt auxiliary learning to solve the…

Statistics Theory · Mathematics 2025-01-08 Hanchao Yan, Feifei Wang, Chuanxin Xia, Hansheng Wang

Stochastic Learning of Semiparametric Monotone Index Models with Large Sample Size

I study the estimation of semiparametric monotone index models in the scenario where the number of observation points $n$ is extremely large and conventional approaches fail to work due to heavy computational burdens. Motivated by the…

Econometrics · Economics 2023-10-31 Qingsong Yao

Approximate Stochastic Subgradient Estimation Training for Support Vector Machines

Subgradient algorithms for training support vector machines have been quite successful for solving large-scale and online learning problems. However, they have been restricted to linear kernels and strongly convex formulations. This paper…

Machine Learning · Computer Science 2011-11-04 Sangkyun Lee, Stephen J. Wright

Starting Small -- Learning with Adaptive Sample Sizes

For many machine learning problems, data is abundant and it may be prohibitive to make multiple passes through the full training set. In this context, we investigate strategies for dynamically increasing the effective sample size, when…

Machine Learning · Computer Science 2016-10-10 Hadi Daneshmand, Aurelien Lucchi, Thomas Hofmann

Improving Causal Effect Estimation of Weighted RegressionBased Estimator using Neural Networks

Estimating causal effects from observational data informs us about which factors are important in an autonomous system, and enables us to take better decisions. This is important because it has applications in selecting a treatment in…

Machine Learning · Computer Science 2021-10-29 Plabon Shaha, Talha Islam Zadid, Ismat Rahman, Md. Mosaddek Khan