Related papers: TURF: A Two-factor, Universal, Robust, Fast Distri…

SURF: A Simple, Universal, Robust, Fast Distribution Learning Algorithm

Sample- and computationally-efficient distribution estimation is a fundamental tenet in statistics and machine learning. We present SURF, an algorithm for approximating distributions by piecewise polynomials. SURF is: simple, replacing…

Machine Learning · Statistics 2021-02-15 Yi Hao, Ayush Jain, Alon Orlitsky, Vaishakh Ravindrakumar

Sample-Optimal Density Estimation in Nearly-Linear Time

We design a new, fast algorithm for agnostically learning univariate probability distributions whose densities are well approximated by piecewise polynomial functions. Let $f$ be the density function of an arbitrary univariate distribution,…

Data Structures and Algorithms · Computer Science 2015-06-03 Jayadev Acharya, Ilias Diakonikolas, Jerry Li, Ludwig Schmidt

Efficient Robust Proper Learning of Log-concave Distributions

We study the robust proper learning of univariate log-concave distributions (over continuous and discrete domains). Given a set of samples drawn from an unknown target distribution, we want to compute a log-concave hypothesis…

Data Structures and Algorithms · Computer Science 2016-06-10 Ilias Diakonikolas, Daniel M. Kane, Alistair Stewart

Constant-Factor Approximation for the Uniform Decision Tree

We resolve a long-standing open question about the existence of a constant-factor approximation algorithm for the average-case Decision Tree problem with a uniform probability distribution over the hypotheses. We answer the question…

Data Structures and Algorithms · Computer Science 2026-04-29 Michał Szyfelbein

Efficient Density Estimation via Piecewise Polynomial Approximation

We give a highly efficient "semi-agnostic" algorithm for learning univariate probability distributions that are well approximated by piecewise polynomial density functions. Let $p$ be an arbitrary distribution over an interval $I$ which is…

Machine Learning · Computer Science 2013-05-15 Siu-On Chan, Ilias Diakonikolas, Rocco A. Servedio, Xiaorui Sun

Efficient deterministic approximate counting for low-degree polynomial threshold functions

We give a deterministic algorithm for approximately counting satisfying assignments of a degree-$d$ polynomial threshold function (PTF). Given a degree-$d$ input polynomial $p(x_1,\dots,x_n)$ over $\mathbb{R}^n$ and a parameter $\epsilon > 0$, our…

Computational Complexity · Computer Science 2013-12-02 Anindya De, Rocco Servedio

Learning $k$-Modal Distributions via Testing

A $k$-modal probability distribution over the discrete domain $\{1,\dots,n\}$ is one whose histogram has at most $k$ "peaks" and "valleys." Such distributions are natural generalizations of monotone ($k=0$) and unimodal ($k=1$) probability…

Data Structures and Algorithms · Computer Science 2014-09-16 Constantinos Daskalakis, Ilias Diakonikolas, Rocco A. Servedio
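The "peaks and valleys" definition in the entry above can be made concrete by counting how often a histogram switches between increasing and decreasing. Under that reading, a monotone histogram has 0 switches and a unimodal one has at most 1, which matches $k=0$ and $k=1$.

The sketch below rests on several assumptions:
- the distribution is given as a list of probabilities over $\{1,\dots,n\}$;
- flat steps count as neither a peak nor a valley;
- `count_direction_changes` and `is_k_modal` are hypothetical helper names.

It is not the testing-based learner the paper proposes.

```python
from typing import Sequence


def count_direction_changes(pmf: Sequence[float]) -> int:
    """Count how often the histogram switches between increasing and decreasing.

    Equal consecutive values (plateaus) are skipped, so a flat run does not
    create an extra peak or valley. This is one illustrative reading of
    "number of peaks and valleys", not a definition taken from the paper.
    """
    changes = 0
    prev_sign = 0  # +1 = increasing, -1 = decreasing, 0 = no direction seen yet
    for a, b in zip(pmf, pmf[1:]):
        if b == a:
            continue  # plateau: direction unchanged
        sign = 1 if b > a else -1
        if prev_sign != 0 and sign != prev_sign:
            changes += 1  # a peak (+ then -) or a valley (- then +)
        prev_sign = sign
    return changes


def is_k_modal(pmf: Sequence[float], k: int) -> bool:
    """True if the histogram has at most k peaks and valleys in total."""
    return count_direction_changes(pmf) <= k
```

For example, `count_direction_changes([0.1, 0.3, 0.1, 0.3, 0.2])` returns `3` (peak, valley, peak), so that histogram is 3-modal but not 2-modal.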

Fast Distributed Approximation for Max-Cut

Finding a maximum cut is a fundamental task in many computational settings. Surprisingly, it has been insufficiently studied in the classic distributed settings, where vertices communicate by synchronously sending messages to their…

Data Structures and Algorithms · Computer Science 2017-07-27 Keren Censor-Hillel, Rina Levy, Hadas Shachnai

Computationally Efficient Learning of Statistical Manifolds

Analyzing high-dimensional data with manifold learning algorithms often requires searching for the nearest neighbors of all observations. This presents a computational bottleneck in statistical manifold learning when observations of…

Machine Learning · Computer Science 2022-03-11 Fan Cheng, Anastasios Panagiotelis, Rob J Hyndman

Near-Optimal Density Estimation in Near-Linear Time Using Variable-Width Histograms

Let $p$ be an unknown and arbitrary probability distribution over $[0,1)$. We consider the problem of density estimation, in which a learning algorithm is given i.i.d. draws from $p$ and must (with high probability) output a…

Machine Learning · Computer Science 2014-11-04 Siu-On Chan, Ilias Diakonikolas, Rocco A. Servedio, Xiaorui Sun

Optimal Algorithms and Lower Bounds for Testing Closeness of Structured Distributions

We give a general unified method that can be used for $L_1$ closeness testing of a wide range of univariate structured distribution families. More specifically, we design a sample optimal and computationally efficient algorithm for…

Data Structures and Algorithms · Computer Science 2015-08-25 Ilias Diakonikolas, Daniel M. Kane, Vladimir Nikishkin

Efficient Distance Approximation for Structured High-Dimensional Distributions via Learning

We design efficient distance approximation algorithms for several classes of structured high-dimensional distributions. Specifically, we show algorithms for the following problems:
- Given sample access to two Bayesian networks $P_1$ and…

Data Structures and Algorithms · Computer Science 2020-02-17 Arnab Bhattacharyya, Sutanu Gayen, Kuldeep S. Meel, N. V. Vinodchandran

How fast can you find a good hypothesis?

In the hypothesis selection problem, we are given sample and query access to a finite set of candidate distributions (hypotheses), $\mathcal{H} = \{H_1, \ldots, H_n\}$, and samples from an unknown distribution $P$, both over a domain…

Data Structures and Algorithms · Computer Science 2025-11-12 Anders Aamand, Maryam Aliakbarpour, Justin Y. Chen, Sandeep Silwal

Tight Approximation Bounds for the Seminar Assignment Problem

The seminar assignment problem is a variant of the generalized assignment problem in which items have unit size and the amount of space allowed in each bin is restricted to an arbitrary set of values. The problem has been shown to be…

Data Structures and Algorithms · Computer Science 2016-10-18 Amotz Bar-Noy, George Rabanca

The Optimal Approximation Factor in Density Estimation

Consider the following problem: given two arbitrary densities $q_1,q_2$ and sample access to an unknown target density $p$, find which of the $q_i$'s is closer to $p$ in total variation. A remarkable result due to Yatracos shows that this…

Machine Learning · Computer Science 2025-12-16 Olivier Bousquet, Daniel Kane, Shay Moran
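The two-hypothesis problem in the entry above is classically handled by the Scheffé test. The test takes the set $A$ where $q_1$ puts more mass than $q_2$ and picks the candidate whose mass on $A$ is closest to the empirical frequency of $A$ in the samples.

The sketch below is a minimal illustration of that classical test. It assumes finite discrete distributions given as dictionaries, and `scheffe_select` is a hypothetical name. It is not the algorithm of the listed paper and makes no claim about its approximation factor.

```python
from collections import Counter
from collections.abc import Hashable, Sequence
from typing import Dict

Dist = Dict[Hashable, float]  # point -> probability (finite support assumed)


def scheffe_select(q1: Dist, q2: Dist, samples: Sequence[Hashable]) -> int:
    """Return 1 or 2: the candidate whose mass on the Scheffé set best matches the samples."""
    support = set(q1) | set(q2)
    # Scheffé set A: points where q1 assigns strictly more mass than q2.
    scheffe_set = {x for x in support if q1.get(x, 0.0) > q2.get(x, 0.0)}
    q1_mass = sum(q1.get(x, 0.0) for x in scheffe_set)
    q2_mass = sum(q2.get(x, 0.0) for x in scheffe_set)
    # Empirical probability of A under the unknown target p.
    counts = Counter(samples)
    empirical_mass = sum(counts[x] for x in scheffe_set) / len(samples)
    # Pick the candidate whose prediction for A is closer (ties go to q1).
    if abs(q1_mass - empirical_mass) <= abs(q2_mass - empirical_mass):
        return 1
    return 2
```

For instance, take `q1 = {0: 0.9, 1: 0.1}` and `q2 = {0: 0.1, 1: 0.9}`, with eight of ten samples equal to `0`. The Scheffé set is `{0}`, its empirical mass is 0.8, and the test returns `1`. Comparing masses on a single set, rather than full densities, keeps the test cheap: it only needs one empirical frequency.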

Efficient Discrepancy Testing for Learning with Distribution Shift

A fundamental notion of distance between train and test distributions from the field of domain adaptation is discrepancy distance. While in general hard to compute, here we provide the first set of provably efficient algorithms for testing…

Data Structures and Algorithms · Computer Science 2024-06-14 Gautam Chandrasekaran, Adam R. Klivans, Vasilis Kontonis, Konstantinos Stavropoulos, Arsen Vasilyan

Fast and Sample Near-Optimal Algorithms for Learning Multidimensional Histograms

We study the problem of robustly learning multi-dimensional histograms. A $d$-dimensional function $h: D \rightarrow \mathbb{R}$ is called a $k$-histogram if there exists a partition of the domain $D \subseteq \mathbb{R}^d$ into $k$…

Machine Learning · Computer Science 2018-02-26 Ilias Diakonikolas, Jerry Li, Ludwig Schmidt

Maximizing Social Influence in Nearly Optimal Time

Diffusion is a fundamental graph process, underpinning such phenomena as epidemic disease contagion and the spread of innovation by word-of-mouth. We address the algorithmic problem of finding a set of k initial seed nodes in a network so…

Data Structures and Algorithms · Computer Science 2016-06-23 Christian Borgs, Michael Brautbar, Jennifer Chayes, Brendan Lucier

Optimal Transport: Fast Probabilistic Approximation with Exact Solvers

We propose a simple subsampling scheme for fast randomized approximate computation of optimal transport distances. This scheme operates on a random subset of the full data and can use any exact algorithm as a black-box back-end, including…

Computation · Statistics 2020-12-17 Max Sommerfeld, Jörn Schrieber, Yoav Zemel, Axel Munk
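The subsampling idea in the entry above is to solve exactly on a random subset, using any exact solver as a black box. It can be illustrated in one dimension, where the exact Wasserstein-1 distance between two equal-size empirical measures is obtained by matching sorted samples.

This is a minimal sketch built on illustrative choices, not the authors' implementation:
- the 1-D back-end;
- equal subsample sizes;
- averaging over repeated draws;
- the hypothetical names `w1_exact_1d` and `subsampled_w1`.

```python
import random
from typing import Callable, Sequence


def w1_exact_1d(x: Sequence[float], y: Sequence[float]) -> float:
    """Exact Wasserstein-1 distance between two equal-size empirical measures on the line.

    In one dimension the optimal transport plan matches the i-th smallest
    point of x with the i-th smallest point of y.
    """
    if len(x) != len(y) or not x:
        raise ValueError("expects two non-empty samples of equal size")
    xs, ys = sorted(x), sorted(y)
    return sum(abs(a - b) for a, b in zip(xs, ys)) / len(xs)


def subsampled_w1(
    x: Sequence[float],
    y: Sequence[float],
    s: int,
    repeats: int,
    solver: Callable[[Sequence[float], Sequence[float]], float] = w1_exact_1d,
    seed: int = 0,
) -> float:
    """Average of exact OT distances computed on `repeats` random subsamples of size s."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(repeats):
        # Draw s points without replacement from each sample.
        sub_x = rng.sample(list(x), s)
        sub_y = rng.sample(list(y), s)
        # The exact solver is used as a black box on the small subproblem.
        estimates.append(solver(sub_x, sub_y))
    return sum(estimates) / len(estimates)
```

For example, with `x = list(range(100))` and `y` equal to `x` shifted by 5, `w1_exact_1d(x, y)` returns `5.0`. Taking `s` equal to the full sample size reproduces the exact value, while a smaller `s` trades accuracy for speed. Passing the solver as a parameter mirrors the black-box back-end described in the abstract.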

The complexity of learning halfspaces using generalized linear methods

Many popular learning algorithms (e.g., regression, Fourier-transform-based algorithms, kernel SVM, and kernel ridge regression) operate by reducing the problem to a convex optimization problem over a vector space of functions. These methods…

Machine Learning · Computer Science 2014-05-13 Amit Daniely, Nati Linial, Shai Shalev-Shwartz