Related papers: Extractor-Based Time-Space Lower Bounds for Learni…
In this work, we show, for the well-studied problem of learning parity under noise, where a learner tries to learn $x=(x_1,\ldots,x_n) \in \{0,1\}^n$ from a stream of random linear equations over $\mathrm{F}_2$ that are correct with…
We prove that any algorithm for learning parities requires either a memory of quadratic size or an exponential number of samples. This proves a recent conjecture of Steinhardt, Valiant and Wager and shows that for some learning problems a…
We consider the problem of performing linear regression over a stream of $d$-dimensional examples, and show that any algorithm that uses a subquadratic amount of memory exhibits a slower rate of convergence than can be achieved without…
We give lower bounds on the amount of memory required by one-pass streaming algorithms for solving several natural learning problems. In a setting where examples lie in $\{0,1\}^d$ and the optimal classifier can be encoded using $\kappa$…
In his breakthrough paper, Raz showed that any parity learning algorithm requires either quadratic memory or an exponential number of samples [FOCS'16, JACM'19]. A line of work that followed extended this result to a large class of learning…
We develop an extension of recently developed methods for obtaining time-space tradeoff lower bounds for problems of learning from random test samples to handle the situation where the space of tests is signficantly smaller than the space…
Recent research demonstrated that training large language models involves memorization of a significant fraction of training data. Such memorization can lead to privacy violations when training on sensitive user data and thus motivates the…
Probably the most important problem in machine learning is the preliminary biasing of a learner's hypothesis space so that it is small enough to ensure good generalisation from reasonable training sets, yet large enough that it contains a…
The fundamental theorem of statistical learning states that for binary classification problems, any Empirical Risk Minimization (ERM) learning rule has close to optimal sample complexity. In this paper we seek for a generic optimal learner…
In a work by Raz (J. ACM and FOCS 16), it was proved that any algorithm for parity learning on $n$ bits requires either $\Omega(n^2)$ bits of classical memory or an exponential number (in~$n$) of random samples. A line of recent works…
Both the human brain and artificial learning agents operating in real-world or comparably complex environments are faced with the challenge of online model selection. In principle this challenge can be overcome: hierarchical Bayesian…
We make progress on two important problems regarding attribute efficient learnability. First, we give an algorithm for learning decision lists of length $k$ over $n$ variables using $2^{\tilde{O}(k^{1/3})} \log n$ examples and time…
Many latent-variable applications, including community detection, collaborative filtering, genomic analysis, and NLP, model data as generated by low-rank matrices. Yet despite considerable research, except for very special cases, the number…
This paper studies the error metric selection for long-term memory learning in sequence modelling. We examine the bias towards short-term memory in commonly used errors, including mean absolute/squared error. Our findings show that all…
In this paper the problem of {\em learning} appropriate domain-specific bias is addressed. It is shown that this can be achieved by learning many related tasks from the same domain, and a theorem is given bounding the number tasks that must…
Machine learning models have traditionally been developed under the assumption that the training and test distributions match exactly. However, recent success in few-shot learning and related problems are encouraging signs that these models…
A well-recognized limitation of kernel learning is the requirement to handle a kernel matrix, whose size is quadratic in the number of training examples. Many methods have been proposed to reduce this computational cost, mostly by using a…
Two widely-used computational paradigms for sublinear algorithms are using linear measurements to perform computations on a high dimensional input and using structured queries to access a massive input. Typically, algorithms in the former…
We study non-convex empirical risk minimization for learning halfspaces and neural networks. For loss functions that are $L$-Lipschitz continuous, we present algorithms to learn halfspaces and multi-layer neural networks that achieve…
We consider the problem of neural association for a network of non-binary neurons. Here, the task is to first memorize a set of patterns using a network of neurons whose states assume values from a finite number of integer levels. Later,…