Related papers: Learning, complexity and information density
We investigate a population of binary mistake sequences that result from learning with parametric models of different order. We obtain estimates of their error, algorithmic complexity and divergence from a purely random Bernoulli sequence.…
Given a random binary sequence $X^{(n)}$ of random variables $X_{t}$, $t=1,2,\ldots,n$, for instance, one that is generated by a Markov source (teacher) of order $k^{*}$ (each state represented by $k^{*}$ bits). Assume that the probability of…
We review possible measures of complexity which might in particular be applicable to situations where the complexity seems to arise spontaneously. We point out that not all of them correspond to the intuitive (or "naive") notion, and that…
Since human randomness production has been studied and widely used to assess executive functions (especially inhibition), many measures have been suggested to assess the degree to which a sequence is random-like. However, each of them…
Dropout Regularization, serving to reduce variance, is nearly ubiquitous in Deep Learning models. We explore the relationship between the dropout rate and model complexity by training 2,000 neural networks configured with random…
Evaluating the performance of classifiers is critical in machine learning, particularly in high-stakes applications where the reliability of predictions can significantly impact decision-making. Traditional performance measures, such as…
The randomness rate of an infinite binary sequence is characterized by the sequence of ratios between the Kolmogorov complexity and the length of the initial segments of the sequence. It is known that there is no uniform effective procedure…
Complexity is a multi-faceted phenomenon, involving a variety of features including disorder, nonlinearity, and self-organisation. We use a recently developed rigorous framework for complexity to understand measures of complexity. We…
A fundamental algorithm for selecting ranks from a finite subset of an ordered set is Radix Selection. This algorithm requires the data to be given as strings of symbols over an ordered alphabet, e.g., binary expansions of real numbers. Its…
We study the relationship between catastrophic forgetting and properties of task sequences. In particular, given a sequence of tasks, we would like to understand which properties of this sequence influence the error rates of continual…
We consider the sample complexity of learning with adversarial robustness. Most prior theoretical results for this problem have considered a setting where different classes in the data are close together or overlapping. Motivated by some…
Stability is a central property in learning and statistics, promising that the output of an algorithm $A$ does not change substantially when applied to similar datasets $S$ and $S'$. It is an elementary fact that any sufficiently stable algorithm…
Plotting a learner's average performance against the number of training samples results in a learning curve. Studying such curves on one or more data sets is a way to get to a better understanding of the generalization properties of this…
A supervised learning algorithm has access to a distribution of labeled examples, and needs to return a function (hypothesis) that correctly labels the examples. The hypothesis of the learner is taken from some fixed class of functions…
The nature of concept learning is a core question in cognitive science. Theories must account for the relative difficulty of acquiring different concepts by supervised learners. For a canonical set of six category types, two distinct…
The correlation measure of order $k$ is an important measure of randomness in binary sequences. This measure looks for dependence between several shifted versions of a sequence. We study the relation between the correlation measure of…
This paper presents a novel information-theoretic perspective on generalization in machine learning by framing the learning problem within the context of lossy compression and applying finite blocklength analysis. In our approach, the…
Defect prediction is crucial for software quality assurance and has been extensively researched over recent decades. However, prior studies rarely focus on data complexity in defect prediction tasks, and even less on understanding the…
Learning to hash is an efficient paradigm for exact and approximate nearest neighbor search from massive databases. Binary hash codes are typically extracted from an image by rounding output features from a CNN, which is trained on a…
In this work, we investigate the interplay between memorization and learning in the context of stochastic convex optimization (SCO). We define memorization via the information a learning algorithm reveals about its training data…