Related papers: Linear regression through PAC-Bayesian truncation
We consider the problem of predicting as well as the best linear combination of d given functions in least squares regression, and variants of this problem including constraints on the parameters of the linear combination. When the input…
We consider the problem of robustly predicting as well as the best linear combination of $d$ given functions in least squares regression, and variants of this problem including constraints on the parameters of the linear combination. For…
Empirically, the PAC-Bayesian analysis is known to produce tight risk bounds for practical machine learning algorithms. However, in its naive form, it can only deal with stochastic predictors while such predictors are rarely used and…
We present a new PAC-Bayesian generalization bound. Standard bounds contain a $\sqrt{L_n \cdot \KL/n}$ complexity term which dominates unless $L_n$, the empirical error of the learning algorithm's randomized predictions, vanishes. We manage…
This paper studies the truncation method from Alquier [1] to derive high-probability PAC-Bayes bounds for unbounded losses with heavy tails. Assuming that the $p$-th moment is bounded, the resulting bounds interpolate between a slow rate $1…
We apply the PAC-Bayes theory to the setting of learning-to-optimize. To the best of our knowledge, we present the first framework to learn optimization algorithms with provable generalization guarantees (PAC-bounds) and explicit trade-off…
We study the problem of aggregation under the squared loss in the model of regression with deterministic design. We obtain sharp PAC-Bayesian risk bounds for aggregates defined via exponential weights, under general assumptions on the…
We use the PAC-Bayesian theory for the setting of learning-to-optimize. To the best of our knowledge, we present the first framework to learn optimization algorithms with provable generalization guarantees (PAC-Bayesian bounds) and explicit…
In this paper, we improve the PAC-Bayesian error bound for linear regression derived in Germain et al. [10]. The improvements are twofold. First, the proposed error bound is tighter, and converges to the generalization loss with a…
PAC-Bayesian bounds have proven to be a valuable tool for deriving generalization bounds and for designing new learning algorithms in machine learning. However, it typically focus on providing generalization bounds with respect to a chosen…
In truncated linear regression, samples $(x,y)$ are shown only when the outcome $y$ falls inside a certain survival set $S^\star$ and the goal is to estimate the unknown $d$-dimensional regressor $w^\star$. This problem has a long history…
PAC-Bayesian is an analysis framework where the training error can be expressed as the weighted average of the hypotheses in the posterior distribution whilst incorporating the prior knowledge. In addition to being a pure generalization…
The topics dicussed in this paper take their origin inthe estimation of the Gram matrix of a random vector from a sample made of n independent copies. They comprise the estimation of the covariance matrix and the study of least squares…
Motivated by the increasing use of and rapid changes in array technologies, we consider the prediction problem of fitting a linear regression relating a continuous outcome $Y$ to a large number of covariates $\mathbf {X}$, for example,…
Variational approximation techniques and inference for stochastic models in machine learning has gained much attention the last years. Especially in the case of Gaussian Processes (GP) and their deep versions, Deep Gaussian Processes…
We study an approach to learning pruning masks by optimizing the expected loss of stochastic pruning masks, i.e., masks which zero out each weight independently with some weight-specific probability. We analyze the training dynamics of the…
We focus on a stochastic learning model where the learner observes a finite set of training examples and the output of the learning process is a data-dependent distribution over a space of hypotheses. The learned data-dependent distribution…
We study the generalization error of randomized learning algorithms -- focusing on stochastic gradient descent (SGD) -- using a novel combination of PAC-Bayes and algorithmic stability. Importantly, our generalization bounds hold for all…
Considering a probability distribution over parameters is known as an efficient strategy to learn a neural network with non-differentiable activation functions. We study the expectation of a probabilistic neural network as a predictor by…
In Bayesian regression models with categorical predictors, constraints are needed to ensure identifiability when using all $K$ levels of a factor. The sum-to-zero constraint is particularly useful as it allows coefficients to represent…