Related papers: Label Smoothing Improves Neural Source Code Summar…
Overconfidence has been shown to impair generalization and calibration of a neural network. Previous studies remedy this issue by adding a regularization term to a loss function, preventing a model from producing a peaked distribution. Label…
Regularization is an effective way to promote the generalization performance of machine learning models. In this paper, we focus on label smoothing, a form of output distribution regularization that prevents overfitting of a neural network…
It has been hypothesized that label smoothing can reduce overfitting and improve generalization, and current empirical evidence seems to corroborate these effects. However, there is a lack of mathematical understanding of when and why such…
The generalization and learning speed of a multi-class neural network can often be significantly improved by using soft targets that are a weighted average of the hard targets and the uniform distribution over labels. Smoothing the labels…
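The soft-target construction described above, a weighted average of the hard target and the uniform distribution over labels, can be sketched as follows. This is a minimal illustration, not code from any of the listed papers. It assumes K classes and a smoothing weight epsilon, and the names `smooth_labels` and `epsilon` are hypothetical. It uses the common convention (1 - ε)·y + ε/K, in which the true class also receives its ε/K share. Some variants instead spread ε only over the K - 1 incorrect classes.

```python
from typing import List


def smooth_labels(target: int, num_classes: int, epsilon: float) -> List[float]:
    """Hypothetical helper: (1 - epsilon) * one_hot(target) + epsilon * uniform(num_classes)."""
    if not 0.0 <= epsilon <= 1.0:
        raise ValueError("epsilon must lie in [0, 1]")
    if not 0 <= target < num_classes:
        raise ValueError("target must be a valid class index")
    # Every class, including the true one, receives epsilon / K from the uniform component.
    uniform_mass = epsilon / num_classes
    soft = [uniform_mass] * num_classes
    # The true class additionally keeps the (1 - epsilon) share of the hard one-hot label.
    soft[target] += 1.0 - epsilon
    return soft


# Example: 4 classes, true class 2, epsilon = 0.1 (illustrative values only).
print(smooth_labels(2, 4, 0.1))
```

With these values the true class gets 0.925 and each other class gets 0.025, so the vector still sums to 1. Setting epsilon to 0 recovers the original one-hot target.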
Training neural networks with one-hot target labels often results in overconfidence and overfitting. Label smoothing addresses this issue by perturbing the one-hot target labels by adding a uniform probability vector to create a regularized…
Label smoothing has been shown to be an effective regularization strategy in classification that prevents overfitting and helps in label denoising. However, extending such methods directly to seq2seq settings, such as Machine Translation,…
Generating confidence-calibrated outputs is of utmost importance for applications of deep neural networks in safety-critical decision-making systems. The output of a neural network is a probability distribution where the scores are…
Label smoothing (LS) is an emerging learning paradigm that uses a positively weighted average of both the hard training labels and uniformly distributed soft labels. It was shown that LS serves as a regularizer for training data with hard…
Prior work has explored directly regularizing the output distributions of probabilistic models to alleviate peaky (i.e. over-confident) predictions, a common sign of overfitting. This class of techniques, of which label smoothing is one,…
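Label smoothing can be seen as an output-distribution regularizer. Because cross-entropy is linear in its target argument, the loss against a smoothed target splits into the ordinary hard-label loss plus a penalty term. With y the one-hot target, u the uniform distribution over K classes, and p the model's predicted distribution:

H((1 - ε)y + εu, p) = (1 - ε)·H(y, p) + ε·H(u, p), and H(u, p) = KL(u ‖ p) + log K.

The extra term therefore penalizes divergence of p from the uniform distribution, up to a constant. The sketch below checks this identity numerically on one example. The function names and the logits are hypothetical, chosen only for illustration.

```python
import math
from typing import List


def softmax(logits: List[float]) -> List[float]:
    """Numerically stable softmax: subtract the max logit before exponentiating."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(target: List[float], probs: List[float]) -> float:
    """H(target, probs) = -sum_k target_k * log(probs_k), treating 0 * log(.) as 0."""
    return -sum(t * math.log(p) for t, p in zip(target, probs) if t > 0.0)


def label_smoothing_loss(logits: List[float], target_index: int, epsilon: float) -> float:
    """Cross-entropy against the smoothed target (1 - eps) * y + eps / K."""
    k = len(logits)
    probs = softmax(logits)
    soft = [epsilon / k] * k
    soft[target_index] += 1.0 - epsilon
    return cross_entropy(soft, probs)


def decomposed_loss(logits: List[float], target_index: int, epsilon: float) -> float:
    """Same loss written as (1 - eps) * hard-label CE + eps * CE against the uniform distribution."""
    k = len(logits)
    probs = softmax(logits)
    hard = [0.0] * k
    hard[target_index] = 1.0
    uniform = [1.0 / k] * k
    return (1.0 - epsilon) * cross_entropy(hard, probs) + epsilon * cross_entropy(uniform, probs)


logits = [2.0, 0.5, -1.0, 0.0]  # made-up logits for a 4-class example
print(label_smoothing_loss(logits, 0, 0.1), decomposed_loss(logits, 0, 0.1))
```

The two printed values agree up to floating-point rounding. When the true class already has the highest probability, the uniform term is at least as large as the hard-label term. As a result, smoothing raises the loss of a confident correct prediction, which discourages peaky outputs.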
Label smoothing is widely used in deep neural networks for multi-class classification. While it enhances model generalization and reduces overconfidence by aiming to lower the probability for the predicted class, it distorts the predicted…
Label smoothing and vocabulary sharing are two widely used techniques in neural machine translation models. However, we argue that simply applying both techniques can be conflicting and can even lead to sub-optimal performance. When allocating…
Label Smoothing (LS) is an effective regularizer to improve the generalization of state-of-the-art deep models. For each training sample the LS strategy smooths the one-hot encoded training signal by distributing its probability mass over…
Label smoothing is an effective regularization tool for deep neural networks (DNNs), which generates soft labels by applying a weighted average between the uniform distribution and the hard label. It is often used to reduce the overfitting…
Label smoothing is a widely studied regularization technique in machine learning. However, its potential for node classification in graph-structured data, spanning homophilic to heterophilic graphs, remains largely unexplored. We introduce…
Label smoothing (LS) is a popular regularisation method for training neural networks as it is effective in improving test accuracy and is simple to implement. "Hard" one-hot labels are "smoothed" by uniformly distributing probability…
Label smoothing is commonly used in training deep learning models, wherein one-hot training labels are mixed with uniform label vectors. Empirically, smoothing has been shown to improve both predictive performance and model calibration. In…
Label smoothing is ubiquitously applied in Neural Machine Translation (NMT) training. While label smoothing offers a desired regularization effect during model training, in this paper we demonstrate that it nevertheless introduces length…
Regularization techniques are crucial to improving the generalization performance and training efficiency of deep neural networks. Many deep learning algorithms rely on weight decay, dropout, and batch/layer normalization to converge faster and…
Training modern neural networks is an inherently noisy process that can lead to high prediction churn: disagreements between re-trainings of the same model due to factors such as randomization in the parameter initialization and…
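Prediction churn, as described above, is commonly measured as the fraction of evaluation examples on which two independently trained runs of the same model predict different top-1 labels. The sketch below assumes that definition. The function name `prediction_churn` and the prediction lists are hypothetical and are not results from any of the listed papers.

```python
from typing import Sequence


def prediction_churn(preds_a: Sequence[int], preds_b: Sequence[int]) -> float:
    """Hypothetical helper: fraction of examples where two runs' top-1 predictions disagree."""
    if len(preds_a) != len(preds_b):
        raise ValueError("prediction lists must have the same length")
    if len(preds_a) == 0:
        raise ValueError("need at least one example")
    disagreements = sum(1 for a, b in zip(preds_a, preds_b) if a != b)
    return disagreements / len(preds_a)


# Made-up top-1 predictions from two re-trainings on the same six examples.
run_1 = [0, 1, 1, 2, 0, 3]
run_2 = [0, 1, 2, 2, 0, 1]
print(prediction_churn(run_1, run_2))
```

Here the runs disagree on two of six examples, giving a churn of one third. Note that churn compares the two runs with each other rather than with the ground truth, so two models with identical accuracy can still have high churn.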
Label smoothing loss is a widely adopted technique to mitigate overfitting in deep neural networks. This paper studies label smoothing from the perspective of Neural Collapse (NC), a powerful empirical and theoretical framework which…