Related papers: Equations of States in Singular Statistical Estima…
Many learning machines that have hierarchical structure or hidden variables are now being used in information science, artificial intelligence, and bioinformatics. However, several learning machines used in such fields are not regular but…
Hierarchical parametric models consisting of observable and latent variables are widely used for unsupervised learning tasks. For example, a mixture model is a representative hierarchical model for clustering. From the statistical point of…
Bayesian networks are now being used in enormous fields, for example, diagnosis of a system, data mining, clustering and so on. In spite of their wide range of applications, the statistical properties have not yet been clarified, because…
Various approaches have been developed to upper bound the generalization error of a supervised learning algorithm. However, existing bounds are often loose and even vacuous when evaluated in practice. As a result, they may fail to…
We provide an information-theoretic analysis of the generalization ability of Gibbs-based transfer learning algorithms by focusing on two popular transfer learning approaches, $\alpha$-weighted-ERM and two-stage-ERM. Our key result is an…
Bounding the generalization error of a supervised learning algorithm is one of the most important problems in learning theory, and various approaches have been developed. However, existing bounds are often loose and lack of guarantees. As a…
Deep learning has seen substantial achievements, with numerical and theoretical evidence suggesting that singularities of statistical models are considered a contributing factor to its performance. From this remarkable success of classical…
We analyze the generalization ability of joint-training meta learning algorithms via the Gibbs algorithm. Our exact characterization of the expected meta generalization error for the meta Gibbs algorithm is based on symmetrized KL…
Multinomial mixtures are widely used in the information engineering field, however, their mathematical properties are not yet clarified because they are singular learning models. In fact, the models are non-identifiable and their Fisher…
Watanabe's singular learning theory provides a framework for asymptotic analysis of Bayesian model selection for statistical models with singularities, where traditional statistical regularity assumptions fail. Learning coefficients, also…
Hypothesis testing in singular statistical models is often regarded as inherently problematic due to non-identifiability and degeneracy of the Fisher information. We show that the fundamental obstruction to testing in such models is not…
Many learning machines such as normal mixtures and layered neural networks are not regular but singular statistical models, because the map from a parameter to a probability distribution is not one-to-one. The conventional statistical…
We consider approximate Bayesian model choice for model selection problems that involve models whose Fisher-information matrices may fail to be invertible along other competing submodels. Such singular models do not obey the regularity…
Virtually any model we use in machine learning to make predictions does not perfectly represent reality. So, most of the learning happens under model misspecification. In this work, we present a novel analysis of the generalization…
In this paper, the method of gaps, a technique for deriving closed-form expressions in terms of information measures for the generalization error of supervised machine learning algorithms is introduced. The method relies on the notion of…
Hierarchical statistical models are widely employed in information science and data engineering. The models consist of two types of variables: observable variables that represent the given data and latent variables for the unobservable…
Gibbs states are familiar from statistical mechanics, yet their use is not limited to that domain. For instance, they also feature in the maximum entropy reconstruction of quantum states from incomplete measurement data. Outside the…
Singularities of a statistical model are the elements of the model's parameter space which make the corresponding Fisher information matrix degenerate. These are the points for which estimation techniques such as the maximum likelihood…
Recent progress has shown that the generalization error of the Gibbs algorithm can be exactly characterized using the symmetrized KL information between the learned hypothesis and the entire training dataset. However, evaluating such a…
Machine learning algorithms are increasingly used to inform critical decisions. There is a growing concern about bias, that algorithms may produce uneven outcomes for individuals in different demographic groups. In this work, we measure…