Related papers: Estimating the number of classes
Point estimation of class prevalences in the presence of data set shift has been a popular research topic for more than two decades. Less attention has been paid to the construction of confidence and prediction intervals for estimates of…
We wish to estimate the total number of classes in a population based on sample counts, especially in the presence of high latent diversity. Drawing on probability theory that characterizes distributions on the integers by ratios of…
Estimating the size of an elusive target population is of prominent interest in many areas in the life and social sciences. Our aim is to provide an efficient and workable method to estimate the unknown population size, given the frequency…
In cases of uncertainty, a multi-class classifier preferably returns a set of candidate classes instead of predicting a single class label with little guarantee. More precisely, the classifier should strive for an optimal balance between…
We investigate a Poisson sampling design in the presence of unknown selection probabilities when applied to a population of unknown size for multiple sampling occasions. The fixed-population model is adopted and extended for inference.…
We study the frequentist properties of Bayesian statistical inference for the stochastic block model, with an unknown number of classes of varying sizes. We equip the space of vertex labellings with a prior on the number of classes and,…
Probabilities in the multiverse can be calculated by assuming that we are typical representatives in a given reference class. But is this class well defined? What should be included in the ensemble in which we are supposed to be typical?…
When the cost of misclassifying a sample is high, it is useful to have an accurate estimate of uncertainty in the prediction for that sample. There are also multiple types of uncertainty which are best estimated in different ways, for…
The correct use and interpretation of models depends on several steps, two of which are the calibration by parameter estimation and the analysis of uncertainty. In the biological literature, these steps are seldom discussed together, but…
Estimating prevalence, the fraction of a population with a certain medical condition, is fundamental to epidemiology. Traditional methods rely on classification of test samples taken at random from a population. Such approaches to…
Probabilistic classifiers output a probability distribution on target classes rather than just a class prediction. Besides providing a clear separation of prediction and decision making, the main advantage of probabilistic models is their…
While the accuracy of modern deep learning models has significantly improved in recent years, the ability of these models to generate uncertainty estimates has not progressed to the same degree. Uncertainty methods are designed to provide…
We ask: Can focusing on likely classes of a single, in-domain sample improve model predictions? Prior work argued "no". We put forward a novel rationale in favor of "yes": Sharedness of features among classes indicates their reliability…
We exploit a suitable moment-based characterization of the mixture of Poisson distributions for developing Bayesian inference for the unknown size of a finite population whose units are subject to multiple occurrences during an enumeration…
The missing mass refers to the proportion of data points in an unknown population of classifier inputs that belong to classes not present in the classifier's training data, which is assumed to be a random sample from that unknown…
The number of species can be estimated by sampling individuals from a species assemblage. The problem of estimating the generalized species accumulation curve is addressed in a nonparametric Poisson mixture model. A likelihood-based estimator…
Class imbalance poses a significant challenge in classification tasks, where traditional approaches often lead to biased models and unreliable predictions. Undersampling and oversampling techniques have been commonly employed to address…
The availability of high-throughput parallel methods for sequencing microbial communities is increasing our knowledge of the microbial world at an unprecedented rate. Though most attention has focused on determining lower bounds on the…
We develop a theory of estimation when in addition to a sample of $n$ observed outcomes the underlying probabilities of the observed outcomes are known, as is typically the case in the context of numerical simulation modeling, e.g. in…
In this paper we develop a very general class of bivariate discrete distributions. The basic idea is very simple. The marginals are obtained by taking the random geometric sum of a baseline distribution function. The proposed class of…
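Several of the snippets above (species counts, the missing mass, class-number estimation from sample counts) build on the frequency counts f_k, the number of classes observed exactly k times. As a point of reference only, the sketch below computes two classical textbook estimators from those counts: the Good–Turing estimate of the missing mass, f_1 / n, and the Chao1 estimate of the total number of classes. These are standard baselines, not the methods of any paper listed here. The function names are hypothetical. For the case where no class is seen exactly twice, the sketch assumes one common bias-corrected form of Chao1.

```python
from collections import Counter


def frequency_counts(sample):
    """Return f, where f[k] is the number of classes observed exactly k times."""
    class_counts = Counter(sample)          # class -> number of occurrences
    return Counter(class_counts.values())   # k -> number of classes seen k times


def good_turing_missing_mass(sample):
    """Good-Turing estimate of the probability mass of unseen classes: f1 / n."""
    n = len(sample)
    if n == 0:
        raise ValueError("sample must be non-empty")
    f = frequency_counts(sample)
    return f[1] / n


def chao1(sample):
    """Chao1 estimate (a lower bound) of the total number of classes."""
    n = len(sample)
    if n == 0:
        raise ValueError("sample must be non-empty")
    f = frequency_counts(sample)
    s_obs = sum(f.values())                 # number of distinct classes observed
    f1, f2 = f[1], f[2]
    if f2 > 0:
        # Classic form: S_obs + f1^2 / (2 f2).
        return s_obs + f1 * f1 / (2 * f2)
    # Assumed bias-corrected form when f2 == 0:
    # S_obs + (n - 1)/n * f1 (f1 - 1) / (2 (f2 + 1)), written with one division.
    return s_obs + (n - 1) * f1 * (f1 - 1) / (2 * n * (f2 + 1))


if __name__ == "__main__":
    sample = list("aaabbcdde")              # a:3, b:2, c:1, d:2, e:1
    print(good_turing_missing_mass(sample)) # f1 / n = 2 / 9
    print(chao1(sample))                    # 5 + 2**2 / (2 * 2) = 6.0
```

In the toy sample, two classes (c and e) are singletons, so roughly 2/9 of the population mass is estimated to belong to classes not yet seen. Chao1 adds one unseen class to the five observed ones.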