Related papers: Visualizing Count Data Regressions Using Rootogram…
Count data are common in medical research. When these data have more zeros than expected under the most commonly used count distributions, it is common to employ a zero-inflated regression model. However, the interpretability of these models is much…
'Optimal cutpoints' for binary classification tasks are often established by testing which cutpoint yields the best discrimination, for example the Youden index, in a specific sample. This results in 'optimal' cutpoints that are highly…
In this paper, we examine roots of graph polynomials where those roots can be considered as structural graph measures. More precisely, we prove analytical results for the roots of certain modified graph polynomials and also discuss…
A regression method for proportional, or fractional, data with mixed effects is outlined, designed for analysis of datasets in which the outcomes have substantial weight at the bounds. In such cases a normal approximation is particularly…
Count data take on non-negative integer values and are challenging to properly analyze using standard linear-Gaussian methods such as linear regression and principal components analysis. Generalized linear models enable direct modeling of…
Although models for count data with over-dispersion have been widely considered in the literature, models for under-dispersion -- the opposite phenomenon -- have received less attention as it is only relatively common in particular research…
The purpose of this article is to introduce the reader to the ROOT data analysis software package, and demonstrate how it may be used to complement one's accident reconstruction analyses.
This paper proposes a new generalized linear model with the fractional binomial distribution. Zero-inflated Poisson/negative binomial distributions are used for count data with many zeros. To analyze the association of such a count variable…
The root is an important organ of a plant since it is responsible for water and nutrient uptake. Analyzing and modelling variabilities in the geometry and topology of roots can help in assessing the plant's health, understanding its growth…
Count-valued autoregressions are widely used to analyse time-series of reported infectious-disease cases because of their close connection with discrete-time transmission models. However, when such models are applied directly to…
Tensor-on-tensor (TOT) regression is an important tool for the analysis of tensor data, aiming to predict a set of response tensors from a corresponding set of predictor tensors. However, standard TOT regression is sensitive to outliers,…
Poisson regression is a popular tool for modeling count data and is applied in a vast array of applications from the social to the physical sciences and beyond. Real data, however, are often over- or under-dispersed and, thus, not conducive…
Histograms provide a powerful means of summarizing large data sets by representing their distribution in a compact, binned form. The HistogramTools R package enhances R's built-in histogram functionality, offering advanced methods for…
Traditional boxplots are widely used for summarizing and visualizing the distribution of numerical data, yet they exhibit significant limitations when applied to skewed or heavy-tailed distributions, often leading to misclassification of…
For the task of relevance analysis, the conventional Tukey's test may be applied to the set of all pairwise comparisons. However, few studies have discussed both nonparametric k-sample comparisons and relevance analysis in high…
Due to a wide spectrum of applications in the real world, such as security, financial surveillance, and health risk, various deep anomaly detection models have been proposed and achieved state-of-the-art performance. However, besides being…
The $k$-core decomposition is a widely studied summary statistic that describes a graph's global connectivity structure. In this paper, we move beyond using $k$-core decomposition as a tool to summarize a graph and propose using $k$-core…
Topographs, introduced by Conway in 1997, are infinite trivalent planar trees used to visualize the values of binary quadratic forms. In this work, we study series whose terms are indexed by the vertices of a topograph and show that they…
Regression evaluation has been performed for decades. Some metrics have been identified to be robust against shifting and scaling of the data but considering the different distributions of data is much more difficult to address (imbalance…
Variable trees are a new method for the exploration of discrete multivariate data. They display nested subsets and corresponding frequencies and percentages. Manual calculation of these quantities can be laborious, especially when there are…