Related papers: Deconstructing Type III
Type III methods were introduced by SAS to address difficulties in dummy-variable models for effects of multiple factors and covariates. They are widely used in practice; they are the default method in several statistical computing…
Type III methods, introduced by SAS in 1976, formulate estimable functions that substitute, somehow, for classical ANOVA effects in multiple linear regression models. They have been controversial since, provoking wide use and satisfied…
In 1934, F. Yates described a sum of squares for testing factor main effects in saturated unbalanced models for effects of two factors. He claimed no particular properties of this sum of squares other than that it provided an "efficient…
It is shown that the sum of squares by Yates's method of weighted squares of means is equivalent to numerator sums of squares formulated by other methods. These relations are established first for hypotheses about fixed effects in a general…
This paper establishes three properties of F-statistics for inference about the mean vector in multiple regression and analysis of variance. The extra SSE due to imposing a set of linear conditions on the model tests the estimable part of…
Estimation is the computational task of recovering a hidden parameter $x$ associated with a distribution $D_x$, given a measurement $y$ sampled from the distribution. High dimensional estimation problems arise naturally in statistics,…
Modeling human behavioral data is challenging due to its scale, sparseness (few observations per individual), heterogeneity (differently behaving individuals), and class imbalance (few observations of the outcome of interest). An additional…
SHAP is a popular method for measuring variable importance in machine learning models. In this paper, we study the algorithm used to estimate SHAP scores and outline its connection to the functional ANOVA decomposition. We use this…
Support Vector Machines (SVMs) are an important tool for performing classification on scattered data, where one usually has to deal with many data points in high-dimensional spaces. We propose solving SVMs in primal form using feature maps…
Distorted sums of models were introduced and discussed in [Sh:463]. This notion generalizes the notion of disjoint (or direct) sums of models by letting the summands overlap. In the first section we investigate types in distorted sums and…
In the social sciences we are often interested in comparing models specified by parametric equality or inequality constraints. For instance, when examining three group means $\{ \mu_1, \mu_2, \mu_3\}$ through an analysis of variance…
The article presents a mathematical generalization of results which originated as solutions of practical problems, in particular, the modeling of transitional processes in electrical circuits and problems of resource allocation. However, the…
We present a comprehensive framework for applying rigorous statistical techniques from econometrics to analyze and improve machine learning systems. We introduce key statistical methods such as Ordinary Least Squares (OLS) regression,…
Simulation methods are among the most ubiquitous methodological tools in statistical science. In particular, statisticians often use simulation to explore properties of statistical functionals in models for which developed statistical theory…
Sign-Perturbed Sum (SPS) is a powerful finite-sample system identification algorithm which can construct confidence regions for the true data generating system with exact coverage probabilities, for any finite sample size. SPS was developed…
Factorial designs are frequently used in different fields of science, e.g. psychological, medical or biometric studies. Standard approaches, such as the ANOVA $F$-test, make different assumptions on the distribution of the error terms, the…
Assessing variability according to distinct factors in data is a fundamental technique of statistics. The method commonly referred to as analysis of variance (ANOVA) is, however, typically confined to the case where all levels of a factor…
I present a critique of the methods used in a typical paper. This leads to three broad conclusions about the conventional use of statistical methods. First, results are often reported in an unnecessarily obscure manner. Second, the null…
Inspired by the analysis of variance (ANOVA) decomposition of functions we propose a Gaussian-Uniform mixture model on the high-dimensional torus which relies on the assumption that the function we wish to approximate can be well explained…
Effectus theory is a relatively new approach to categorical logic that can be seen as an abstract form of generalized probabilistic theories (GPTs). While the scalars of a GPT are always the real unit interval [0,1], in an effectus they can…