Other Statistics
The evolving focus in statistics and data science education highlights the growing importance of computing. This paper presents the Data Jamboree, a live event that combines computational methods with traditional statistical techniques to…
The Jeffreys-Lindley paradox is a case where frequentist and Bayesian hypothesis testing methodologies contradict each other. This has caused confusion among data analysts when selecting a methodology for their statistical inference tasks.…
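To make the contradiction in the entry above concrete, here is a minimal sketch of the textbook normal-mean version of the paradox. It is not taken from the paper. The setup is an assumption chosen for illustration: a point null H0: theta = 0, an alternative with prior theta ~ N(0, tau^2), known sigma, and a sample mean held at a fixed z-score. The function names are hypothetical.

```python
import math


def two_sided_p_value(z):
    """Two-sided p-value for a standard normal test statistic z."""
    return math.erfc(abs(z) / math.sqrt(2))


def bf01(n, z=1.96, sigma=1.0, tau=1.0):
    """Bayes factor in favour of H0: theta = 0 against H1: theta ~ N(0, tau^2).

    Assumes the sample mean of n observations with known sigma sits exactly
    z standard errors away from 0 (illustrative setup, not from the paper).
    """
    r = n * tau**2 / sigma**2  # prior variance relative to sampling variance
    return math.sqrt(1 + r) * math.exp(-0.5 * z**2 * r / (1 + r))


if __name__ == "__main__":
    print("p-value at z = 1.96:", two_sided_p_value(1.96))
    for n in (10, 1_000, 1_000_000):
        print(n, bf01(n))
```

With z held at 1.96 the p-value stays near 0.05 for every n, while the Bayes factor for the null grows roughly like the square root of n, so a large enough sample makes the two analyses point in opposite directions.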
I present a simple and transparent standard for career greatness in baseball: any major league player with H > 2500 or HR > 350 or K > 2800 or W > 240 makes my Hall of Fame Cut. Rate statistics are avoided due to small sample issues and to…
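The threshold rule quoted in the entry above can be written as a one-line predicate. The sketch below assumes that H, HR, K, and W stand for career hits, home runs, pitcher strikeouts, and wins; the abstract as shown does not spell these out, and the function and parameter names are hypothetical.

```python
def makes_cut(hits=0, home_runs=0, strikeouts=0, wins=0):
    """Return True if any career total exceeds its threshold.

    Thresholds are the ones quoted in the abstract; the mapping of
    H, HR, K, W to these parameter names is an assumption.
    """
    return (
        hits > 2500
        or home_runs > 350
        or strikeouts > 2800
        or wins > 240
    )


if __name__ == "__main__":
    # Hypothetical career lines, not real players.
    print(makes_cut(hits=2600, home_runs=120))   # clears the hits threshold
    print(makes_cut(hits=2500, home_runs=350))   # strict inequalities: no cut
```

Because the rule uses strict inequalities joined by "or", a player exactly at a threshold does not qualify on that statistic, and a single qualifying total is enough.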
The definition of Data Science is a hotly debated topic. For many, the definition is a simple shortcut to Artificial Intelligence or Machine Learning. However, there is far more depth and nuance to the field of Data Science than a simple…
In this paper we present a flexible bivariate distribution specified by a quantile function. The distribution contains as special cases new bivariate exponential, Pareto I, Pareto II, beta, power, log logistic and uniform distributions and…
Organizations worldwide that rely on data-driven approaches regularly employ forecasting methods to enhance their planning and decision-making processes. While extensive research has examined the harms associated with traditional machine…
The term "researcher degrees of freedom" (RDF), which was introduced in metascientific literature in the context of the replication crisis in science, refers to the extent of flexibility a scientist has in making decisions related to data…
Induction is a form of reasoning that starts with a particular example and generalizes to a rule, namely, a hypothesis. However, establishing the truth of a hypothesis is problematic due to the potential occurrence of conflicting events,…
While the traditional conception of inductive logic is Carnapian, I develop a Peircean alternative and use it to unify formal learning theory, statistics, and a significant part of machine learning: supervised learning. Some crucial…
We consider the Youden index as well as measures evaluating predicted probabilities for the maximum-likelihood estimate of a logistic regression model with the classifier as predictor. We give impossibility results showing that the Youden…
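For readers unfamiliar with the Youden index mentioned above, it is defined as sensitivity + specificity - 1 for a binary classifier. The sketch below computes it from 0/1 labels and predictions; it only illustrates the standard definition and does not reproduce the paper's results. The function name is hypothetical.

```python
def youden_index(y_true, y_pred):
    """Youden index J = sensitivity + specificity - 1 for 0/1 labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one positive and one negative label")
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return sensitivity + specificity - 1


if __name__ == "__main__":
    print(youden_index([1, 1, 0, 0], [1, 0, 0, 0]))
```

A perfect classifier scores 1, and a classifier that always predicts the same class scores 0.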
Estimating means on Riemannian manifolds is generally computationally expensive because the Riemannian distance function is not known in closed-form for most manifolds. To overcome this, we show that Riemannian diffusion means can be…
This study extends the BG/NBD churn probability model, addressing its limitations in industries where customer behaviour is often influenced by seasonal events and possibly high purchase counts. We propose a modified definition of churn,…
Dimensionality reduction is a fundamental technique in machine learning and data analysis, enabling efficient representation and visualization of high-dimensional data. This paper explores five key methods: Principal Component Analysis…
This paper examines the foundational concept of random variables in probability theory and statistical inference, demonstrating that their mathematical definition requires no reference to randomization or hypothetical repeated sampling. We…
This paper introduces Monotone Delta, an order-theoretic measure designed to enhance the reliability assessment of survey-based instruments in human-machine interactions. Traditional reliability measures, such as Cronbach's Alpha and…
Generation of realistic synthetic data has garnered considerable attention in recent years, particularly in the health research domain due to its utility in, for instance, sharing data while protecting patient privacy or determining optimal…
Open-ended assignments - such as lab reports and semester-long projects - provide data science and statistics students with opportunities for developing communication, critical thinking, and creativity skills. However, providing grades and…
Should one teach coding in a required introductory statistics and data science class for non-major students? Many professors advise against it, considering it a distraction from the important and challenging statistical topics that need to…
This commentary proposes a framework for understanding the role of statistics in policy-making, regulation, and bureaucratic systems. I introduce the concept of "ex ante policy," describing statistical rules and procedures designed before…
Strong artificial intelligence (AI) is envisioned to possess general cognitive abilities and scientific creativity comparable to human intelligence, encompassing both knowledge acquisition and problem-solving. While remarkable progress has…