Related papers: Categorical Data Analysis

Bayesian Estimation and Regularization Techniques in Categorical Data Analysis

This paper explores Bayesian estimation for categorical data, focusing on simple yet effective models that provide a foundation for applying more advanced methods accurately and reliably in real-world applications. We begin by revisiting…

Methodology · Statistics 2025-09-03 Jan Kalina

Random matrix approach to multivariate categorical data analysis

Correlation and similarity measures are widely used in all areas of the sciences and social sciences. Often the variables are not numbers but qualitative descriptors, called categorical data. We define and study similarity…

Data Analysis, Statistics and Probability · Physics 2015-10-08 Aashay Patil, M. S. Santhanam

On Understanding Statistical Data Analysis in Higher Education

Data analysis is a powerful tool in all experimental sciences. Statistical methods, such as sampling theory, computer technologies necessary for handling large amounts of data, skill in analysing information contained in different types of…

Physics Education · Physics 2012-06-20 Vera Montalbano

Statistical methods: Basic concepts, interpretations, and cautions

The study of associations and their causal explanations is a central research activity whose methodology varies tremendously across fields. Even within specialized subfields, comparisons across textbooks and journals reveal that the basics…

Methodology · Statistics 2025-10-13 Sander Greenland

Total Empiricism: Learning from Data

Statistical analysis is an important tool to distinguish systematic from chance findings. Current statistical analyses rely on distributional assumptions reflecting the structure of some underlying model, which if not met lead to problems…

Statistics Theory · Mathematics 2023-11-15 Orestis Loukas, Ho Ryun Chung

Data science is science's second chance to get causal inference right: A classification of data science tasks

Causal inference from observational data is the goal of many data analyses in the health and social sciences. However, academic statistics has often frowned upon data analyses with a causal objective. The introduction of the term "data…

Machine Learning · Statistics 2019-04-11 Miguel A. Hernán, John Hsu, Brian Healy

Categorical exploratory data analysis on goodness-of-fit issues

If the aphorism "All models are wrong" (George Box) continues to be true in data analysis, particularly when analyzing real-world data, then we should annotate this wisdom with visible and explainable data-driven patterns. Such annotations…

Machine Learning · Statistics 2020-12-07 Sabrina Enriquez, Fushing Hsieh

Causal Discovery from Temporal Data: An Overview and New Perspectives

Temporal data, representing chronological observations of complex systems, is a typical data structure generated widely in many domains, such as industry, medicine and finance. Analyzing this type of data is…

Machine Learning · Computer Science 2023-08-04 Chang Gong, Di Yao, Chuzhe Zhang, Wenbin Li, Jingping Bi

The Analysis of Data from Continuous Probability Distributions

Conventional statistics begins with a model, and assigns a likelihood of obtaining any particular set of data. The opposite approach, beginning with the data and assigning a likelihood to any particular model, is explored here for the case…

Data Analysis, Statistics and Probability · Physics 2009-10-30 Timothy E. Holy

Graph-Based Tests for Two-Sample Comparisons of Categorical Data

We study the problem of two-sample comparison with categorical data when the contingency table is sparsely populated. In modern applications, the number of categories is often comparable to the sample size, causing existing methods to have…

Methodology · Statistics 2014-08-14 Hao Chen, Nancy R. Zhang

A Tutorial on Canonical Correlation Methods

Canonical correlation analysis is a family of multivariate statistical methods for the analysis of paired sets of variables. Since its proposition, canonical correlation analysis has for instance been extended to extract relations between…

Machine Learning · Computer Science 2017-11-08 Viivi Uurtio, João M. Monteiro, Jaz Kandola, John Shawe-Taylor, Delmiro Fernandez-Reyes, Juho Rousu

Ordered Sets for Data Analysis

This book dwells on mathematical and algorithmic issues of data analysis based on generality order of descriptions and respective precision. To speak of these topics correctly, we have to go some way getting acquainted with the important…

Logic in Computer Science · Computer Science 2019-08-30 Sergei O. Kuznetsov

A Generalized Multinomial Distribution from Dependent Categorical Random Variables

Categorical random variables are a common staple in machine learning methods and other applications across disciplines. Many times, correlation within categorical predictors exists, and has been noted to have an effect on various algorithm…

Probability · Mathematics 2017-01-25 Rachel Traylor

Identification and Estimation of Categorical Random Coefficient Models

This paper proposes a linear categorical random coefficient model, in which the random coefficients follow parametric categorical distributions. The distributional parameters are identified based on a linear recurrence structure of moments…

Econometrics · Economics 2023-03-01 Zhan Gao, M. Hashem Pesaran

A Survey on Causal Inference

Causal inference has been a critical research topic for decades across many domains, such as statistics, computer science, education, public policy and economics. Nowadays, estimating causal effects from observational data has become an appealing…

Methodology · Statistics 2020-02-10 Liuyi Yao, Zhixuan Chu, Sheng Li, Yaliang Li, Jing Gao, Aidong Zhang

Causal Structure Learning

Graphical models can represent a multivariate distribution in a convenient and accessible form as a graph. Causal models can be viewed as a special class of graphical models that not only represent the distribution of the observed system…

Methodology · Statistics 2017-06-29 Christina Heinze-Deml, Marloes H. Maathuis, Nicolai Meinshausen

Composite mixture of log-linear models for categorical data

Multivariate categorical data are routinely collected in many application areas. As the number of cells in the table grows exponentially with the number of variables, many or even most cells will contain zero observations. This severe…

Methodology · Statistics 2020-04-06 Emanuele Aliverti, David B. Dunson

Categorical data as a stone guest in a data science project for predicting defective water meters

After a year-long research effort in the field, we developed a machine learning-based classifier tailored to predict whether a mechanical water meter will fail with the passage of time and intensive use. A recurrent deep neural…

Machine Learning · Computer Science 2021-02-08 Giovanni Delnevo, Marco Roccetti, Luca Casini

Applying Discrete PCA in Data Analysis

Methods for analysis of principal components in discrete data have existed for some time under various names such as grade of membership modelling, probabilistic latent semantic analysis, and genotype inference with admixture. In this paper…

Machine Learning · Computer Science 2012-07-19 Wray L. Buntine, Aleks Jakulin

Categorical data clustering: 25 years beyond K-modes

The clustering of categorical data is a common and important task in computer science, offering profound implications across a spectrum of applications. Unlike purely numerical data, categorical data often lack inherent ordering as in…

Machine Learning · Computer Science 2025-01-28 Tai Dinh, Wong Hauchi, Philippe Fournier-Viger, Daniil Lisik, Minh-Quyet Ha, Hieu-Chi Dam, Van-Nam Huynh