Related papers: Sequential category aggregation and partitioning a…

Loglinear modelling of huge contingency tables

Contingency tables are a fundamental representation of multivariate categorical data. As the size of the contingency table grows exponentially with the number of variables, even a moderate number of variables, each with a moderate number of…

Methodology · Statistics 2026-03-10 Veronica Vinciotti, Ernst C. Wit

Composite mixture of log-linear models for categorical data

Multivariate categorical data are routinely collected in many application areas. As the number of cells in the table grows exponentially with the number of variables, many or even most cells will contain zero observations. This severe…

Methodology · Statistics 2020-04-06 Emanuele Aliverti, David B. Dunson

A Sequential Model for Multi-Class Classification

Many classification problems require decisions among a large number of competing classes. These tasks, however, are not handled well by general purpose learning methods and are usually addressed in an ad-hoc fashion. We suggest a general…

Artificial Intelligence · Computer Science 2007-05-23 Yair Even-Zohar, Dan Roth

A Constructive Procedure for Modeling Categorical Variables: Log-Linear and Logit Models

Association between categorical variables in contingency tables is analyzed using the information identities based on multivariate multinomial distributions. A scheme of geometric decompositions of the information identities is developed to…

Methodology · Statistics 2018-04-10 Philip E. Cheng, Jiun-Wei Liou, Hung-Wen Kao, Michelle Liou

Logistic regression models for aggregated data

Logistic regression models are a popular and effective method to predict the probability of categorical response data. However inference for these models can become computationally prohibitive for large datasets. Here we adapt ideas from…

Methodology · Statistics 2020-08-25 Tom Whitaker, Boris Beranger, Scott A. Sisson

Subjects classification from high-dimensional and small-sample size datasets using a strategy based on Clustering Variables around Latent Components (CLV) method

High-dimensional complex systems can be studied through multivariate analysis, as Principal Component Analysis, however large samples of observations frequently are needed for it. Here it is examined a method for small samples based on…

Applications · Statistics 2017-06-16 Dimitri Marques Abramov

Hierarchical clustering with discrete latent variable models and the integrated classification likelihood

Finding a set of nested partitions of a dataset is useful to uncover relevant structure at different scales, and is often dealt with a data-dependent methodology. In this paper, we introduce a general two-step methodology for model-based…

Computation · Statistics 2021-04-22 Etienne Côme, Nicolas Jouvin, Pierre Latouche, Charles Bouveyron

Hierarchical subspace models for contingency tables

For statistical analysis of multiway contingency tables we propose modeling interaction terms in each maximal compact component of a hierarchical model. By this approach we can search for parsimonious models with smaller degrees of freedom…

Statistics Theory · Mathematics 2011-08-23 Hisayuki Hara, Tomonari Sei, Akimichi Takemura

Sequential importance sampling for multiway tables

We describe an algorithm for the sequential sampling of entries in multiway contingency tables with given constraints. The algorithm can be used for computations in exact conditional inference. To justify the algorithm, a theory relates…

Statistics Theory · Mathematics 2007-06-13 Yuguo Chen, Ian H. Dinwoodie, Seth Sullivant

Latent Tree Models for Hierarchical Topic Detection

We present a novel method for hierarchical topic detection where topics are obtained by clustering documents in multiple ways. Specifically, we model document collections using a class of graphical models called hierarchical latent tree…

Computation and Language · Computer Science 2016-12-22 Peixian Chen, Nevin L. Zhang, Tengfei Liu, Leonard K. M. Poon, Zhourong Chen, Farhan Khawar

Pseudo-clustering for combining data sets with multiple hierarchies

Multi-level modeling is an important approach for analyzing complex survey data using multi-stage sampling. However, estimation of multi-level models can be challenging when we combine several datasets with distinct hierarchies with…

Methodology · Statistics 2023-09-26 Seho Park, A James OMalley

A sequential rejection testing method for high-dimensional regression with correlated variables

We propose a general, modular method for significance testing of groups (or clusters) of variables in a high-dimensional linear model. In presence of high correlations among the covariables, due to serious problems of identifiability, it is…

Statistics Theory · Mathematics 2015-02-12 Jacopo Mandozzi, Peter Bühlmann

Graphical Log-linear Models: Fundamental Concepts and Applications

We present a comprehensive study of graphical log-linear models for contingency tables. High dimensional contingency tables arise in many areas such as computational biology, collection of survey and census data and others. Analysis of…

Methodology · Statistics 2016-03-15 Niharika Gauraha

Incorporating LLM Priors into Tabular Learners

We present a method to integrate Large Language Models (LLMs) and traditional tabular data classification techniques, addressing LLMs challenges like data serialization sensitivity and biases. We introduce two strategies utilizing LLMs for…

Machine Learning · Computer Science 2023-11-21 Max Zhu, Siniša Stanivuk, Andrija Petrovic, Mladen Nikolic, Pietro Lio

Clustering Hierarchies via a Semi-Parametric Generalized Linear Mixed Model: a statistical significance-based approach

We introduce a novel statistical significance-based approach for clustering hierarchical data using semi-parametric linear mixed-effects models designed for responses with laws in the exponential family (e.g., Poisson and Bernoulli). Within…

Methodology · Statistics 2025-02-04 Alessandra Ragni, Chiara Masci, Francesca Ieva, Anna Maria Paganoni

Exploring dependence between categorical variables: benefits and limitations of using variable selection within Bayesian clustering in relation to log-linear modelling with interaction terms

This manuscript is concerned with relating two approaches that can be used to explore complex dependence structures between categorical variables, namely Bayesian partitioning of the covariate space incorporating a variable selection…

Methodology · Statistics 2016-01-06 Michail Papathomas, Sylvia Richardson

Restoring Heterogeneity in LLM-based Social Simulation: An Audience Segmentation Approach

Large Language Models (LLMs) are increasingly used to simulate social attitudes and behaviors, offering scalable "silicon samples" that can approximate human data. However, current simulation practice often collapses diversity into an…

Computers and Society · Computer Science 2026-04-09 Xiaoyou Qin, Zhihong Li, Xiaoxiao Cheng

Hierarchical clustering of mixed-type data based on barycentric coding

Clustering of mixed-type datasets can be a particularly challenging task as it requires taking into account the associations between variables with different level of measurement, i.e., nominal, ordinal and/or interval. In some cases,…

Methodology · Statistics 2022-04-22 Odysseas Moschidis, Angelos Markos, Theodore Chadjipadelis

Partitioned conditional generalized linear models for categorical data

In categorical data analysis, several regression models have been proposed for hierarchically-structured response variables, e.g. the nested logit model. But they have been formally defined for only two or three levels in the hierarchy.…

Methodology · Statistics 2014-05-23 Jean Peyhardi, Catherine Trottier, Yann Guédon

Clustering in graphs and hypergraphs with categorical edge labels

Modern graph or network datasets often contain rich structure that goes beyond simple pairwise connections between nodes. This calls for complex representations that can capture, for instance, edges of different types as well as so-called…

Social and Information Networks · Computer Science 2020-02-19 Ilya Amburg, Nate Veldt, Austin R. Benson