Related papers: Discrete Component Analysis

Applying Discrete PCA in Data Analysis

Methods for analysis of principal components in discrete data have existed for some time under various names such as grade of membership modelling, probabilistic latent semantic analysis, and genotype inference with admixture. In this paper…

Machine Learning · Computer Science 2012-07-19 Wray L. Buntine, Aleks Jakulin

Differentially Private Methods for Compositional Data

Confidential data, such as electronic health records, activity data from wearable devices, and geolocation data, are becoming increasingly prevalent. Differential privacy provides a framework to conduct statistical analyses while mitigating…

Methodology · Statistics 2024-08-05 Qi Guo, Andrés F. Barrientos, Víctor Peña

Dirichlet Process Mixtures of Generalized Mallows Models

We present a Dirichlet process mixture model over discrete incomplete rankings and study two Gibbs sampling inference techniques for estimating posterior clusterings. The first approach uses a slice sampling subcomponent for estimating…

Machine Learning · Computer Science 2012-03-19 Marina Meila, Harr Chen

Discrete factor analysis

In this paper, we present a method for factor analysis of discrete data. This is accomplished by fitting a dependent Poisson model with a factor structure. To be able to analyze ordinal data, we also consider a truncated Poisson…

Methodology · Statistics 2019-03-13 Rolf Larsson

Distributed Collapsed Gibbs Sampler for Dirichlet Process Mixture Models in Federated Learning

Dirichlet Process Mixture Models (DPMMs) are widely used to address clustering problems. Their main advantage lies in their ability to automatically estimate the number of clusters during the inference process through the Bayesian…

Machine Learning · Statistics 2023-12-19 Reda Khoufache, Mustapha Lebbah, Hanene Azzag, Etienne Goffinet, Djamel Bouchaffra

A Framework for Private Matrix Analysis

We study private matrix analysis in the sliding window model where only the last $W$ updates to matrices are considered useful for analysis. We give first efficient $o(W)$ space differentially private algorithms for spectral approximation,…

Machine Learning · Computer Science 2020-09-08 Jalaj Upadhyay, Sarvagya Upadhyay

Learning Discrete and Continuous Factors of Data via Alternating Disentanglement

We address the problem of unsupervised disentanglement of discrete and continuous explanatory factors of data. We first show a simple procedure for minimizing the total correlation of the continuous latent variables without having to use a…

Machine Learning · Computer Science 2019-05-24 Yeonwoo Jeong, Hyun Oh Song

Scalable Estimation of Dirichlet Process Mixture Models on Distributed Data

We consider the estimation of Dirichlet Process Mixture Models (DPMMs) in distributed environments, where data are distributed across multiple computing nodes. A key advantage of Bayesian nonparametric models such as DPMMs is that they…

Machine Learning · Statistics 2017-09-20 Ruohui Wang, Dahua Lin

Independent Component Analysis for Compositional Data

Compositional data represent a specific family of multivariate data, where the information of interest is contained in the ratios between parts rather than in absolute values of single parts. The analysis of such specific data is…

Methodology · Statistics 2021-07-07 Christoph Muehlmann, Kamila Fačevicová, Alžběta Gardlo, Hana Janečková, Klaus Nordhausen

On Smoothing and Inference for Topic Models

Latent Dirichlet analysis, or topic modeling, is a flexible latent variable framework for modeling high-dimensional sparse count data. Various learning algorithms have been developed in recent years, including collapsed Gibbs sampling,…

Machine Learning · Computer Science 2012-05-14 Arthur Asuncion, Max Welling, Padhraic Smyth, Yee Whye Teh

Dependent Dirichlet processes via thinning

When analyzing data from multiple sources, it is often convenient to strike a careful balance between two goals: capturing the heterogeneity of the samples and sharing information across them. We introduce a novel framework to model a…

Methodology · Statistics 2026-03-02 Laura D'Angelo, Bernardo Nipoti, Andrea Ongaro

Split Gibbs Discrete Diffusion Posterior Sampling

We study the problem of posterior sampling in discrete-state spaces using discrete diffusion models. While posterior sampling methods for continuous diffusion models have achieved remarkable progress, analogous methods for discrete…

Machine Learning · Computer Science 2025-11-04 Wenda Chu, Zihui Wu, Yifan Chen, Yang Song, Yisong Yue

Discrete Sampling using Semigradient-based Product Mixtures

We consider the problem of inference in discrete probabilistic models, that is, distributions over subsets of a finite ground set. These encompass a range of well-known models in machine learning, such as determinantal point processes and…

Machine Learning · Computer Science 2018-07-10 Alkis Gotovos, Hamed Hassani, Andreas Krause, Stefanie Jegelka

Exploring Discrete Factor Analysis with the discFA Package in R

Literature suggested that using the traditional factor analysis for the count data may be inappropriate. With that in mind, discrete factor analysis builds on fitting systems of dependent discrete random variables to data. The data should…

Methodology · Statistics 2025-06-17 Reza Arabi Belaghi, Yasin Asar, Rolf Larsson

Principal Subsimplex Analysis

Compositional data, also referred to as simplicial data, naturally arise in many scientific domains such as geochemistry, microbiology, and economics. In such domains, obtaining sensible lower-dimensional representations and modes of…

Methodology · Statistics 2025-04-15 Hyeon Lee, Kassel Liam Hingee, Janice L. Scealy, Andrew T. A. Wood, Eric Grunsky, J. S. Marron

Principal Component Analysis for Experiments

Motivation: Although principal component analysis is frequently applied to reduce the dimensionality of matrix data, the method is sensitive to noise and bias and has difficulty with comparability and interpretation. These issues are…

Methodology · Statistics 2012-12-27 Tomokazu Konishi

Anchored Discrete Factor Analysis

We present a semi-supervised learning algorithm for learning discrete factor analysis models with arbitrary structure on the latent variables. Our algorithm assumes that every latent variable has an "anchor", an observed variable with only…

Machine Learning · Statistics 2015-11-12 Yoni Halpern, Steven Horng, David Sontag

Distributed Principal Component Analysis with Limited Communication

We study efficient distributed algorithms for the fundamental problem of principal component analysis and leading eigenvector computation on the sphere, when the data are randomly distributed among a set of computational nodes. We propose a…

Optimization and Control · Mathematics 2021-10-28 Foivos Alimisis, Peter Davies, Bart Vandereycken, Dan Alistarh

A Gibbs Sampler for Multivariate Linear Regression

Kelly (2007, hereafter K07) described an efficient algorithm, using Gibbs sampling, for performing linear regression in the fairly general case where non-zero measurement errors exist for both the covariates and response variables, where…

Instrumentation and Methods for Astrophysics · Physics 2016-02-17 Adam B. Mantz

A comparative review of variable selection techniques for covariate dependent Dirichlet process mixture models

Dirichlet Process Mixture (DPM) models have been increasingly employed to specify random partition models that take into account possible patterns within the covariates. Furthermore, to deal with large numbers of covariates, methods for…

Applications · Statistics 2016-11-01 William Barcella, Maria De Iorio, Gianluca Baio