Related papers: Sparse Adaptive Dirichlet-Multinomial-like Process…

Generalized Polya Urn for Time-varying Dirichlet Process Mixtures

Dirichlet Process Mixtures (DPMs) are a popular class of statistical models to perform density estimation and clustering. However, when the data available have a distribution evolving over time, such models are inadequate. We introduce here…

Methodology · Statistics 2012-06-26 Francois Caron, Manuel Davy, Arnaud Doucet

Adaptive Low-Complexity Sequential Inference for Dirichlet Process Mixture Models

We develop a sequential low-complexity inference procedure for Dirichlet process mixtures of Gaussians for online clustering and parameter estimation when the number of clusters are unknown a-priori. We present an easily computable, closed…

Machine Learning · Statistics 2015-09-15 Theodoros Tsiligkaridis, Keith W. Forsythe

Online Deterministic Annealing for Classification and Clustering

Inherent in virtually every iterative machine learning algorithm is the problem of hyper-parameter tuning, which includes three major design parameters: (a) the complexity of the model, e.g., the number of neurons in a neural network, (b)…

Machine Learning · Computer Science 2025-09-26 Christos Mavridis, John Baras

Simple approximate MAP Inference for Dirichlet processes

The Dirichlet process mixture (DPM) is a ubiquitous, flexible Bayesian nonparametric statistical model. However, full probabilistic inference in this model is analytically intractable, so that computationally intensive techniques such as…

Machine Learning · Statistics 2014-11-05 Yordan P. Raykov, Alexis Boukouvalas, Max A. Little

Estimation of Dirichlet distribution parameters with bias-reducing adjusted score functions

The Dirichlet distribution, also known as multivariate beta, is the most used to analyse frequencies or proportions data. Maximum likelihood is widespread for estimation of Dirichlet's parameters. However, for small sample sizes, the…

Methodology · Statistics 2021-03-04 Vincenzo Gioia, Euloge Clovis Kenne Pagui

Scalable Estimation of Dirichlet Process Mixture Models on Distributed Data

We consider the estimation of Dirichlet Process Mixture Models (DPMMs) in distributed environments, where data are distributed across multiple computing nodes. A key advantage of Bayesian nonparametric models such as DPMMs is that they…

Machine Learning · Statistics 2017-09-20 Ruohui Wang, Dahua Lin

Prior selection for the precision parameter of Dirichlet Process Mixtures

Consider a Dirichlet process mixture model (DPM) with random precision parameter $\alpha$, inducing $K_n$ clusters over $n$ observations through its latent random partition. Our goal is to specify the prior distribution…

Methodology · Statistics 2025-06-03 Carlo Vicentini, Ian Hyla Jermyn

Optimal subsampling algorithm for the marginal model with large longitudinal data

Big data is ubiquitous in practices, and it has also led to heavy computation burden. To reduce the calculation cost and ensure the effectiveness of parameter estimators, an optimal subset sampling method is proposed to estimate the…

Methodology · Statistics 2023-11-16 Haohui Han, Liya Fu

Pólya Urn Latent Dirichlet Allocation: a doubly sparse massively parallel sampler

Latent Dirichlet Allocation (LDA) is a topic model widely used in natural language processing and machine learning. Most approaches to training the model rely on iterative algorithms, which makes it difficult to run LDA on big corpora that…

Machine Learning · Statistics 2020-10-23 Alexander Terenin, Måns Magnusson, Leif Jonsson, David Draper

List-Decodable Sparse Mean Estimation via Difference-of-Pairs Filtering

We study the problem of list-decodable sparse mean estimation. Specifically, for a parameter $\alpha \in (0, 1/2)$, we are given $m$ points in $\mathbb{R}^n$, $\lfloor \alpha m \rfloor$ of which are i.i.d. samples from a distribution $D$…

Data Structures and Algorithms · Computer Science 2024-07-08 Ilias Diakonikolas, Daniel M. Kane, Sushrut Karmalkar, Ankit Pensia, Thanasis Pittas

Learning-based Support Estimation in Sublinear Time

We consider the problem of estimating the number of distinct elements in a large data set (or, equivalently, the support size of the distribution induced by the data set) from a random sample of its elements. The problem occurs in many…

Machine Learning · Computer Science 2021-06-17 Talya Eden, Piotr Indyk, Shyam Narayanan, Ronitt Rubinfeld, Sandeep Silwal, Tal Wagner

Online Adaptive Image Reconstruction (OnAIR) Using Dictionary Models

Sparsity and low-rank models have been popular for reconstructing images and videos from limited or corrupted measurements. Dictionary or transform learning methods are useful in applications such as denoising, inpainting, and medical image…

Machine Learning · Statistics 2019-07-23 Brian E. Moore, Saiprasad Ravishankar, Raj Rao Nadakuditi, Jeffrey A. Fessler

Redundancy of Exchangeable Estimators

Exchangeable random partition processes are the basis for Bayesian approaches to statistical inference in large alphabet settings. On the other hand, the notion of the pattern of a sequence provides an information-theoretic framework for…

Information Theory · Computer Science 2014-10-22 Narayana P. Santhanam, Anand D. Sarwate, Jae Oh Woo

Batch mode active learning for efficient parameter estimation

For many tasks of data analysis, we may only have the information of the explanatory variable and the evaluation of the response values are quite expensive. While it is impractical or too costly to obtain the responses of all units, a…

Computation · Statistics 2023-04-07 Wei Zheng, Ting Tian, Xueqin Wang

Statistical Inference After Adaptive Sampling for Longitudinal Data

Online reinforcement learning and other adaptive sampling algorithms are increasingly used in digital intervention experiments to optimize treatment delivery for users over time. In this work, we focus on longitudinal user data collected by…

Machine Learning · Computer Science 2023-04-20 Kelly W. Zhang, Lucas Janson, Susan A. Murphy

Fast MLE Computation for the Dirichlet Multinomial

Given a collection of categorical data, we want to find the parameters of a Dirichlet distribution which maximizes the likelihood of that data. Newton's method is typically used for this purpose but current implementations require reading…

Machine Learning · Statistics 2023-05-30 Max Sklar

Adaptive Dense-to-Sparse Paradigm for Pruning Online Recommendation System with Non-Stationary Data

Large scale deep learning provides a tremendous opportunity to improve the quality of content recommendation systems by employing both wider and deeper models, but this comes at great infrastructural cost and carbon footprint in modern data…

Machine Learning · Computer Science 2020-10-22 Mao Ye, Dhruv Choudhary, Jiecao Yu, Ellie Wen, Zeliang Chen, Jiyan Yang, Jongsoo Park, Qiang Liu, Arun Kejariwal

Sparse partial least squares for on-line variable selection in multivariate data streams

In this paper we propose a computationally efficient algorithm for on-line variable selection in multivariate regression problems involving high dimensional data streams. The algorithm recursively extracts all the latent factors of a…

Machine Learning · Statistics 2009-02-10 Brian McWilliams, Giovanni Montana

Distributed Diffusion-Based LMS for Node-Specific Adaptive Parameter Estimation

A distributed adaptive algorithm is proposed to solve a node-specific parameter estimation problem where nodes are interested in estimating parameters of local interest, parameters of common interest to a subset of nodes and parameters of…

Computers and Society · Computer Science 2023-07-19 Jorge Plata-Chaves, Nikola Bogdanovic, Kostas Berberidis

Adaptive Pseudo-Marginal Algorithm

The Pseudo-Marginal (PM) algorithm is a popular Markov chain Monte Carlo (MCMC) method used to sample from a target distribution when its density is inaccessible, but can be estimated with a non-negative unbiased estimator. Its performance…

Computation · Statistics 2025-09-30 Sarra Abaoubida, Mylène Bédard, Florian Maire