Related papers: Distributed multinomial regression
Text data, including speeches, stories, and other document forms, are often connected to sentiment variables that are of interest for research in marketing, economics, and elsewhere. It is also very high dimensional and difficult to…
Count data take on non-negative integer values and are challenging to properly analyze using standard linear-Gaussian methods such as linear regression and principal components analysis. Generalized linear models enable direct modeling of…
Multi-dimensional data frequently occur in many different fields, including risk management, insurance, biology, environmental sciences, and many more. In analyzing multivariate data, it is imperative that the underlying modelling…
This article introduces an iterative distributed computing estimator for the multinomial logistic regression model with large choice sets. Compared to the maximum likelihood estimator, the proposed iterative distributed estimator achieves…
Factor analysis for high-dimensional data is a canonical problem in statistics and has a wide range of applications. However, there is currently no factor model tailored to effectively analyze high-dimensional count responses with…
The Poisson distribution has been widely studied and used for modeling univariate count-valued data. Multivariate generalizations of the Poisson distribution that permit dependencies, however, have been far less popular. Yet, real-world…
Categorical random variables are a common staple in machine learning methods and other applications across disciplines. Many times, correlation within categorical predictors exists, and has been noted to have an effect on various algorithm…
Poisson regression is a popular tool for modeling count data and is applied in a vast array of applications from the social to the physical sciences and beyond. Real data, however, are often over- or under-dispersed and, thus, not conducive…
Most work on supervised learning research has focused on marginal predictions. In decision problems, joint predictive distributions are essential for good performance. Previous work has developed methods for assessing low-order predictive…
Large Language Models (LLMs) are a powerful tool for statistical text analysis, with derived sequences of next-token probability distributions offering a wealth of information. Extracting this signal typically relies on metrics such as…
A flexible semiparametric class of models is introduced that offers an alternative to classical regression models for count data as the Poisson and negative binomial model, as well as to more general models accounting for excess zeros that…
Estimating statistical models within sensor networks requires distributed algorithms, in which both data and computation are distributed across the nodes of the network. We propose a general approach for distributed learning based on…
We are studying the problems of modeling and inference for multivariate count time series data with Poisson marginals. The focus is on linear and log-linear models. For studying the properties of such processes we develop a novel conceptual…
Researchers are often interested in understanding the relationship between a set of covariates and a set of response variables. To achieve this goal, the use of regression analysis, either linear or generalized linear models, is largely…
There are two main approaches to the distributed representation of words: low-dimensional deep learning embeddings and high-dimensional distributional models, in which each dimension corresponds to a context word. In this paper, we combine…
Regression for count data is widely performed by models such as Poisson, negative binomial (NB) and zero-inflated regression. A challenge often faced by practitioners is the selection of the right model to take into account dispersion,…
Capturing complex dependence structures between outcome variables (e.g., study endpoints) is of high relevance in contemporary biomedical data problems and medical research. Distributional copula regression provides a flexible tool to model…
Bayesian multinomial logistic regression provides a principled, interpretable approach to multiclass classification, but posterior sampling becomes increasingly expensive as the model dimension grows. Prior work has studied scalability in…
Distributed representations of meaning are a natural way to encode covariance relationships between words and phrases in NLP. By overcoming data sparsity problems, as well as providing information about semantic relatedness which is not…
Nowadays, an abundance of short text is being generated that uses nonstandard writing styles influenced by regional languages. Such informal and code-switched content are under-resourced in terms of labeled datasets and language models even…