Related papers: Sparse matrix linear models for structured high-th…
Sparse covariance matrices play crucial roles by encoding the interdependencies between variables in numerous fields such as genetics and neuroscience. Despite substantial studies on sparse covariance matrices, existing methods face several…
Finite Gaussian mixture models are widely used for model-based clustering of continuous data. Nevertheless, since the number of model parameters scales quadratically with the number of variables, these models can be easily…
We propose new methods for multivariate linear regression when the regression coefficient matrix is sparse and the error covariance matrix is dense. We assume that the error covariance matrix has equicorrelation across the response…
A new sparse semiparametric model is proposed, which incorporates the influence of two functional random variables in a scalar response in a flexible and interpretable manner. One of the functional covariates is included through a…
Modern technologies are producing a wealth of data with complex structures. For instance, in two-dimensional digital imaging, flow cytometry, and electroencephalography, matrix type covariates frequently arise when measurements are obtained…
High-dimensional time series datasets are becoming increasingly common in many areas of biological and social sciences. Some important applications include gene regulatory network reconstruction using time course gene expression data, brain…
We consider high-dimensional generalized linear models when the covariates are contaminated by measurement error. Estimates from errors-in-variables regression models are well-known to be biased in traditional low-dimensional settings if…
Recent work has focused on the problem of conducting linear regression when the number of covariates is very large, potentially greater than the sample size. To facilitate this, one useful tool is to assume that the model can be well…
Relationship inference from sparse data is an important task with applications ranging from product recommendation to drug discovery. A recently proposed linear model for sparse matrix completion has demonstrated surprising advantage in…
This paper proposes a fast and accurate method for sparse regression in the presence of missing data. The underlying statistical model encapsulates the low-dimensional structure of the incomplete data matrix and the sparsity of the…
We present fast classification techniques for sparse generalized linear and additive models. These techniques can handle thousands of features and thousands of observations in minutes, even in the presence of many highly correlated…
High-dimensional time series data exist in numerous areas such as finance, genomics, healthcare, and neuroscience. An unavoidable aspect of all such datasets is missing data, and dealing with this issue has been an important focus in…
Scaling autoregressive large language models (LLMs) has driven unprecedented progress but comes with vast computational costs. In this work, we tackle these costs by leveraging unstructured sparsity within an LLM's feedforward layers, the…
As the size of datasets used in statistical learning continues to grow, distributed training of models has attracted increasing attention. These methods partition the data and exploit parallelism to reduce memory and runtime, but suffer…
Molecular profiling data (e.g., gene expression) has been used for clinical risk prediction and biomarker discovery. However, it is necessary to integrate other prior knowledge like biological pathways or gene interaction networks to…
For data with high-dimensional covariates but small to moderate sample sizes, the analysis of single datasets often generates unsatisfactory results. The integrative analysis of multiple independent datasets provides an effective way of…
Sparse regularized regression methods are now widely used in genome-wide association studies (GWAS) to address the multiple testing burden that limits discovery of potentially important predictors. Linear mixed models (LMMs) have become an…
Matrix factorization exploits the idea that, in complex high-dimensional data, the actual signal typically lies in lower-dimensional structures. These lower dimensional objects provide useful insight, with interpretability favored by sparse…
Regression with sparse inputs is a common theme for large scale models. Optimizing the underlying linear algebra for sparse inputs allows such models to be estimated faster. At the same time, centering the inputs has benefits in improving…
Multivariate regression model is a natural generalization of the classical univari- ate regression model for fitting multiple responses. In this paper, we propose a high- dimensional multivariate conditional regression model for…