Related papers: Finding large average submatrices in high dimensio…

A Nonparametric Bayesian Method for Clustering of High-Dimensional Mixed Dataset

The paper is motivated by the clustering problem in high-throughput mixed datasets. Clustering of such datasets can provide much insight into biological associations. An open problem in this context is to simultaneously cluster…

Methodology · Statistics 2018-08-15 Chetkar Jha

Random lasso

We propose a computationally intensive method, the random lasso method, for variable selection in linear models. The method consists of two major steps. In step 1, the lasso method is applied to many bootstrap samples, each using a set of…

Applications · Statistics 2011-04-19 Sijian Wang, Bin Nan, Saharon Rosset, Ji Zhu

Contributions to Biclustering of Microarray Data Using Formal Concept Analysis

Biclustering is an unsupervised data mining technique that aims to unveil patterns (biclusters) from gene expression data matrices. In the framework of this thesis, we propose new biclustering algorithms for microarray data. The latter is…

Machine Learning · Computer Science 2018-11-26 Amina Houari

Sparse group factor analysis for biclustering of multiple data sources

Motivation: Modelling methods that find structure in data are necessary with the current large volumes of genomic data, and there have been various efforts to find subsets of genes exhibiting consistent patterns over subsets of treatments.…

Machine Learning · Computer Science 2016-09-15 Kerstin Bunte, Eemeli Leppäaho, Inka Saarinen, Samuel Kaski

Boolean Reasoning-Based Biclustering for Shifting Pattern Extraction

Biclustering is a powerful approach to search for patterns in data, as it can be driven by a function that measures the quality of diverse types of patterns of interest. However, due to its computational complexity, the exploration of the…

Machine Learning · Computer Science 2021-04-27 Marcin Michalak, Jesús S. Aguilar-Ruiz

A multivariate adaptive stochastic search method for dimensionality reduction in classification

High-dimensional classification has become an increasingly important problem. In this paper we propose a "Multivariate Adaptive Stochastic Search" (MASS) approach which first reduces the dimension of the data space and then applies a…

Applications · Statistics 2010-10-08 Tian Siva Tian, Gareth M. James, Rand R. Wilcox

Biclustering Algorithms Based on Metaheuristics: A Review

Biclustering is an unsupervised machine learning technique that simultaneously clusters rows and columns in a data matrix. Biclustering has emerged as an important approach and plays an essential role in various applications such as…

Machine Learning · Computer Science 2022-03-31 Adan Jose-Garcia, Julie Jacques, Vincent Sobanski, Clarisse Dhaenens

Profile Likelihood Biclustering

Biclustering, the process of simultaneously clustering the rows and columns of a data matrix, is a popular and effective tool for finding structure in a high-dimensional dataset. Many biclustering procedures appear to work well in practice,…

Methodology · Statistics 2020-06-04 Cheryl J. Flynn, Patrick O. Perry

A variable selection approach for highly correlated predictors in high-dimensional genomic data

In genomic studies, identifying biomarkers associated with a variable of interest is a major concern in biomedical research. Regularized approaches are classically used to perform variable selection in high-dimensional linear models.…

Methodology · Statistics 2020-07-22 Wencan Zhu, Céline Lévy-Leduc, Nils Ternès

A Goodness-of-fit Test on the Number of Biclusters in a Relational Data Matrix

Biclustering is a method for detecting homogeneous submatrices in a given observed matrix, and it is an effective tool for relational data analysis. Although there are many studies that estimate the underlying bicluster structure of a…

Methodology · Statistics 2021-07-16 Chihiro Watanabe, Taiji Suzuki

HBIC: A Biclustering Algorithm for Heterogeneous Datasets

Biclustering is an unsupervised machine-learning approach aiming to cluster rows and columns simultaneously in a data matrix. Several biclustering algorithms have been proposed for handling numeric datasets. However, real-world data mining…

Machine Learning · Computer Science 2024-08-26 Adán José-García, Julie Jacques, Clément Chauvet, Vincent Sobanski, Clarisse Dhaenens

A Co-analysis Framework for Exploring Multivariate Scientific Data

In complex multivariate data sets, different features usually include diverse associations with different variables, and different variables are associated within different regions. Therefore, exploring the associations between variables…

Computer Vision and Pattern Recognition · Computer Science 2019-08-20 Xiangyang He, Yubo Tao, Qirui Wang, Hai Lin

Efficient subsampling for high-dimensional data

In the field of big data analytics, the search for efficient subdata selection methods that enable robust statistical inferences with minimal computational resources is of high importance. A procedure prior to subdata selection could…

Methodology · Statistics 2024-11-12 Vasilis Chasiotis, Lin Wang, Dimitris Karlis

Towards a Unified Taxonomy of Biclustering Methods

Being an unsupervised machine learning and data mining technique, biclustering and its multimodal extensions are becoming popular tools for analysing object-attribute data in different domains. Apart from conventional clustering techniques,…

Artificial Intelligence · Computer Science 2017-02-20 Dmitry I. Ignatov, Bruce W. Watson

A New Covariate Selection Strategy for High Dimensional Data in Causal Effect Estimation with Multivariate Treatments

Selection of covariates is crucial in the estimation of average treatment effects given observational data with high or even ultra-high dimensional pretreatment variables. Existing methods for this problem typically assume sparse linear…

Methodology · Statistics 2023-03-20 Juan Chen, Yingchun Zhou

Differential gene co-expression networks via Bayesian biclustering models

Identifying latent structure in large data matrices is essential for exploring biological processes. Here, we consider recovering gene co-expression networks from gene expression data, where each network encodes relationships between genes…

Methodology · Statistics 2014-11-10 Chuan Gao, Shiwen Zhao, Ian C. McDowell, Christopher D. Brown, Barbara E. Engelhardt

Supervised Integrative Biclustering with applications to Alzheimer's Disease

Multiple types or views of data (e.g. genetics, proteomics) measured on the same set of individuals are now popularly generated in many biomedical studies. A particular interest might be the detection of sample subgroups (e.g. subtypes of…

Methodology · Statistics 2025-05-09 Kaifeng Yang, Thierry Chekouo, Sandra E. Safo

A Family of Mixture Models for Biclustering

Biclustering is used for simultaneous clustering of the observations and variables when there is no group structure known a priori. It is being increasingly used in bioinformatics, text analytics, etc. Previously, biclustering has…

Methodology · Statistics 2020-09-14 Wangshu Tu, Sanjeena Subedi

Finding Important Genes from High-Dimensional Data: An Appraisal of Statistical Tests and Machine-Learning Approaches

Over the past decades, statisticians and machine-learning researchers have developed literally thousands of new tools for the reduction of high-dimensional data in order to identify the variables most responsible for a particular trait.…

Machine Learning · Statistics 2012-05-31 Chamont Wang, Jana Gevertz, Chaur-Chin Chen, Leonardo Auslender

BiSSLB: Binary Spike-and-Slab Lasso Biclustering

Biclustering is a powerful unsupervised learning technique for simultaneously identifying coherent subsets of rows and columns in a data matrix, thus revealing local patterns that may not be apparent in global analyses. However, most…

Methodology · Statistics 2026-03-20 Sijian Fan, Ray Bai