Related papers: Data-Driven Tree Transforms and Metrics

Weighted Sum-of-Trees Model for Clustered Data

Clustered data, which arise when observations are nested within groups, are incredibly common in clinical, education, and social science research. Traditionally, a linear mixed model, which includes random effects to account for…

Methodology · Statistics 2026-02-04 Kevin McCoy, Zachary Wooten, Katarzyna Tomczak, Christine B. Peterson

Treelets--An adaptive multi-scale basis for sparse unordered data

In many modern applications, including analysis of gene expression and text documents, the data are noisy, high-dimensional, and unordered--with no particular meaning to the given order of the variables. Yet, successful learning is often…

Methodology · Statistics 2008-07-25 Ann B. Lee, Boaz Nadler, Larry Wasserman

Natural data structure extracted from neighborhood-similarity graphs

'Big' high-dimensional data are commonly analyzed in low-dimensions, after performing a dimensionality-reduction step that inherently distorts the data structure. For the same purpose, clustering methods are also often used. These methods…

Machine Learning · Statistics 2019-02-20 Tom Lorimer, Karlis Kanders, Ruedi Stoop

A Hybrid Mixture Approach for Clustering and Characterizing Cancer Data

Model-based clustering is widely used for identifying and distinguishing types of diseases. However, modern biomedical data coming with high dimensions make it challenging to perform the model estimation in traditional cluster analysis. The…

Methodology · Statistics 2025-07-22 Kazeem Kareem, Fan Dai

A Clustering Approach to Integrative Analysis of Multiomic Cancer Data

Rapid technological advances have allowed for molecular profiling across multiple omics domains from a single sample for clinical decision making in many diseases, especially cancer. As tumor development and progression are dynamic…

Methodology · Statistics 2022-02-11 Dongyan Yan, Subharup Guha

Data-Driven Logistic Regression Ensembles With Applications in Genomics

Advances in data collecting technologies in genomics have significantly increased the need for tools designed to study the genetic basis of many diseases. Effective statistical methods should excel in both prediction accuracy and biomarker…

Methodology · Statistics 2025-11-13 Anthony-Alexander Christidis, Stefan Van Aelst, Ruben Zamar

Bayesian Consensus Clustering

The task of clustering a set of objects based on multiple sources of data arises in several modern applications. We propose an integrative statistical model that permits a separate clustering of the objects for each data source. These…

Machine Learning · Statistics 2015-12-01 Eric F. Lock, David B. Dunson

Spatial clustering of array CGH features in combination with hierarchical multiple testing

We propose a new approach for clustering DNA features using array CGH data from multiple tumor samples. We distinguish data-collapsing: joining contiguous DNA clones or probes with extremely similar data into regions, from clustering:…

Applications · Statistics 2010-12-21 Kyung In Kim, Etienne Roquain, Mark Van De Wiel

A Framework for Implementing Machine Learning on Omics Data

The potential benefits of applying machine learning methods to -omics data are becoming increasingly apparent, especially in clinical settings. However, the unique characteristics of these data are not always well suited to machine learning…

Machine Learning · Computer Science 2018-11-28 Geoffroy Dubourg-Felonneau, Timothy Cannings, Fergal Cotter, Hannah Thompson, Nirmesh Patel, John W Cassidy, Harry W Clifford

Multimodal Data Integration for Oncology in the Era of Deep Neural Networks: A Review

Cancer has relational information residing at varying scales, modalities, and resolutions of the acquired data, such as radiology, pathology, genomics, proteomics, and clinical records. Integrating diverse data types can improve the…

Machine Learning · Computer Science 2024-07-29 Asim Waqas, Aakash Tripathi, Ravi P. Ramachandran, Paul Stewart, Ghulam Rasool

ClusterGraph: a new tool for visualization and compression of multidimensional data

Understanding the global organization of complicated and high dimensional data is of primary interest for many branches of applied sciences. It is typically achieved by applying dimensionality reduction techniques mapping the considered…

Computational Geometry · Computer Science 2024-11-11 Paweł Dłotko, Davide Gurnari, Mathis Hallier, Anna Jurek-Loughrey

Consensus Tree Estimation with False Discovery Rate Control via Partially Ordered Sets

Connected acyclic graphs (trees) are data objects that hierarchically organize categories. Collections of trees arise in a diverse variety of fields, including evolutionary biology, public health, machine learning, social sciences and…

Methodology · Statistics 2025-12-01 Maria Alejandra Valdez Cabrera, Amy D Willis, Armeen Taeb

Finding Important Genes from High-Dimensional Data: An Appraisal of Statistical Tests and Machine-Learning Approaches

Over the past decades, statisticians and machine-learning researchers have developed literally thousands of new tools for the reduction of high-dimensional data in order to identify the variables most responsible for a particular trait.…

Machine Learning · Statistics 2012-05-31 Chamont Wang, Jana Gevertz, Chaur-Chin Chen, Leonardo Auslender

Transformation trees -- documentation of multimodal image registration

Multimodal image registration plays a key role in creating digital patient models by combining data from different imaging techniques into a single coordinate system. This process often involves multiple sequential and interconnected…

Computer Vision and Pattern Recognition · Computer Science 2025-05-23 Agnieszka Anna Tomaka, Dariusz Pojda, Michał Tarnawski, Leszek Luchowski

Towards multiple kernel principal component analysis for integrative analysis of tumor samples

Personalized treatment of patients based on tissue-specific cancer subtypes has strongly increased the efficacy of the chosen therapies. Even though the amount of data measured for cancer patients has increased over the last years, most…

Machine Learning · Statistics 2017-09-18 Nora K. Speicher, Nico Pfeifer

Dive into Decision Trees and Forests: A Theoretical Demonstration

Based on decision trees, many fields have arguably made tremendous progress in recent years. In simple words, decision trees use the strategy of "divide-and-conquer" to divide the complex problem on the dependency between input features and…

Machine Learning · Computer Science 2021-01-22 Jinxiong Zhang

Joint Geometric and Topological Analysis of Hierarchical Datasets

In a world abundant with diverse data arising from complex acquisition techniques, there is a growing need for new data analysis methods. In this paper we focus on high-dimensional data that are organized into several hierarchical datasets.…

Machine Learning · Computer Science 2021-04-06 Lior Aloni, Omer Bobrowski, Ronen Talmon

Clustering multivariate functional data using unsupervised binary trees

We propose a model-based clustering algorithm for a general class of functional data for which the components could be curves or images. The random functional data realizations could be measured with error at discrete, and possibly random,…

Machine Learning · Statistics 2022-03-14 Steven Golovkine, Nicolas Klutchnikoff, Valentin Patilea

Model-Based Hierarchical Clustering

We present an approach to model-based hierarchical clustering by formulating an objective function based on a Bayesian analysis. This model organizes the data into a cluster hierarchy while specifying a complex feature-set partitioning that…

Machine Learning · Computer Science 2013-01-18 Shivakumar Vaithyanathan, Byron E Dom

Tree Thinking in the Genomic Era: Unifying Models Across Cells, Populations, and Species

The ongoing explosion of genome sequence data is transforming how we reconstruct and understand the histories of biological systems. Across biological scales, from individual cells to populations and species, tree-based models provide a…

Populations and Evolution · Quantitative Biology 2025-12-08 Yun Deng, Shing H. Zhan, Yulin Zhang, Chao Zhang, Bingjie Chen