Related papers: Discussion of: Treelets--An adaptive multi-scale b…
We would like to congratulate Lee, Nadler and Wasserman on their contribution to clustering and data reduction methods for high $p$ and low $n$ situations. A composite of clustering and traditional principal components analysis, treelets is…
This is a discussion of paper "Treelets--An adaptive multi-scale basis for sparse unordered data" [arXiv:0707.0481] by Ann B. Lee, Boaz Nadler and Larry Wasserman. In this paper the authors defined a new type of dimension reduction…
In many modern applications, including analysis of gene expression and text documents, the data are noisy, high-dimensional, and unordered--with no particular meaning to the given order of the variables. Yet, successful learning is often…
Discussion of "Treelets--An adaptive multi-scale basis for sparse unordered data" [arXiv:0707.0481]
Discussion of "Treelets--An adaptive multi-scale basis for sparse unordered data" [arXiv:0707.0481]
Discussion of "Treelets--An adaptive multi-Scale basis for sparse unordered data" [arXiv:0707.0481]
In data analysis, latent variables play a central role because they help provide powerful insights into a wide variety of phenomena, ranging from biological to human sciences. The latent tree model, a particular type of probabilistic…
Rejoinder of "Treelets--An adaptive multi-scale basis for spare unordered data" [arXiv:0707.0481]
We consider the problem of sparse variable selection on high dimension heterogeneous data sets, which has been taking on renewed interest recently due to the growth of biological and medical data sets with complex, non-i.i.d. structures and…
Tree-structured models are a powerful alternative to parametric regression models if non-linear effects and interactions are present in the data. Yet, classical tree-structured models might not be appropriate if data comes in clusters of…
We present convincing empirical evidence for an effective and general strategy for building accurate small models. Such models are attractive for interpretability and also find use in resource-constrained environments. The strategy is to…
Tree ensembles such as random forests and boosted trees are accurate but difficult to understand, debug and deploy. In this work, we provide the inTrees (interpretable trees) framework that extracts, measures, prunes and selects rules from…
Clustering serves as a vital tool for uncovering latent data structures, and achieving both high accuracy and interpretability is essential. To this end, existing methods typically construct binary decision trees by solving mixed-integer…
Tree ensembles, such as random forests and boosted trees, are renowned for their high prediction performance. However, their interpretability is critically limited due to the enormous complexity. In this study, we present a method to make a…
Wavelet trees are widely used in the representation of sequences, permutations, text collections, binary relations, discrete points, and other succinct data structures. We show, however, that this still falls short of exploiting all of the…
Tree ensembles, such as random forest and boosted trees, are renowned for their high prediction performance, whereas their interpretability is critically limited. In this paper, we propose a post processing method that improves the model…
Constrained clustering is a semi-supervised task that employs a limited amount of labelled data, formulated as constraints, to incorporate domain-specific knowledge and to significantly improve clustering accuracy. Previous work has…
Large tree structures are ubiquitous and real-world relational datasets often have information associated with nodes (e.g., labels or other attributes) and edges (e.g., weights or distances) that need to be communicated to the viewers. Yet,…
Ensembles of decision trees are a useful tool for obtaining for obtaining flexible estimates of regression functions. Examples of these methods include gradient boosted decision trees, random forests, and Bayesian CART. Two potential…
Model interpretability has become an important problem in machine learning (ML) due to the increased effect that algorithmic decisions have on humans. Counterfactual explanations can help users understand not only why ML models make certain…