Related papers: Exploring data subsets with vtree

200 papers

In many modern applications, including analysis of gene expression and text documents, the data are noisy, high-dimensional, and unordered--with no particular meaning to the given order of the variables. Yet, successful learning is often…

Methodology · Statistics 2008-07-25 Ann B. Lee, Boaz Nadler, Larry Wasserman

The increasing complexity of data requires methods and models that can effectively handle intricate structures, as simplifying them would result in loss of information. While several analytical tools have been developed to work with complex…

Methodology · Statistics 2023-06-16 Riccardo Giubilei, Tullia Padellini, Pierpaolo Brutti

Analyzing large, multivariate graphs is an important problem in many domains, yet such graphs are challenging to visualize. In this paper, we introduce a novel, scalable, tree+table multivariate graph visualization technique, which makes…

Human-Computer Interaction · Computer Science 2018-08-03 Carolina Nobre, Marc Streit, Alexander Lex

We propose and study a multi-scale approach to vector quantization. We develop an algorithm, dubbed reconstruction trees, inspired by decision trees. Here the objective is parsimonious reconstruction of unsupervised data, rather than…

Machine Learning · Computer Science 2019-09-05 Enrico Cecini, Ernesto De Vito, Lorenzo Rosasco

Wavelet trees are widely used in the representation of sequences, permutations, text collections, binary relations, discrete points, and other succinct data structures. We show, however, that this still falls short of exploiting all of the…

Data Structures and Algorithms · Computer Science 2010-11-23 Travis Gagie, Gonzalo Navarro, Simon J. Puglisi

This paper presents two approaches to quantifying and visualizing variation in datasets of trees. The first approach localizes subtrees in which significant population differences are found through hypothesis testing and sparse classifiers…

Tree-based ensemble methods, as Random Forests and Gradient Boosted Trees, have been successfully used for regression in many applications and research studies. Furthermore, these methods have been extended in order to deal with uncertainty…

Machine Learning · Computer Science 2018-11-20 Myriam Tami, Marianne Clausel, Emilie Devijver, Adrien Dulac, Eric Gaussier, Stefan Janaqi, Meriam Chebre

Exploratory data analysis is crucial for developing and understanding classification models from high-dimensional datasets. We explore the utility of a new unsupervised tree ensemble called uncharted forest for visualizing class…

Machine Learning · Statistics 2018-07-03 Casey Kneale, Steven D. Brown

We investigate an application in the automatic tuning of computer codes, an area of research that has come to prominence alongside the recent rise of distributed scientific processing and heterogeneity in high-performance computing…

Applications · Statistics 2013-04-17 Robert B. Gramacy, Matt Taddy, Stefan M. Wild

In data analysis, latent variables play a central role because they help provide powerful insights into a wide variety of phenomena, ranging from biological to human sciences. The latent tree model, a particular type of probabilistic…

Machine Learning · Computer Science 2014-02-05 Raphaël Mourad, Christine Sinoquet, Nevin L. Zhang, Tengfei Liu, Philippe Leray

The varying-coefficient model is a strong tool for the modelling of interactions in generalized regression. It is easy to apply if both the variables that are modified as well as the effect modifiers are known. However, in general one has a…

Methodology · Statistics 2017-05-25 Moritz Berger, Gerhard Tutz, Matthias Schmid

This paper proposes FREEtree, a tree-based method for high dimensional longitudinal data with correlated features. Popular machine learning approaches, like Random Forests, commonly used for variable selection do not perform well when there…

Machine Learning · Statistics 2020-06-18 Yuancheng Xu, Athanasse Zafirov, R. Michael Alvarez, Dan Kojis, Min Tan, Christina M. Ramirez

Understanding the response of an output variable to multi-dimensional inputs lies at the heart of many data exploration endeavours. Topology-based methods, in particular Morse theory and persistent homology, provide a useful framework for…

Graphics · Computer Science 2022-08-16 Yarden Livnat, Dan Maljovec, Attila Gyulassy, Dr Baptiste Mouginot, Valerio Pascucci

Latent tree analysis seeks to model the correlations among a set of random variables using a tree of latent variables. It was proposed as an improvement to latent class analysis --- a method widely used in social sciences and medicine to…

Machine Learning · Computer Science 2016-10-04 Nevin L. Zhang, Leonard K. M. Poon

Analysts commonly investigate the data distributions derived from statistical aggregations of data that are represented by charts, such as histograms and binned scatterplots, to visualize and analyze a large-scale dataset. Aggregate queries…

Databases · Computer Science 2019-10-14 Honghui Mei, Wei Chen, Yating Wei, Yuanzhe Hu, Shuyue Zhou, Bingru Lin, Ying Zhao, Jiazhi Xia

Visualization is a powerful paradigm for exploratory data analysis. Visualizing large graphs, however, often results in a meaningless hairball. In this paper, we propose a different approach that helps the user adaptively explore large…

Information Retrieval · Computer Science 2016-07-25 Robert Pienta, Zhiyuan Lin, Minsuk Kahng, Jilles Vreeken, Partha P. Talukdar, James Abello, Ganesh Parameswaran, Duen Horng Chau

Decision trees are flexible prediction models which are constructed to quantify outcome-covariate relationships and characterize relevant population subgroups. However, the standard graphical representation of fitted decision trees…

Applications · Statistics 2021-03-09 Ashwini Venkatasubramaniam, Julian Wolfson

Tree ensembles such as random forests and boosted trees are accurate but difficult to understand, debug and deploy. In this work, we provide the inTrees (interpretable trees) framework that extracts, measures, prunes and selects rules from…

Machine Learning · Computer Science 2014-08-26 Houtao Deng

Regression trees have emerged as a preeminent tool for solving real-world regression problems due to their ability to deal with nonlinearities, interaction effects and sharp discontinuities. In this article, we rather study regression trees…

Machine Learning · Statistics 2025-11-14 Nathan Wycoff

We would like to congratulate Lee, Nadler and Wasserman on their contribution to clustering and data reduction methods for high $p$ and low $n$ situations. A composite of clustering and traditional principal components analysis, treelets is…

Applications · Statistics 2008-07-28 Catherine Tuglus, Mark J. van der Laan