Related papers: Forest Learning from Data and its Universal Coding
We extend the Chow-Liu algorithm for general random variables while the previous versions only considered finite cases. In particular, this paper applies the generalization to Suzuki's learning algorithm that generates from data forests…
We provide high probability finite sample complexity guarantees for hidden non-parametric structure learning of tree-shaped graphical models, whose hidden and observable nodes are discrete random variables with either finite or countable…
We show that $n$-variable tree-structured Ising models can be learned computationally-efficiently to within total variation distance $\epsilon$ from an optimal $O(n \ln n/\epsilon^2)$ samples, where $O(\cdot)$ hides an absolute constant…
We consider the problem of learning a tree-structured Ising model from data, such that subsequent predictions computed using the model are accurate. Concretely, we aim to learn a model such that posteriors $P(X_i|X_S)$ for small sets of…
The problem of learning forest-structured discrete graphical models from i.i.d. samples is considered. An algorithm based on pruning of the Chow-Liu tree through adaptive thresholding is proposed. It is shown that this algorithm is both…
We provide finite sample guarantees for the classical Chow-Liu algorithm (IEEE Trans.~Inform.~Theory, 1968) to learn a tree-structured graphical model of a distribution. For a distribution $P$ on $\Sigma^n$ and a tree $T$ on $n$ nodes, we…
We develop optimal algorithms for learning undirected Gaussian trees and directed Gaussian polytrees from data. We consider both problems of distribution learning (i.e. in KL distance) and structure learning (i.e. exact recovery). The first…
We consider the problem of learning underlying tree structure from noisy, mixed data obtained from a linear model. To achieve this, we use the expectation maximization algorithm combined with Chow-Liu minimum spanning tree algorithm. This…
Cyber-physical systems come with increasingly complex architectures and failure modes, which complicates the task of obtaining accurate system reliability models. At the same time, with the emergence of the (industrial) Internet-of-Things,…
Consider the problem of learning undirected graphical models on trees from corrupted data. Recently Katiyar et al. showed that it is possible to recover trees from noisy binary data up to a small equivalence class of possible trees. Their…
We present convincing empirical evidence for an effective and general strategy for building accurate small models. Such models are attractive for interpretability and also find use in resource-constrained environments. The strategy is to…
Exploratory data analysis is crucial for developing and understanding classification models from high-dimensional datasets. We explore the utility of a new unsupervised tree ensemble called uncharted forest for visualizing class…
Staged trees are probabilistic graphical models capable of representing any class of non-symmetric independence via a coloring of its vertices. Several structural learning routines have been defined and implemented to learn staged trees…
Missing data imputation is a critical challenge in various domains, such as healthcare and finance, where data completeness is vital for accurate analysis. Large language models (LLMs), trained on vast corpora, have shown strong potential…
We consider the task of learning the maximum-likelihood polytree from data. Our first result is a performance guarantee establishing that the optimal branching (or Chow-Liu tree), which can be computed very easily, constitutes a good…
The Chow-Liu algorithm (IEEE Trans.~Inform.~Theory, 1968) has been a mainstay for the learning of tree-structured graphical models from i.i.d.\ sampled data vectors. Its theoretical properties have been well-studied and are well-understood.…
This paper derives a unifying theorem establishing consistency results for a broad class of tree-based algorithms. It improves current results in two aspects. First of all, it can be applied to algorithms that vary from traditional Random…
We study the problem of learning a latent tree graphical model where samples are available only from a subset of variables. We propose two consistent and computationally efficient algorithms for learning minimal latent trees, that is, trees…
Analysis of sample survey data often requires adjustments to account for missing data in the outcome variables of principal interest. Standard adjustment methods based on item imputation or on propensity weighting factors rely heavily on…
We propose a novel algorithm for optimizing multivariate linear threshold functions as split functions of decision trees to create improved Random Forest classifiers. Standard tree induction methods resort to sampling and exhaustive search…