Related papers: Nearest Embedded and Embedding Self-Nested Trees
The class of self-nested trees presents remarkable compression properties because of the systematic repetition of subtrees in their structure. In this paper, we provide a better combinatorial characterization of this specific family of…
In this paper we show how to find nearly optimal embeddings of large trees in several natural classes of graphs. The size of the tree T can be as large as a constant fraction of the size of the graph G, and the maximum degree of T can be…
We devise a generalization of tree approximation that generates conforming meshes, i.e., meshes with a particular structure like edge-to-edge triangulations. A key feature of this generalization is that the choices of the cells to be…
We address the problem of efficiently gathering correlated data from a wired or a wireless sensor network, with the aim of designing algorithms with provable optimality guarantees, and understanding how close we can get to the known…
In this paper we describe an algorithm that embeds a graph metric $(V,d_G)$ on an undirected weighted graph $G=(V,E)$ into a distribution of tree metrics $(T,d_T)$ such that for every pair $u,v\in V$, $d_G(u,v)\leq d_T(u,v)$ and…
We propose a principled method for autoencoding with random forests. Our strategy builds on foundational results from nonparametric statistics and spectral graph theory to learn a low-dimensional embedding of the model that optimally…
We present approximation algorithms for the following NP-hard optimization problems related to bottleneck spanning trees in metric spaces. 1. The disjoint bottleneck spanning tree problem: Given $n$ pairs of points in a metric space, find…
Given a reference set $R$ of $n$ points and a query set $Q$ of $m$ points in a metric space, this paper studies the important problem of finding the $k$-nearest neighbors of every point $q \in Q$ in the set $R$ in near-linear time. In the…
Optimal transport provides a metric which quantifies the dissimilarity between probability measures. For measures supported in discrete metric spaces, finding the optimal transport distance has cubic time complexity in the size of the…
We present an empirical analysis of data structures for approximate nearest neighbor searching. We compare the well-known optimized kd-tree splitting method against two alternative splitting methods. The first, called the sliding-midpoint…
We introduce a new compression scheme for labeled trees based on top trees. Our compression scheme is the first to simultaneously take advantage of internal repeats in the tree (as opposed to the classical DAG compression that only exploits…
We present an algorithm for computing a maximum agreement subtree of two unrooted evolutionary trees. It takes $O(n^{1.5} \log n)$ time for trees with unbounded degrees, matching the best known time complexity for the rooted case. Our…
We introduce a semiparametric approach to neighbor-based classification. We build on the recently proposed Boundary Trees algorithm by Mathy et al. (2015), which enables fast neighbor-based classification, regression and retrieval in large…
The emergence of massive graph data sets requires fast mining algorithms. Centrality measures to identify important vertices belong to the most popular analysis methods in graph mining. A measure that is gaining attention is forest…
Nearest neighbor (kNN) methods have been gaining popularity in recent years in light of advances in hardware and efficiency of algorithms. There is a plethora of methods to choose from today, each with their own advantages and…
Prediction suffix trees (PST) provide an effective tool for sequence modelling and prediction. Current prediction techniques for PSTs rely on exact matching between the suffix of the current sequence and the previously observed sequence. We…
This paper studies the important problem of finding all $k$-nearest neighbors to points of a query set $Q$ in another reference set $R$ within any metric space. Our previous work defined compressed cover trees and corrected the key…
Polytrees are a subclass of Bayesian networks that seek to capture the conditional dependencies between a set of $n$ variables as a directed forest and are motivated by their more efficient inference and improved interpretability. Since the…
We propose an extension of tree-based space-partitioning indexing structures for data with low intrinsic dimensionality embedded in a high dimensional space. We call this extension an Angle Tree. Our extension can be applied to both…
The Subtree Isomorphism problem asks whether a given tree is contained in another given tree. The problem is of fundamental importance and has been studied since the 1960s. For some variants, e.g., ordered trees, near-linear time algorithms…