Related papers: Bucketing Coding and Information Theory for the St…
Weighted Hamming distance, as a similarity measure between binary codes and binary queries, provides superior accuracy in search tasks than Hamming distance. However, how to efficiently and accurately find $K$ binary codes that have the…
This paper describes a class of probabilistic approximation algorithms based on bucket elimination which offer adjustable levels of accuracy and efficiency. We analyze the approximation for several tasks: finding the most probable…
An important problem in quantum information theory is that of bounding sets of correlations that arise from making local measurements on entangled states of arbitrary dimension. Currently, the best-known method to tackle this problem is the…
Given a large dataset of binary codes and a binary query point, we address how to efficiently find $K$ codes in the dataset that yield the largest cosine similarities to the query. The straightforward answer to this problem is to compare…
We consider machine learning in a comparison-based setting where we are given a set of points in a metric space, but we have no access to the actual distances between the points. Instead, we can only ask an oracle whether the distance…
Many high dimensional integrals can be reduced to the problem of finding the relative measures of two sets. Often one set will be exponentially larger than the other, making it difficult to compare the sizes. A standard method of dealing…
Sorting is the task of ordering $n$ elements using pairwise comparisons. It is well known that $m=\Theta(n\log n)$ comparisons are both necessary and sufficient when the outcomes of the comparisons are observed with no noise. In this paper,…
We study ordinal approximation algorithms for maximum-weight bipartite matchings. Such algorithms only know the ordinal preferences of the agents/nodes in the graph for their preferred matches, but must compete with fully omniscient…
Evaluating joint probabilities of potential outcomes and observed variables, and their linear combinations, is a fundamental challenge in causal inference. This paper addresses the bounding and identification of these probabilities in…
For many distributed algorithms, neighborhood size is an important parameter. In radio networks, however, obtaining this information can be difficult due to ad hoc deployments and communication that occurs on a collision-prone shared…
Nearest neighbor is a popular class of classification methods with many desirable properties. For a large data set which cannot be loaded into the memory of a single machine due to computation, communication, privacy, or ownership…
Binary vector embeddings enable fast nearest neighbor retrieval in large databases of high-dimensional objects, and play an important role in many practical applications, such as image and video retrieval. We study the problem of learning…
Upper bounds on the maximum number of codewords in a binary code of a given length and minimum Hamming distance are considered. New bounds are derived by a combination of linear programming and counting arguments. Some of these bounds…
Non-overlapping codes are block codes that have arisen in diverse contexts of computer science and biology. Applications typically require finding non-overlapping codes with large cardinalities, but the maximum size of non-overlapping codes…
In this paper we study the problem of finding the approximate nearest neighbor of a query point in the high dimensional space, focusing on the Euclidean space. The earlier approaches use locality-preserving hash functions (that tend to map…
Consider that the coordinates of $N$ points are randomly generated along the edges of a $d$-dimensional hypercube (random point problem). The probability that an arbitrary point is the $m$th nearest neighbor to its own $n$th nearest…
Probabilistic inference algorithms for finding the most probable explanation, the maximum aposteriori hypothesis, and the maximum expected utility and for updating belief are reformulated as an elimination--type algorithm called bucket…
Building on previous results of Xing, we give new lower bounds on the rate of intersecting codes over large alphabets. The proof is constructive, and uses algebraic geometry, although nothing beyond the basic theory of linear systems on…
We consider the $(1+\epsilon)$-approximate nearest neighbor search problem: given a set $X$ of $n$ points in a $d$-dimensional space, build a data structure that, given any query point $y$, finds a point $x \in X$ whose distance to $y$ is…
When the competing classes in a classification problem are not of comparable size, many popular classifiers exhibit a bias towards larger classes, and the nearest neighbor classifier is no exception. To take care of this problem, we develop…