Related papers: Even Better Framework for min-wise Based Algorithm…
We study explicit constructions of min-wise hash families and their extension to $k$-min-wise hash families. Informally, a min-wise hash family guarantees that for any fixed subset $X\subseteq[N]$, every element in $X$ has an equal chance…
We show that linear probing requires 5-independent hash functions for expected constant-time performance, matching an upper bound of [Pagh et al. STOC'07]. More precisely, we construct 4-independent hash functions yielding expected…
Randomized algorithms and data structures are often analyzed under the assumption of access to a perfect source of randomness. The most fundamental metric used to measure how "random" a hash function or a random number generator is, is its…
Minwise hashing is fundamental and one of the most successful hashing algorithms in the literature. Recent advances based on the idea of densification~\cite{Proc:OneHashLSH_ICML14,Proc:Shrivastava_UAI14} have shown that it is possible to…
In the problem of minimal perfect hashing, we are given a size $k$ subset $\mathcal{A}$ of a universe of keys $[n] = \{1,2, \cdots, n\}$, for which we wish to construct a hash function $h: [n] \to [k]$ such that $h(\cdot)$ maps…
Weighted minwise hashing (WMH) is one of the fundamental subroutines, required by many celebrated approximation algorithms, commonly adopted in industrial practice for large-scale search and learning. The resource bottleneck of the…
Weighted minwise hashing is a standard dimensionality reduction technique with applications to similarity search and large-scale kernel machines. We introduce a simple algorithm that takes a weighted set $x \in \mathbb{R}_{\geq 0}^{d}$ and…
Perfect hash functions can potentially be used to compress data in connection with a variety of data management tasks. Though there has been considerable work on how to construct good perfect hash functions, there is a gap between theory…
Given a set $S$ of $n$ distinct keys, a function $f$ that bijectively maps the keys of $S$ into the range $\{0,\ldots,n-1\}$ is called a minimal perfect hash function for $S$. Algorithms that find such functions when $n$ is large and retain…
A random hash function $h$ is $\varepsilon$-minwise if for any set $S$, $|S|=n$, and element $x\in S$, $\Pr[h(x)=\min h(S)]=(1\pm\varepsilon)/n$. Minwise hash functions with low bias $\varepsilon$ have widespread applications within…
Minimal perfect hash functions provide space-efficient and collision-free hashing on static sets. Existing algorithms and implementations that build such functions have practical limitations on the number of input elements they can process,…
Recent advances in random linear systems on finite fields have paved the way for the construction of constant-time data structures representing static functions and minimal perfect hash functions using less space with respect to existing…
In this paper, we study several critical issues which must be tackled before one can apply b-bit minwise hashing to the volumes of data often used in industrial applications, especially in the context of search. 1. (b-bit) Minwise hashing…
Minwise hashing is the standard technique in the context of search and databases for efficiently estimating set (e.g., high-dimensional 0/1 vector) similarities. Recently, b-bit minwise hashing was proposed which significantly improves upon…
A function $f : U \to \{0,\ldots,n-1\}$ is a minimal perfect hash function for a set $S \subseteq U$ of size $n$, if $f$ bijectively maps $S$ into the first $n$ natural numbers. These functions are important for many practical applications…
Given a set $S$ of $n$ keys, a perfect hash function for $S$ maps the keys in $S$ to the first $m \geq n$ integers without collisions. It may return an arbitrary result for any key not in $S$ and is called minimal if $m = n$. The most…
We consider the following fundamental problems: (1) Constructing $k$-independent hash functions with a space-time tradeoff close to Siegel's lower bound. (2) Constructing representations of unbalanced expander graphs having small size and…
Minwise hashing (MinHash) is a standard algorithm widely used in industry for large-scale search and learning applications with the binary (0/1) Jaccard similarity. One common use of MinHash is for processing massive n-gram text…
The construction of perfect hash functions is a well-studied topic. In this paper, this concept is generalized with the following definition. We say that a family of functions from $[n]$ to $[k]$ is a $\delta$-balanced $(n,k)$-family of…
We generalize the monotone local search approach of Fomin, Gaspers, Lokshtanov and Saurabh [J. ACM 2019], by establishing a connection between parameterized approximation and exponential-time approximation algorithms for monotone subset…