Related papers: Even Better Framework for min-wise Based Algorithm…

200 papers

We study explicit constructions of min-wise hash families and their extension to $k$-min-wise hash families. Informally, a min-wise hash family guarantees that for any fixed subset $X\subseteq[N]$, every element in $X$ has an equal chance…

Data Structures and Algorithms · Computer Science 2025-11-11 Xue Chen, Shengtang Huang, Xin Li

We show that linear probing requires 5-independent hash functions for expected constant-time performance, matching an upper bound of [Pagh et al. STOC'07]. More precisely, we construct a 4-independent hash function yielding expected…

Data Structures and Algorithms · Computer Science 2014-12-25 Mikkel Thorup

Randomized algorithms and data structures are often analyzed under the assumption of access to a perfect source of randomness. The most fundamental metric used to measure how "random" a hash function or a random number generator is, is its…

Data Structures and Algorithms · Computer Science 2015-02-23 Mathias Bæk Tejs Knudsen, Morten Stöckel

Minwise hashing is fundamental and one of the most successful hashing algorithms in the literature. Recent advances based on the idea of densification have shown that it is possible to…

Data Structures and Algorithms · Computer Science 2017-03-16 Anshumali Shrivastava

In the problem of minimal perfect hashing, we are given a size $k$ subset $\mathcal{A}$ of a universe of keys $[n] = \{1,2, \cdots, n\}$, for which we wish to construct a hash function $h: [n] \to [k]$ such that $h(\cdot)$ maps…

Information Theory · Computer Science 2026-04-14 Ryan Song , Emre Telatar

Weighted minwise hashing (WMH) is one of the fundamental subroutines required by many celebrated approximation algorithms and commonly adopted in industrial practice for large-scale search and learning. The resource bottleneck of the…

Data Structures and Algorithms · Computer Science 2016-02-29 Anshumali Shrivastava

Weighted minwise hashing is a standard dimensionality reduction technique with applications to similarity search and large-scale kernel machines. We introduce a simple algorithm that takes a weighted set $x \in \mathbb{R}_{\geq 0}^{d}$ and…

Data Structures and Algorithms · Computer Science 2020-05-26 Tobias Christiani

Perfect hash functions can potentially be used to compress data in connection with a variety of data management tasks. Though there has been considerable work on how to construct good perfect hash functions, there is a gap between theory…

Data Structures and Algorithms · Computer Science 2007-05-23 Fabiano C. Botelho, Rasmus Pagh, Nivio Ziviani

Given a set $S$ of $n$ distinct keys, a function $f$ that bijectively maps the keys of $S$ into the range $\{0,\ldots,n-1\}$ is called a minimal perfect hash function for $S$. Algorithms that find such functions when $n$ is large and retain…

Data Structures and Algorithms · Computer Science 2022-02-08 Giulio Ermanno Pibiri, Roberto Trani

A random hash function $h$ is $\varepsilon$-minwise if for any set $S$, $|S|=n$, and element $x\in S$, $\Pr[h(x)=\min h(S)]=(1\pm\varepsilon)/n$. Minwise hash functions with low bias $\varepsilon$ have widespread applications within…

Data Structures and Algorithms · Computer Science 2014-05-02 Søren Dahlgaard, Mikkel Thorup

Minimal perfect hash functions provide space-efficient and collision-free hashing on static sets. Existing algorithms and implementations that build such functions have practical limitations on the number of input elements they can process,…

Data Structures and Algorithms · Computer Science 2018-11-06 Antoine Limasset, Guillaume Rizk, Rayan Chikhi, Pierre Peterlongo

Recent advances in random linear systems on finite fields have paved the way for the construction of constant-time data structures representing static functions and minimal perfect hash functions using less space with respect to existing…

Data Structures and Algorithms · Computer Science 2016-03-24 Marco Genuzio, Giuseppe Ottaviano, Sebastiano Vigna

In this paper, we study several critical issues which must be tackled before one can apply b-bit minwise hashing to the volumes of data often used in industrial applications, especially in the context of search. 1. (b-bit) Minwise hashing…

Information Retrieval · Computer Science 2012-05-15 Ping Li, Anshumali Shrivastava, Arnd Christian Konig

Minwise hashing is the standard technique in the context of search and databases for efficiently estimating set (e.g., high-dimensional 0/1 vector) similarities. Recently, b-bit minwise hashing was proposed which significantly improves upon…

Machine Learning · Statistics 2011-08-04 Ping Li, Christian Konig

A function $f : U \to \{0,\ldots,n-1\}$ is a minimal perfect hash function for a set $S \subseteq U$ of size $n$, if $f$ bijectively maps $S$ into the first $n$ natural numbers. These functions are important for many practical applications…

Data Structures and Algorithms · Computer Science 2023-08-08 Giulio Ermanno Pibiri, Roberto Trani

Given a set $S$ of $n$ keys, a perfect hash function for $S$ maps the keys in $S$ to the first $m \geq n$ integers without collisions. It may return an arbitrary result for any key not in $S$ and is called minimal if $m = n$. The most…

Data Structures and Algorithms · Computer Science 2026-02-06 Hans-Peter Lehmann, Thomas Mueller, Rasmus Pagh, Giulio Ermanno Pibiri, Peter Sanders, Sebastiano Vigna, Stefan Walzer

We consider the following fundamental problems: (1) Constructing $k$-independent hash functions with a space-time tradeoff close to Siegel's lower bound. (2) Constructing representations of unbalanced expander graphs having small size and…

Data Structures and Algorithms · Computer Science 2015-06-12 Tobias Christiani, Rasmus Pagh, Mikkel Thorup

Minwise hashing (MinHash) is a standard algorithm widely used in industry for large-scale search and learning applications with the binary (0/1) Jaccard similarity. One common use of MinHash is for processing massive n-gram text…

Machine Learning · Statistics 2023-06-14 Xiaoyun Li, Ping Li

The construction of perfect hash functions is a well-studied topic. In this paper, this concept is generalized with the following definition. We say that a family of functions from $[n]$ to $[k]$ is a $\delta$-balanced $(n,k)$-family of…

Data Structures and Algorithms · Computer Science 2008-12-18 Noga Alon, Shai Gutner

We generalize the monotone local search approach of Fomin, Gaspers, Lokshtanov and Saurabh [J. ACM 2019], by establishing a connection between parameterized approximation and exponential-time approximation algorithms for monotone subset…

Data Structures and Algorithms · Computer Science 2026-01-13 Barış Can Esmer, Ariel Kulik, Dániel Marx, Daniel Neuen, Roohani Sharma