Even Better Framework for min-wise Based Algorithms

Guy Feigenblat; Ely Porat; Ariel Shiftan

Data Structures and Algorithms 2011-02-18 v1

Abstract

In a recent paper from SODA'11 \cite{kminwise}, the authors introduced a general framework for exponential time improvement of min-wise based algorithms by defining and constructing an almost $k$-min-wise independent family of hash functions. Here we take it a step further and reduce the space and the independence needed for representing the functions, by defining and constructing a $d$-$k$-min-wise independent family of hash functions. Surprisingly, in most cases only 8-wise independence is needed for an exponential improvement in time and space. Moreover, we bypass the $\Omega(\log{\frac{1}{\epsilon}})$ independence lower bound for approximately min-wise functions \cite{patrascu10kwise-lb}, as we use an alternative definition. In addition, since the degree of independence is a small constant, the functions can be implemented efficiently. Informally, under this definition, all subsets of size $d$ of any fixed set $X$ have an equal probability of having hash values among the minimal $k$ values in $X$, where the probability is over the random choice of a hash function from the family. This property measures the randomness of the family, since choosing a truly random function obviously satisfies the definition for $d=k=|X|$. We define, and give a time- and space-efficient construction of, an approximately $d$-$k$-min-wise independent family of hash functions. The degree of independence required is optimal, i.e. only $O(d)$ for $2 \le d < k=O(\frac{d}{\epsilon^2})$, where $\epsilon \in (0,1)$ is the desired error bound. This construction can be used to improve many min-wise based algorithms, such as \cite{sizeEstimationFramework,Datar02estimatingrarity,NearDuplicate,SimilaritySearch,DBLP:conf/podc/CohenK07}, as will be discussed here. To our knowledge, such definitions for hash functions have not been studied before, and no construction has been given.
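As an illustration of the informal definition above, the following Python sketch estimates, for a fixed set $X$ and one fixed $d$-subset $S$, the probability that every element of $S$ lands among the $k$ minimal hash values in $X$. It compares this to $\binom{k}{d}/\binom{|X|}{d}$, the value obtained with a truly random function (whose $k$ minimal elements form a uniformly random $k$-subset of $X$). The hash family is an assumption made for illustration: the standard degree-$(t-1)$ polynomial over $\mathbb{Z}_P$ with $P = 2^{61}-1$, which is $t$-wise independent. It is not the construction from the paper. All function names are hypothetical, and an empirical match on a single input is not a proof of the $d$-$k$-min-wise property.

```python
import random
from math import comb

# Mersenne prime 2^61 - 1; hash values lie in [0, P).
P = (1 << 61) - 1


def make_poly_hash(t, rng):
    """Draw h(x) = a_0 + a_1 x + ... + a_{t-1} x^{t-1} mod P.

    Uniform random coefficients give a t-wise independent family over Z_P
    (standard polynomial construction, assumed here; not the paper's)."""
    coeffs = [rng.randrange(P) for _ in range(t)]

    def h(x):
        acc = 0
        # Horner's rule, starting from the highest-degree coefficient.
        for c in reversed(coeffs):
            acc = (acc * x + c) % P
        return acc

    return h


def k_min_set(h, X, k):
    """Return the set of the k elements of X with the smallest hash values."""
    return set(sorted(X, key=h)[:k])


def ideal_probability(n, k, d):
    """Probability that a fixed d-subset lies among the k minimal elements
    of an n-element set under a truly random function: C(k, d) / C(n, d)."""
    return comb(k, d) / comb(n, d)


def empirical_probability(S, X, k, t, trials, seed=0):
    """Fraction of sampled hash functions for which every element of S
    is among the k minimal hash values in X."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        h = make_poly_hash(t, rng)
        if set(S) <= k_min_set(h, X, k):
            hits += 1
    return hits / trials


if __name__ == "__main__":
    X = list(range(1, 41))  # |X| = 40
    k, d = 10, 2
    S = (3, 17)             # one fixed d-subset with d = 2
    print("ideal    :", ideal_probability(len(X), k, d))
    print("empirical:", empirical_probability(S, X, k, t=8, trials=5000))
```

With $|X| = 40$, $k = 10$ and $d = 2$, the ideal probability is $45/780 \approx 0.0577$. The empirical value depends on the seed and the number of trials, and repeating the experiment for other $d$-subsets of $X$ probes the "equal probability for all subsets" part of the definition.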

Keywords

hashing, approximation algorithm, optimization algorithm

Cite

@article{arxiv.1102.3537,
  title  = {Even Better Framework for min-wise Based Algorithms},
  author = {Guy Feigenblat and Ely Porat and Ariel Shiftan},
  journal= {arXiv preprint arXiv:1102.3537},
  year   = {2011}
}

Comments

10 pages plus appendix; 15 pages in total.
