Related papers: Hashing for Sampling-Based Estimation

Locally Uniform Hashing

Hashing is a common technique used in data processing, with a strong impact on the time and resources spent on computation. Hashing also affects the applicability of theoretical results that often assume access to (unrealistic)…

Data Structures and Algorithms · Computer Science 2023-09-29 Ioana O. Bercea, Lorenzo Beretta, Jonas Klausen, Jakob Bæk Tejs Houen, Mikkel Thorup

No Repetition: Fast Streaming with Highly Concentrated Hashing

To get estimators that work within a certain error bound with high probability, a common strategy is to design one that works with constant probability, and then boost the probability using independent repetitions. Important examples of…

Data Structures and Algorithms · Computer Science 2020-04-03 Anders Aamand, Debarati Das, Evangelos Kipouridis, Jakob B. T. Knudsen, Peter M. R. Rasmussen, Mikkel Thorup

Practical Hash Functions for Similarity Estimation and Dimensionality Reduction

Hashing is a basic tool for dimensionality reduction employed in several aspects of machine learning. However, the performance analysis is often carried out under the abstract assumption that a truly random unit cost hash function is used,…

Machine Learning · Statistics 2017-11-27 Søren Dahlgaard, Mathias Bæk Tejs Knudsen, Mikkel Thorup

Fast hashing with Strong Concentration Bounds

Previous work on tabulation hashing by Patrascu and Thorup from STOC'11 on simple tabulation and from SODA'13 on twisted tabulation offered Chernoff-style concentration bounds on hash based sums, e.g., the number of balls/keys hashing to a…

Data Structures and Algorithms · Computer Science 2020-08-11 Anders Aamand, Jakob B. T. Knudsen, Mathias B. T. Knudsen, Peter M. R. Rasmussen, Mikkel Thorup

Fast Comparative Analysis of Merge Trees Using Locality Sensitive Hashing

Scalar field comparison is a fundamental task in scientific visualization. In topological data analysis, we compare topological descriptors of scalar fields -- such as persistence diagrams and merge trees -- because they provide succinct…

Computational Geometry · Computer Science 2024-09-18 Weiran Lyu, Raghavendra Sridharamurthy, Jeff M. Phillips, Bei Wang

Range-efficient consistent sampling and locality-sensitive hashing for polygons

Locality-sensitive hashing (LSH) is a fundamental technique for similarity search and similarity estimation in high-dimensional spaces. The basic idea is that similar objects should produce hash collisions with probability significantly…

Computational Geometry · Computer Science 2017-09-25 Joachim Gudmundsson, Rasmus Pagh

Perfect Consistent Hashing

Consistent Hashing functions are widely used for load balancing across a variety of applications. However, the original presentation and typical implementations of Consistent Hashing rely on randomised allocation of hash codes to keys which…

Data Structures and Algorithms · Computer Science 2015-03-19 Matthew Sackman

Maximally Consistent Sampling and the Jaccard Index of Probability Distributions

We introduce simple, efficient algorithms for computing a MinHash of a probability distribution, suitable for both sparse and dense data, with equivalent running times to the state of the art for both cases. The collision probability of…

Data Structures and Algorithms · Computer Science 2019-01-04 Ryan Moulton, Yunjiang Jiang

ProbMinHash -- A Class of Locality-Sensitive Hash Algorithms for the (Probability) Jaccard Similarity

The probability Jaccard similarity was recently proposed as a natural generalization of the Jaccard similarity to measure the proximity of sets whose elements are associated with relative frequencies or probabilities. In combination with a…

Data Structures and Algorithms · Computer Science 2020-10-27 Otmar Ertl

Hashing for statistics over k-partitions

In this paper we analyze a hash function for $k$-partitioning a set into bins, obtaining strong concentration bounds for standard algorithms combining statistics from each bin. This generic method was originally introduced by Flajolet and…

Data Structures and Algorithms · Computer Science 2016-02-16 Søren Dahlgaard, Mathias Bæk Tejs Knudsen, Eva Rotenberg, Mikkel Thorup

A Hash-based Co-Clustering Algorithm for Categorical Data

Many real-life data are described by categorical attributes without a pre-classification. A common data mining method used to extract information from this type of data is clustering. This method groups together the samples from the data…

Machine Learning · Computer Science 2014-07-30 Fabricio Olivetti de França

Hashing as Tie-Aware Learning to Rank

Hashing, or learning binary embeddings of data, is frequently used in nearest neighbor retrieval. In this paper, we develop learning to rank formulations for hashing, aimed at directly optimizing ranking-based evaluation metrics such as…

Machine Learning · Statistics 2018-10-11 Kun He, Fatih Cakir, Sarah Adel Bargal, Stan Sclaroff

Consistent Subset Sampling

Consistent sampling is a technique for specifying, in small space, a subset $S$ of a potentially large universe $U$ such that the elements in $S$ satisfy a suitably chosen sampling condition. Given a subset $\mathcal{I}\subseteq U$ it…

Data Structures and Algorithms · Computer Science 2014-04-21 Konstantin Kutzkov, Rasmus Pagh

Fusion Hashing: A General Framework for Self-improvement of Hashing

Hashing has been widely used for efficient similarity search based on its query and storage efficiency. To obtain better precision, most studies focus on designing different objective functions with different constraints or penalty terms…

Data Structures and Algorithms · Computer Science 2018-10-02 Xingbo Liu, Xiushan Nie, Yilong Yin

Higher-order accurate two-sample network inference and network hashing

Two-sample hypothesis testing for network comparison presents many significant challenges, including: leveraging repeated network observations and known node registration, but without requiring them to operate; relaxing strong structural…

Methodology · Statistics 2024-02-05 Meijia Shao, Dong Xia, Yuan Zhang, Qiong Wu, Shuo Chen

Supervised Hashing Using Graph Cuts and Boosted Decision Trees

Embedding image features into a binary Hamming space can improve both the speed and accuracy of large-scale query-by-example image retrieval systems. Supervised hashing aims to map the original features to compact binary codes in a manner…

Machine Learning · Computer Science 2016-11-17 Guosheng Lin, Chunhua Shen, Anton van den Hengel

Fast Supervised Hashing with Decision Trees for High-Dimensional Data

Supervised hashing aims to map the original features to compact binary codes that are able to preserve label based similarity in the Hamming space. Non-linear hash functions have demonstrated the advantage over linear ones due to their…

Computer Vision and Pattern Recognition · Computer Science 2016-11-17 Guosheng Lin, Chunhua Shen, Qinfeng Shi, Anton van den Hengel, David Suter

Improved Search in Hamming Space using Deep Multi-Index Hashing

Similarity-preserving hashing is a widely-used method for nearest neighbour search in large-scale image retrieval tasks. There has been considerable research on generating efficient image representation via the deep-network-based hashing…

Computer Vision and Pattern Recognition · Computer Science 2017-10-20 Hanjiang Lai, Yan Pan

Analysis of SparseHash: an efficient embedding of set-similarity via sparse projections

Embeddings provide compact representations of signals in order to perform efficient inference in a wide variety of tasks. In particular, random projections are common tools to construct Euclidean distance-preserving embeddings, while…

Data Structures and Algorithms · Computer Science 2019-09-05 Diego Valsesia, Sophie Marie Fosson, Chiara Ravazzi, Tiziano Bianchi, Enrico Magli

Deep Hashing with Semantic Hash Centers for Image Retrieval

Deep hashing is an effective approach for large-scale image retrieval. Current methods are typically classified by their supervision types: point-wise, pair-wise, and list-wise. Recent point-wise techniques (e.g., CSQ, MDS) have improved…

Computer Vision and Pattern Recognition · Computer Science 2025-07-14 Li Chen, Rui Liu, Yuxiang Zhou, Xudong Ma, Yong Chen, Dell Zhang