Minimal Perfect Hashing

This research line investigates the design of minimal perfect hash functions that enable constant-time, collision-free access to large static datasets while compressing storage requirements near the information-theoretic lower bound.

Description

Given a set S of n distinct keys, a function f that bijectively maps the keys of S into the first n natural numbers is called a minimal perfect hash function (MPHF) for S. Algorithms that find such functions when n is large and retain constant evaluation time are of practical interest. For instance, search engines and databases typically use minimal perfect hash functions to quickly assign identifiers to static sets of variable-length keys such as strings.

The challenge is to design an algorithm which is efficient in three different aspects: time to find f (construction time), time to evaluate f on a key of S (lookup time), and space of representation for f.

The main, general-purpose, MPHF developed and maintained by RAVEN is PTHash. PTHash combines very fast construction with good space effectiveness and very fast lookup time. (Interestingly, PTHash inspired subsequent development, such as PHOBIC, PtrHash, and PHast.)

Main Collaborations

  • Peter Sanders’ group (KIT, Germany).

Software

  • PTHash: PTHash is a fast and compact minimal perfect hash function.

  • LPHash: LPHash is a fast and compact locality-preserving minimal perfect hashing for k-mer sets.

Publications