perfect-hash

Creating minimal perfect hash functions

These details have not been verified by PyPI

Project links

Homepage

Project description

Generate a minimal perfect hash function for a given set of keys. A given code template is filled with parameters, such that the output is code which implements the hash function. Templates can easily be constructed for any programming language. Although the perfect-hash command generates Python code by default, this code is only meant to be a working illustration of the generated hash function. As Python has a very efficient dictionary implementation, one would ordinarily never want to use this Python code in production.

Installation

The minimal perfect hash function generator is written in pure Python, and can be installed using:

$ pip install perfect-hash

The code supports Python 3.5 or higher.

Introduction

A perfect hash function for a given set S of keys is a hash function which maps all keys in S to different numbers. That means that for the set S, the hash function is collision-free, or perfect. Further, a perfect hash function is called “minimal” when it maps N keys to N consecutive integers, usually those in range(N).
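The two definitions can be checked mechanically. The following is a minimal sketch; the helper names `is_perfect` and `is_minimal_perfect` are made up for this illustration and are not part of perfect-hash.

```python
def is_perfect(hash_func, keys):
    """True if hash_func maps every key in keys to a distinct value."""
    values = [hash_func(k) for k in keys]
    return len(set(values)) == len(values)


def is_minimal_perfect(hash_func, keys):
    """True if hash_func maps the N keys onto exactly 0, 1, ..., N-1."""
    values = sorted(hash_func(k) for k in keys)
    return values == list(range(len(keys)))


keys = ["a", "bb", "ccc"]

# len() is perfect for these keys (values 1, 2, 3) but not minimal.
assert is_perfect(len, keys)
assert not is_minimal_perfect(len, keys)

# Shifting by one gives the values 0, 1, 2, which is minimal.
assert is_minimal_perfect(lambda k: len(k) - 1, keys)
```

Note that both properties are relative to the key set: `len` stops being perfect as soon as two keys of equal length are added.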

Usage

Given a set of keys which are character strings, the program returns a minimal perfect hash function. This hash function is returned in the form of Python code by default. Suppose we have a file with keys:

# 'animals.txt'
Elephant
Horse
Camel
Python
Dog
Cat

The exact way this file is parsed can be specified using command line options. For example, it is possible to read only one column from a file which contains several items in each row. The program is invoked like this:

$ perfect-hash animals.txt
# =======================================================================
# ================= Python code for perfect hash function ===============
# =======================================================================

G = [0, 4, 0, 5, 5, 4, 6]

def hash_f(key, salt):
    return sum(salt[i] * c for i, c in enumerate(key)) % 7

def perfect_hash(key):
    key = key.encode()
    if len(key) > 8:
        return -1
    return (G[hash_f(key, b"W4dBruLw")] +
            G[hash_f(key, b"J5GKXqH1")]) % 7

# ============================ Sanity check =============================

K = ["Elephant", "Horse", "Camel", "Python", "Dog", "Cat"]
assert len(K) == 6

for h, k in enumerate(K):
    assert perfect_hash(k) == h
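The sanity check only exercises keys that are in K. For any other string the function still returns an integer, so a membership test has to compare the stored key as well. A small sketch, reusing the generated code from above; the `lookup` helper is hypothetical and not part of the generated output.

```python
# Parameters and functions copied from the generated output above.
G = [0, 4, 0, 5, 5, 4, 6]
K = ["Elephant", "Horse", "Camel", "Python", "Dog", "Cat"]


def hash_f(key, salt):
    return sum(salt[i] * c for i, c in enumerate(key)) % 7


def perfect_hash(key):
    key = key.encode()
    if len(key) > 8:
        return -1
    return (G[hash_f(key, b"W4dBruLw")] +
            G[hash_f(key, b"J5GKXqH1")]) % 7


def lookup(key):
    """Hypothetical helper: index of key in K, or None if key is not in K."""
    h = perfect_hash(key)
    # h can be -1 (key too long) or fall outside range(len(K)),
    # so check the range before comparing with the stored key.
    if 0 <= h < len(K) and K[h] == key:
        return h
    return None


assert lookup("Cat") == 5
assert lookup("Cow") is None            # not in K
assert lookup("Hippopotamus") is None   # longer than the salt
```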

The program works by filling a code template with the calculated parameters. The template can be supplied as a file, which allows perfect hash functions to be generated in any programming language. The hash function is kept quite simple and does not require machine or language specific byte level operations which might be hard to implement in the target language. The following parameters are available in the template:

string	expands to
$NS	length of S1 and S2 salt
$S1	S1 salt
$S2	S2 salt
$NG	length of array G
$G	array of integers G
$NK	number of keys, i.e. length of array K
$K	array with (quoted) keys K
$$	$ (a literal dollar sign)

Since the syntax for arrays is not the same in all programming languages, some specifics can be adjusted using command line options. The built-in template which creates the above code is:

G = [$G]

def hash_f(key, salt):
    return sum(salt[i] * c for i, c in enumerate(key)) % $NG

def perfect_hash(key):
    key = key.encode()
    if len(key) > $NS:
        return -1
    return (G[hash_f(key, b"$S1")] +
            G[hash_f(key, b"$S2")]) % $NG
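The placeholder syntax (`$NAME`, with `$$` for a literal dollar sign) matches that of Python's `string.Template`. The sketch below fills the built-in template by hand with the values from the animals example. It assumes the substitution behaves like `string.Template.substitute`; how perfect-hash performs the substitution internally is not shown here.

```python
from string import Template

TEMPLATE = '''\
G = [$G]

def hash_f(key, salt):
    return sum(salt[i] * c for i, c in enumerate(key)) % $NG

def perfect_hash(key):
    key = key.encode()
    if len(key) > $NS:
        return -1
    return (G[hash_f(key, b"$S1")] +
            G[hash_f(key, b"$S2")]) % $NG
'''

# Parameter values taken from the generated animals example.
params = {
    "G": "0, 4, 0, 5, 5, 4, 6",
    "NG": 7,
    "NS": 8,
    "S1": "W4dBruLw",
    "S2": "J5GKXqH1",
}

code = Template(TEMPLATE).substitute(params)

# Execute the filled template and call the resulting function.
namespace = {}
exec(code, namespace)
assert namespace["perfect_hash"]("Python") == 3
```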

Using code templates makes this program very flexible. The source repository includes several complete examples for C. There are many choices one faces when implementing a static hash table: Do the parameter lists go into a separate header file? Should the API for the table only contain the hash values, but not the objects being mapped? And so on. All these choices are possible because the template is simply filled with the parameters, no matter what else is inside the template.

Hash function types

One important option the perfect-hash command provides is --hft, which is short for “hash function type”. There are two types to choose from:

1. A hash function whose salt is a random string. This is the default. The implementation of this hash function is quite simple and fast. However, its output range is not large enough for a very large number of keys (over 10,000), so the perfect hash function generation will fail for such large key sets.
2. A hash function whose salt is a list of random integers. Generation with this option will always succeed, but an implementation requires two additional integer arrays (apart from the always present array G).
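The difference between the two types can be sketched as follows. This is an illustration only: it assumes both types share the multiply-and-sum form of the default template and differ only in where the multipliers come from. The names `n`, `str_salt` and `int_salt` and the value of `n` are made up for the example.

```python
import random
import string


def hash_f(key, salt, n):
    # Same shape as the default template: weighted sum of key bytes, modulo n.
    return sum(salt[i] * c for i, c in enumerate(key)) % n


rng = random.Random(0)
n = 50021          # illustrative length of G for a large key set
key = b"Cat"

# String salt (assumed form): every multiplier is the code of a salt character.
alphabet = (string.ascii_letters + string.digits).encode()
str_salt = bytes(rng.choice(alphabet) for _ in range(len(key)))

# Integer salt (assumed form): multipliers are drawn from the range 1..n-1.
# In generated code, the two salts would then be two integer arrays.
int_salt = [rng.randint(1, n - 1) for _ in range(len(key))]

# With a character salt, the weighted sum of a short key is bounded by
# 122 * sum(key bytes), where 122 == ord("z") is the largest salt character.
# For a large n, such a key can therefore never reach most of range(n).
assert max(alphabet) == 122
assert sum(122 * c for c in key) < n

print(hash_f(key, str_salt, n), hash_f(key, int_salt, n))
```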

Examples

The source repository contains many useful examples (in examples/) which illustrate how to use the perfect-hash command, as well as python_hash.py as a library.

License of output

perfect-hash is released under the BSD license. However, that does not cause the output produced by perfect-hash to be under BSD. The reason is that the output contains only small pieces of text that come directly from perfect-hash’s source code (fewer than 10 lines if the default template is used, which serves mainly as an illustration), and these are too small to be significant. Therefore the output is not “work based on perfect-hash”.

The output produced by perfect-hash contains essentially all of the input data. Therefore the output is a “derivative work” of the input (in the sense of U.S. copyright law), and its copyright status depends on the copyright of the input. For most software licenses, the result is that the output is under the same license, with the same copyright holder, as the input that was merely passed through perfect-hash.

Acknowledgments

Part of the code is based on a program A.M. Kuchling wrote: http://www.amk.ca/python/code/perfect-hash

The algorithm this library is based on is described in the paper “Optimal algorithms for minimal perfect hashing”, Z. J. Czech, G. Havas and B.S. Majewski. http://cmph.sourceforge.net/papers/chm92.pdf

I tried to illustrate the algorithm and explain how it works on: http://ilan.schnell-web.net/prog/perfect-hash/algo.html
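For readers who want the gist without following the links, here is a compact sketch of the graph-based idea from the cited paper: each key becomes an edge between two hashed vertices, and if the resulting graph is acyclic, the vertex labels G can be chosen so that the two labels of edge i sum to i. This is a simplified illustration under that assumption, not the code used by perfect-hash. All function names, the retry policy and the integer salts are choices made for this example.

```python
import random


def hash_f(key, salt, n):
    return sum(s * ord(c) for s, c in zip(salt, key)) % n


def assign_g(n, edges):
    """Label vertices so that G[u] + G[v] == i (mod n) for edge i = (u, v).

    Returns None if the graph has a cycle (self-loops and duplicate
    edges count as cycles).
    """
    adj = [[] for _ in range(n)]
    for i, (u, v) in enumerate(edges):
        adj[u].append((v, i))
        adj[v].append((u, i))
    G = [None] * n
    for root in range(n):
        if G[root] is not None:
            continue
        G[root] = 0
        stack = [(root, -1)]
        while stack:
            u, via = stack.pop()
            for v, i in adj[u]:
                if i == via:
                    continue        # do not walk back along the edge we came by
                if G[v] is not None:
                    return None     # reached a labelled vertex again: cycle
                G[v] = (i - G[u]) % n
                stack.append((v, i))
    return G


def build(keys, seed=0, max_tries=10000):
    """Try random salts (growing n now and then) until the graph is acyclic."""
    rng = random.Random(seed)
    n = len(keys) + 1               # keys must be non-empty and distinct
    width = max(len(k) for k in keys)
    for attempt in range(1, max_tries + 1):
        salt1 = [rng.randint(1, n - 1) for _ in range(width)]
        salt2 = [rng.randint(1, n - 1) for _ in range(width)]
        edges = [(hash_f(k, salt1, n), hash_f(k, salt2, n)) for k in keys]
        G = assign_g(n, edges)
        if G is not None:
            return G, salt1, salt2
        if attempt % 5 == 0:
            n += 1                  # a larger graph is more likely to be acyclic
    raise RuntimeError("no acyclic graph found")


def perfect_hash(key, G, salt1, salt2):
    n = len(G)
    return (G[hash_f(key, salt1, n)] + G[hash_f(key, salt2, n)]) % n


keys = ["Elephant", "Horse", "Camel", "Python", "Dog", "Cat"]
G, s1, s2 = build(keys)
assert [perfect_hash(k, G, s1, s2) for k in keys] == list(range(len(keys)))
```

The final assertion holds by construction: for key number i with vertices u and v, G[v] was set to (i - G[u]) mod n, so the two labels sum to i modulo n, and i is always smaller than n.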

Project details

Release history

Version	Release date
0.5.1 (this version)	Sep 5, 2025
0.5.0	Sep 2, 2025
0.4.3	Nov 5, 2023
0.4.2	Apr 7, 2021
0.4.1	Jun 15, 2020
0.4.0	Jun 12, 2020
0.3.1	Jun 11, 2020
0.3.0	Jun 10, 2020
0.2.1	May 31, 2019
0.2.0	May 27, 2019

Download files


Source Distribution

perfect_hash-0.5.1.tar.gz (9.7 kB)

Uploaded Sep 5, 2025 Source

File details

Details for the file perfect_hash-0.5.1.tar.gz.

File metadata

Download URL: perfect_hash-0.5.1.tar.gz
Upload date: Sep 5, 2025
Size: 9.7 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.1.0 CPython/3.10.14

File hashes

Hashes for perfect_hash-0.5.1.tar.gz
Algorithm	Hash digest
SHA256	`0bd9b4af768b87f8bab826018bee292a800de324c4e2cd8dce390a71753674e2`
MD5	`32deb4eea634c7fd414b0c7ac876d6e1`
BLAKE2b-256	`5ce59d6d17c238467516728b99ac1c23e817df10f7cde8ba0daac275ab5401c4`

