C implementation of MurmurHash3 for Python.

FastMurmurHash3

Click here for the Chinese documentation

fmmh3 is a Python extension module written in C and Cython. It wraps a C implementation of the MurmurHash3 hash function for use from Python. In the benchmarks below, fmmh3 is roughly 15 to over 500 times faster than a pure Python implementation of MurmurHash3, depending on input length. Compared with mmh3, another C-based library, fmmh3 is about 1.1 to 2.5 times as fast, with the largest gains on small and medium inputs.

Installation

Using pip

pip install fmmh3

Using Poetry

poetry add fmmh3
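
After installing with either tool, it can be useful to confirm that the module is visible to the interpreter you intend to use. The following is a minimal sketch using only the standard library; the helper name is_installed is hypothetical and not part of fmmh3.

```python
import importlib.util


def is_installed(module_name: str = "fmmh3") -> bool:
    """Return True if `module_name` can be located on the current import path."""
    # find_spec returns None for a top-level module that cannot be found.
    return importlib.util.find_spec(module_name) is not None


if __name__ == "__main__":
    print("fmmh3 installed:", is_installed())
```

If this prints False, the package was probably installed into a different environment from the one running the script.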

Benchmark Tests

We compared fmmh3 against a pure Python implementation of MurmurHash3 and against the mmh3 library. The tables below report speed relative to the baseline in the first results column (1x), so higher is faster:

Byte string length (bytes)   MurmurHash3 (pure Python)   mmh3     fmmh3
1                            1x                          6.27x    15.62x
10                           1x                          9.43x    23.08x
512                          1x                          197x     373x
1000                         1x                          324x     538x

For byte strings larger than 1 KB, the pure Python implementation exceeded the benchmark's time limit, so it is excluded from the results above that size. The following table compares only mmh3 and fmmh3:

Byte string length (bytes)   mmh3   fmmh3
1                            1x     2.51x
10                           1x     2.44x
100                          1x     2.36x
512                          1x     1.90x
1000                         1x     1.65x
5000                         1x     1.18x
10000                        1x     1.09x

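fmmh3 is faster than mmh3 at every size tested. The advantage is largest for short inputs (about 2.5x at 1 to 10 bytes) and narrows to about 1.1x at 10,000 bytes.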
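
The benchmark script itself is not included in this description, so the following is only a sketch of how such relative-speed figures could be measured with the standard timeit module. The helper relative_speed is hypothetical, and the choice of hash32_x86 and mmh3.hash as the functions under comparison is an assumption, since the text does not say which functions were timed.

```python
import timeit


def relative_speed(baseline, candidate, data, seed=0, number=10_000, repeat=5):
    """Return how many times faster `candidate` is than `baseline` on `data`.

    Both callables are invoked as func(data, seed). The best of `repeat`
    timing runs is used for each, which reduces noise from other processes.
    """
    t_base = min(timeit.repeat(lambda: baseline(data, seed), number=number, repeat=repeat))
    t_cand = min(timeit.repeat(lambda: candidate(data, seed), number=number, repeat=repeat))
    return t_base / t_cand


if __name__ == "__main__":
    try:
        import mmh3
        from fmmh3 import hash32_x86
    except ImportError:
        print("fmmh3 and mmh3 must both be installed to run this comparison")
    else:
        # Assumption: the same input sizes as in the table above.
        for size in (1, 10, 100, 512, 1000, 5000, 10000):
            ratio = relative_speed(mmh3.hash, hash32_x86, b"x" * size)
            print(f"{size:>6} bytes: fmmh3 runs at {ratio:.2f}x the speed of mmh3")
```

Absolute numbers will vary with hardware, Python version, and library versions, so results from this sketch are not expected to reproduce the table exactly.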

Function Usage

fmmh3 provides three functions to calculate MurmurHash3 hash values: hash32_x86, hash128_x86, and hash128_x64:

from fmmh3 import hash32_x86, hash128_x86, hash128_x64

key = b"hello world"
seed = 0

hash32_value = hash32_x86(key, seed)
hash128_x86_value = hash128_x86(key, seed)
hash128_x64_value = hash128_x64(key, seed)

Each function returns the hash value as an integer. key is the byte string to hash, and seed is the integer hash seed. A prime number is a common choice of seed, but MurmurHash3 itself accepts any 32-bit seed.
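
To illustrate what the 32-bit variant computes, here is a pure Python sketch of the MurmurHash3 x86 32-bit algorithm. It is not fmmh3's code and not the implementation used in the benchmark; the name murmur3_x86_32 is hypothetical. It returns an unsigned 32-bit value. Whether hash32_x86 returns a signed or unsigned integer is not stated here, so compare results modulo 2**32 when cross-checking.

```python
MASK32 = 0xFFFFFFFF


def _rotl32(x: int, r: int) -> int:
    """Rotate a 32-bit value left by r bits."""
    return ((x << r) | (x >> (32 - r))) & MASK32


def _mix_k(k: int) -> int:
    """Scramble one block (or the tail) before it is folded into the state."""
    k = (k * 0xCC9E2D51) & MASK32
    k = _rotl32(k, 15)
    return (k * 0x1B873593) & MASK32


def murmur3_x86_32(key: bytes, seed: int = 0) -> int:
    """Return the unsigned 32-bit MurmurHash3 (x86 variant) of `key`."""
    h = seed & MASK32
    n = len(key)
    full = n - (n % 4)

    # Body: consume the input in 4-byte little-endian blocks.
    for i in range(0, full, 4):
        h ^= _mix_k(int.from_bytes(key[i:i + 4], "little"))
        h = _rotl32(h, 13)
        h = (h * 5 + 0xE6546B64) & MASK32

    # Tail: fold in the remaining 1 to 3 bytes, if any.
    tail = key[full:]
    if tail:
        h ^= _mix_k(int.from_bytes(tail, "little"))

    # Finalization: mix in the length, then avalanche the bits.
    h ^= n & MASK32
    h ^= h >> 16
    h = (h * 0x85EBCA6B) & MASK32
    h ^= h >> 13
    h = (h * 0xC2B2AE35) & MASK32
    h ^= h >> 16
    return h
```

Every arithmetic step here runs in the interpreter, which is why a C-backed implementation is so much faster on the same input.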

Author

This project was developed by Dream2333.

The MurmurHash algorithm was created by Austin Appleby.

The C version of the algorithm comes from PeterScott.

The Python version used in the benchmark test comes from wc-duck.

Contribution

If you want to contribute to this project, you can:

  • Report issues or suggest improvements on GitHub.
  • Submit pull requests to fix issues or add new features.
  • Share this project to let more people know about it.

License

This project is licensed under the MIT license.
