
Python bindings for CityHash and FarmHash


A Python wrapper around FarmHash and CityHash


Getting Started

To install this package, run:

pip install cityhash

After that, you can import the cityhash and farmhash modules (see the usage examples below).

Usage Examples

Stateless hashing

The package contains 32-, 64-, and 128-bit implementations of the CityHash algorithm:

>>> from cityhash import CityHash32, CityHash64, CityHash128
>>> print(CityHash32("abc"))
795041479
>>> print(CityHash64("abc"))
2640714258260161385
>>> print(CityHash128("abc"))
76434233956484675513733017140465933893
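Because these functions are stateless and give the same result on every run, they can be used to assign keys to shards or buckets. (Python's built-in hash() of a str is randomized per process unless PYTHONHASHSEED is set, so it is not suitable for this across processes.) The sketch below uses a hypothetical helper, shard_for, which is not part of the package; the hash function is passed in so that CityHash64 or any other 64-bit hash can be used.

```python
from typing import Callable


def shard_for(key: str, n_shards: int, hash_fn: Callable[[str], int]) -> int:
    """Hypothetical helper: map `key` to a shard index in range(n_shards).

    `hash_fn` is assumed to be a deterministic 64-bit hash such as
    cityhash.CityHash64, so the same key always lands on the same shard.
    """
    if n_shards <= 0:
        raise ValueError("n_shards must be positive")
    # The hash is a non-negative int, so modulo yields 0 <= index < n_shards.
    return hash_fn(key) % n_shards
```

Using the CityHash64("abc") value shown above, shard_for("abc", 8, CityHash64) would return 2640714258260161385 % 8, which is 1.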

Hardware-independent fingerprints

Fingerprints are seedless hashes that are guaranteed to be hardware- and platform-independent.

>>> from farmhash import Fingerprint128
>>> print(Fingerprint128("abc"))
76434233956484675513733017140465933893
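Because a fingerprint does not depend on a seed or on the platform, it is a reasonable choice for values that are stored and compared later, such as cache keys. A minimal sketch, assuming a hypothetical helper fingerprint_key that renders a 128-bit fingerprint (returned as a Python int, as above) as a fixed-width hex string:

```python
from typing import Callable


def fingerprint_key(data: str, fingerprint_fn: Callable[[str], int]) -> str:
    """Hypothetical helper: turn a 128-bit fingerprint into a 32-char hex key.

    `fingerprint_fn` is assumed to behave like farmhash.Fingerprint128,
    returning an unsigned 128-bit integer.
    """
    fp = fingerprint_fn(data)
    if not 0 <= fp < (1 << 128):
        raise ValueError("expected an unsigned 128-bit integer")
    # Zero-padding to 32 hex digits keeps every key the same length,
    # which suits file names or fixed-width database columns.
    return format(fp, "032x")
```

For example, fingerprint_key("abc", Fingerprint128) produces the same key on any machine.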

Incremental hashing

This implementation of CityHash and FarmHash does not support incremental hashing. If you need this feature, use MetroHash, which supports it.
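If the data arrives in pieces but fits in memory, one workaround is to collect the chunks and hash them in a single call. The sketch below uses a hypothetical helper, hash_chunks; because the chunks are joined before hashing, the result equals a one-shot hash of the full input, at the cost of holding all of it in memory.

```python
from typing import Callable, Iterable


def hash_chunks(chunks: Iterable[bytes], hash_fn: Callable[[bytes], int]) -> int:
    """Hypothetical helper: hash a sequence of byte chunks as one message.

    `hash_fn` is assumed to be a one-shot function such as
    cityhash.CityHash64; no incremental API is required.
    """
    # Concatenate every chunk into a single bytes object, then hash once.
    # Memory use grows with the total size of the input.
    return hash_fn(b"".join(chunks))
```

For streams too large to buffer, a library with a true incremental API (such as MetroHash, mentioned above) is the better fit.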

Fast hashing of NumPy arrays

The functions in this module support the Python buffer protocol, so they can be used on any object that exports a buffer interface. Here is an example of hashing a 3D NumPy array:

>>> import numpy as np
>>> from farmhash import FarmHash64
>>> arr = np.zeros((256, 256, 4))
>>> FarmHash64(arr)
1550282412043536862

Note that arrays must be contiguous for this to work. To convert a non-contiguous array, use the np.ascontiguousarray() function.
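The same check can be written with the standard library alone, since any buffer exporter (including a NumPy array) can be wrapped in a memoryview. A minimal sketch with a hypothetical helper, hash_any_buffer: it passes C-contiguous buffers through unchanged and otherwise copies the data into C (row-major) order first, which yields the same bytes as np.ascontiguousarray().

```python
from typing import Callable


def hash_any_buffer(obj, hash_fn: Callable[[bytes], int]) -> int:
    """Hypothetical helper: hash any buffer-protocol object.

    `hash_fn` is assumed to accept buffer objects, like farmhash.FarmHash64.
    """
    with memoryview(obj) as view:
        if view.c_contiguous:
            # Already laid out in C order: hash the original object, no copy.
            return hash_fn(obj)
        # Strided or Fortran-ordered data: tobytes() copies the elements
        # into a new contiguous bytes object in C (row-major) order.
        return hash_fn(view.tobytes())
```

For instance, hash_any_buffer(arr[::2], FarmHash64) would hash every other row of the array above without an explicit conversion step.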

SSE4.2 optimizations

On CPUs that support the SSE4.2 instruction set, optimized FarmHash has a significant advantage over the non-optimized version and over CityHash for 64-bit hashes, as the timings below show; for 128-bit hashes the three are roughly equal. The numbers were recorded on a 2.4 GHz Intel Xeon CPU (E5-2620), hashing a 512x512x3 NumPy array.

Method              Time (64-bit)       Time (128-bit)
FarmHash / SSE4.2   373 µs ± 48.3 µs    494 µs ± 30.2 µs
FarmHash            494 µs ± 13.8 µs    490 µs ± 23.0 µs
CityHash            497 µs ± 15.0 µs    493 µs ± 21.4 µs
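The source does not say how these timings were collected; the "mean ± standard deviation" format resembles IPython's %timeit. A minimal sketch, assuming that kind of methodology, using only the standard library timeit module and a hypothetical helper, time_hash:

```python
import statistics
import timeit
from typing import Any, Callable, Tuple


def time_hash(hash_fn: Callable[[Any], int], data: Any,
              number: int = 1000, repeat: int = 7) -> Tuple[float, float]:
    """Hypothetical helper: return (mean, stdev) of per-call time in µs."""
    # Each of the `repeat` runs times `number` consecutive calls.
    runs = timeit.repeat(lambda: hash_fn(data), number=number, repeat=repeat)
    # Divide each run's total by `number` to get seconds per call, then scale to µs.
    per_call_us = [total / number * 1e6 for total in runs]
    # stdev needs at least two runs, so `repeat` should be 2 or more.
    return statistics.mean(per_call_us), statistics.stdev(per_call_us)
```

A comparable measurement for the table above might look like this; results will vary with hardware and build options:

>>> import numpy as np
>>> from farmhash import FarmHash64
>>> arr = np.zeros((512, 512, 3))
>>> mean_us, std_us = time_hash(FarmHash64, arr)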

Currently, the setup.py script automatically detects whether the CPU supports the SSE4.2 instruction set and, if it does, enables SSE4.2 optimizations at compile time.
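The source does not describe how setup.py performs this check. As an illustration only: on Linux, the kernel lists CPU features on the "flags" line of /proc/cpuinfo, where SSE4.2 appears as sse4_2. A hypothetical helper, cpu_has_sse42, could parse that text as follows:

```python
def cpu_has_sse42(cpuinfo_text: str) -> bool:
    """Hypothetical helper: report whether /proc/cpuinfo text lists SSE4.2.

    Linux-specific; other platforms expose CPU features differently.
    """
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        # Feature lines look like "flags\t\t: fpu vme ... sse4_1 sse4_2 ...".
        if sep and key.strip() == "flags":
            return "sse4_2" in value.split()
    # No "flags" line found (e.g. empty input or a non-x86 CPU).
    return False
```

On a Linux machine it might be called as cpu_has_sse42(open("/proc/cpuinfo").read()).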

Development

For those who want to contribute, here is a quick start using a few Makefile commands:

git clone https://github.com/escherba/python-cityhash.git
cd python-cityhash
make env           # create a Python virtualenv
make test          # run Python tests
make cpp-test      # run C++ tests

The Makefiles provided have self-documenting targets. To find out which targets are available, type:

make help

See Also

For other fast non-cryptographic hashing implementations available as Python extensions, see MetroHash and MurmurHash.

Authors

The original Python bindings were written by Alexander [Amper] Marshalov and were later largely rewritten for more flexibility by Eugene Scherba. The CityHash and FarmHash algorithms and their C++ implementation are by Google.

License

This software is licensed under the MIT License. See the included LICENSE file for details.

Download files

Source distribution: cityhash-0.3.0.post1.tar.gz (189.2 kB)
