A fast tool to calculate Hamming distances

Project description

A small C++ tool to calculate pairwise distances between gene sequences given in fasta format.

Python interface

To use the Python interface, you should install it from PyPI:

python -m pip install hammingdist

Distance matrix

Once installed, it can be used from Python, for example:

import hammingdist

# To see the different optional arguments available:
help(hammingdist.from_fasta)

# To import all sequences from a fasta file
data = hammingdist.from_fasta("example.fasta")

# To import only the first 100 sequences from a fasta file
data = hammingdist.from_fasta("example.fasta", n=100)

# To import all sequences and remove any duplicates
data = hammingdist.from_fasta("example.fasta", remove_duplicates=True)

# To import all sequences from a fasta file, also treating 'X' as a valid character
data = hammingdist.from_fasta("example.fasta", include_x=True)

# The distance data can be accessed point-wise, though looping over all distances might be quite inefficient
print(data[14,42])

# The data can be written to disk in csv format (default `distance` Ripser format) and retrieved:
data.dump("backup.csv")
retrieval = hammingdist.from_csv("backup.csv")

# It can also be written in lower triangular format (comma-delimited row-major, `lower-distance` Ripser format):
data.dump_lower_triangular("lt.txt")
retrieval = hammingdist.from_lower_triangular("lt.txt")

# If the `remove_duplicates` option was used, the sequence indices can also be written.
# For each input sequence, this prints the corresponding index in the output:
data.dump_sequence_indices("indices.txt")

# Finally, we can pass the data as a list of strings in Python:
data = hammingdist.from_stringlist(["ACGTACGT", "ACGTAGGT", "ATTTACGT"])
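
To make the quantities above concrete, here is a minimal pure-Python sketch of a pairwise Hamming distance matrix and of a comma-delimited lower triangular layout. It is an illustration only, not hammingdist's implementation: the helper names `hamming`, `pairwise_matrix` and `lower_triangular_lines` are hypothetical, every character is compared literally (no special handling of invalid characters or of `include_x`), and the lower triangular output is assumed to omit the diagonal.

```python
from itertools import combinations


def hamming(a: str, b: str) -> int:
    """Count the positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    return sum(x != y for x, y in zip(a, b))


def pairwise_matrix(seqs):
    """Build the full symmetric distance matrix as a list of lists."""
    n = len(seqs)
    matrix = [[0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        matrix[i][j] = matrix[j][i] = hamming(seqs[i], seqs[j])
    return matrix


def lower_triangular_lines(matrix):
    """Yield one comma-delimited line per row, using only the entries
    strictly below the diagonal (assumption: the diagonal is omitted)."""
    for i in range(1, len(matrix)):
        yield ",".join(str(d) for d in matrix[i][:i])


seqs = ["ACGTACGT", "ACGTAGGT", "ATTTACGT"]
matrix = pairwise_matrix(seqs)
print(matrix[0][2])  # distance between the first and third sequence
print("\n".join(lower_triangular_lines(matrix)))
```

A pure-Python loop like this scales quadratically with the number of sequences and is only practical for small inputs, which is the case the compiled tool is meant to cover.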

Distances from reference sequence

The distance of each sequence in a fasta file from a given reference sequence can be calculated using:

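import hammingdist

# Example values: the reference sequence and the fasta file to compare against it
sequence = "ACGTACGT"
fasta_file = "example.fasta"
distances = hammingdist.fasta_reference_distances(sequence, fasta_file, include_x=True)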

This function returns a NumPy array containing the distance of each sequence in the file from the reference sequence, in file order.
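
As a rough picture of what such a calculation involves, the following pure-Python sketch parses fasta-formatted text and computes the distance of each record from a reference. It is a sketch under stated assumptions, not the library's code: `read_fasta` and `reference_distances` are hypothetical helpers, the result is a plain list rather than a NumPy array, characters are compared literally, and the fasta data is an in-memory example rather than a file on disk.

```python
import io


def read_fasta(handle):
    """Yield (header, sequence) pairs from a fasta-formatted text stream."""
    header, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if there was one
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            # Sequences may be wrapped over several lines
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def reference_distances(reference, handle):
    """Return the Hamming distance of each fasta record from the reference."""
    distances = []
    for header, seq in read_fasta(handle):
        if len(seq) != len(reference):
            raise ValueError(f"{header}: length differs from the reference")
        distances.append(sum(a != b for a, b in zip(reference, seq)))
    return distances


# Hypothetical in-memory fasta data; the third record is wrapped over two lines
fasta_text = ">seq1\nACGTACGT\n>seq2\nACGTAGGT\n>seq3\nATTT\nACGT\n"
result = reference_distances("ACGTACGT", io.StringIO(fasta_text))
print(result)
```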

You can also calculate the distance between two individual sequences:

import hammingdist

distance = hammingdist.distance("ACGTX", "AAGTX", include_x=True)

OpenMP on Linux

The latest versions of hammingdist on Linux are built with OpenMP (multithreading) support. If this causes any issues, you can install an earlier version of hammingdist without OpenMP support:

pip install hammingdist==0.11.0
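
After pinning a version, it can be useful to confirm which one is active in the current environment. The sketch below uses only the standard library (`importlib.metadata`, Python 3.8+); the helper name `installed_version` is hypothetical.

```python
from importlib import metadata


def installed_version(package):
    """Return the installed version string of a package, or None if absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


version = installed_version("hammingdist")
if version is None:
    print("hammingdist is not installed")
else:
    print(f"hammingdist {version} is installed")
```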
