
High-performance FastSketch with SIMD acceleration to deduplicate large-scale data

Project description

Installation

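You can install FastSketchLSH from PyPI with pip:

pip install fastsketchlsh

Prebuilt wheels are published for CPython 3.11 and 3.12 on Windows (x86 and x86-64), Linux (manylinux, glibc 2.26+, x86-64), and macOS 14.0+ (x86-64 and ARM64). On other platforms, pip falls back to building the source distribution, which requires a native compiler toolchain. To build from a local checkout of the repository instead, run this from the repository root:

pip install .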

TODO

  • For the single-set sketch overloads, return a NumPy ndarray when the input is a NumPy ndarray (np.uint32 or np.int32).

Usage Example

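from FastSketchLSH import FastSimilaritySketch

def estimate_jaccard(sketch1, sketch2):
    """Estimate Jaccard similarity as the fraction of matching sketch slots."""
    if len(sketch1) != len(sketch2):
        raise ValueError("Sketches must have the same length to compare.")
    # Compare the two sketches slot by slot; the fraction of slots that agree
    # estimates the Jaccard similarity of the underlying sets.
    matches = sum(1 for a, b in zip(sketch1, sketch2) if a == b)
    return matches / len(sketch1)

if __name__ == '__main__':
    t = 256  # sketch size: more slots give a lower-variance estimate
    A = set(range(0, 1000))
    B = set(range(500, 1500))
    # True Jaccard: len(A & B) / len(A | B) = 500 / 1500 ≈ 0.3333
    sketcher = FastSimilaritySketch(sketch_size=t)
    S_A = sketcher.sketch(A)
    S_B = sketcher.sketch(B)
    est_j = estimate_jaccard(S_A, S_B)
    print(f"Estimated Jaccard: {est_j:.4f}")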
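
The package name and the deduplication claim in the title point to locality-sensitive hashing (LSH), but this page does not show how sketches turn into duplicate candidates. Below is a minimal sketch of standard LSH banding, assuming only that every sketch is an indexable sequence of hashable values with the same length (the same property estimate_jaccard relies on). lsh_candidate_pairs and candidate_probability are hypothetical helpers written for illustration, not part of the FastSketchLSH API, and the probability formula treats sketch slots as independent, which is only an approximation for a given sketching scheme.

```python
from collections import defaultdict
from itertools import combinations


def candidate_probability(jaccard, bands, rows):
    """Approximate chance that a pair becomes an LSH candidate, treating each
    sketch slot as an independent match with probability equal to the Jaccard."""
    return 1.0 - (1.0 - jaccard ** rows) ** bands


def lsh_candidate_pairs(sketches, bands, rows):
    """Return index pairs (i, j), i < j, whose sketches agree on every slot
    of at least one band. Hypothetical helper, not part of FastSketchLSH."""
    if not sketches:
        return set()
    t = len(sketches[0])
    if bands * rows != t:
        raise ValueError("bands * rows must equal the sketch length.")
    if any(len(sk) != t for sk in sketches):
        raise ValueError("All sketches must have the same length.")

    candidates = set()
    for b in range(bands):
        buckets = defaultdict(list)
        start = b * rows
        for idx, sk in enumerate(sketches):
            # The band's slots, taken as a tuple, form the bucket key.
            key = tuple(sk[i] for i in range(start, start + rows))
            buckets[key].append(idx)
        # Every pair sharing a bucket in this band is a candidate duplicate.
        # Indices were appended in ascending order, so each pair has i < j.
        for members in buckets.values():
            candidates.update(combinations(members, 2))
    return candidates
```

For 256-slot sketches split into 32 bands of 8 rows, a pair with Jaccard 0.8 becomes a candidate with probability about 0.997, versus about 0.002 at Jaccard 0.3. Candidates can then be confirmed with estimate_jaccard before being treated as duplicates. More rows per band make matching stricter; more bands make it more permissive.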

Download files

Download the file for your platform.

Source distribution

  • fastsketchlsh-0.1.1.tar.gz (50.9 kB)

Built distributions (wheels)

  • fastsketchlsh-0.1.1-cp312-cp312-win_amd64.whl (140.0 kB): CPython 3.12, Windows x86-64
  • fastsketchlsh-0.1.1-cp312-cp312-win32.whl (129.3 kB): CPython 3.12, Windows x86
  • fastsketchlsh-0.1.1-cp312-cp312-manylinux_2_26_x86_64.manylinux_2_28_x86_64.whl (3.3 MB): CPython 3.12, manylinux: glibc 2.26+ x86-64, manylinux: glibc 2.28+ x86-64
  • fastsketchlsh-0.1.1-cp312-cp312-macosx_14_0_x86_64.whl (485.7 kB): CPython 3.12, macOS 14.0+ x86-64
  • fastsketchlsh-0.1.1-cp312-cp312-macosx_14_0_arm64.whl (191.8 kB): CPython 3.12, macOS 14.0+ ARM64
  • fastsketchlsh-0.1.1-cp311-cp311-win_amd64.whl (139.0 kB): CPython 3.11, Windows x86-64
  • fastsketchlsh-0.1.1-cp311-cp311-win32.whl (128.3 kB): CPython 3.11, Windows x86
  • fastsketchlsh-0.1.1-cp311-cp311-manylinux_2_26_x86_64.manylinux_2_28_x86_64.whl (3.2 MB): CPython 3.11, manylinux: glibc 2.26+ x86-64, manylinux: glibc 2.28+ x86-64
  • fastsketchlsh-0.1.1-cp311-cp311-macosx_14_0_x86_64.whl (483.6 kB): CPython 3.11, macOS 14.0+ x86-64
  • fastsketchlsh-0.1.1-cp311-cp311-macosx_14_0_arm64.whl (191.2 kB): CPython 3.11, macOS 14.0+ ARM64

File details

Details for the file fastsketchlsh-0.1.1.tar.gz.

File metadata

  • Download URL: fastsketchlsh-0.1.1.tar.gz
  • Size: 50.9 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for fastsketchlsh-0.1.1.tar.gz
Algorithm Hash digest
SHA256 4420676ba5533458e597f019b465d92f1b6ed4087019d59661dc211bc1c23c58
MD5 b94b66be3237308e27ca28671cf34fe8
BLAKE2b-256 bcc5e838329fb267984533b9d57b4c243eab023207fdd41c7c5bc04458e2ae2b
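
The SHA256 digests listed here let you confirm that a downloaded file is intact before installing it. Below is a minimal sketch using Python's standard hashlib module; sha256_of_file and verify_download are illustrative helper names, not part of FastSketchLSH or pip, and EXPECTED_SDIST_SHA256 is simply the source-distribution digest copied from the table above.

```python
import hashlib

# SHA256 digest of fastsketchlsh-0.1.1.tar.gz, copied from the table above.
EXPECTED_SDIST_SHA256 = "4420676ba5533458e597f019b465d92f1b6ed4087019d59661dc211bc1c23c58"


def sha256_of_file(path, chunk_size=1 << 20):
    """Return the hex SHA256 digest of a file, read in chunks (1 MiB by default)."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        # Stream the file so large downloads are not loaded into memory at once.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path, expected_sha256):
    """Return True if the file's SHA256 digest matches the published one."""
    return sha256_of_file(path) == expected_sha256.strip().lower()
```

For an unmodified download, verify_download("fastsketchlsh-0.1.1.tar.gz", EXPECTED_SDIST_SHA256) should return True; for a wheel, substitute that file's SHA256 digest from its own table. pip can also enforce digests itself in hash-checking mode: list entries such as `fastsketchlsh==0.1.1 --hash=sha256:<digest>` in a requirements file and install with `--require-hashes`.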

File details

Details for the file fastsketchlsh-0.1.1-cp312-cp312-win_amd64.whl.

File hashes

Hashes for fastsketchlsh-0.1.1-cp312-cp312-win_amd64.whl
Algorithm Hash digest
SHA256 555497466aebe99994acb50d4831bd3066aa945631400e1bd3b1fd6d4c082c9a
MD5 3c3ada59ce299080e1d668d6d56230bb
BLAKE2b-256 604a56cbe874a207861819341318c6ba4c42e1a890d8be846d3b1092d5eda2e7

File details

Details for the file fastsketchlsh-0.1.1-cp312-cp312-win32.whl.

File metadata

  • Download URL: fastsketchlsh-0.1.1-cp312-cp312-win32.whl
  • Size: 129.3 kB
  • Tags: CPython 3.12, Windows x86
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for fastsketchlsh-0.1.1-cp312-cp312-win32.whl
Algorithm Hash digest
SHA256 174474c6833cc0f656a04964329ea7807890bb27604e374c8e87c41c92733aab
MD5 a96e83a37e12733cc797717c325ecaed
BLAKE2b-256 7690b629ef786708ebc5693e22a20e8358e38a02ed5b163efc2129f41dfa4ce7

File details

Details for the file fastsketchlsh-0.1.1-cp312-cp312-manylinux_2_26_x86_64.manylinux_2_28_x86_64.whl.

File hashes

Hashes for fastsketchlsh-0.1.1-cp312-cp312-manylinux_2_26_x86_64.manylinux_2_28_x86_64.whl
Algorithm Hash digest
SHA256 5d6411014d5134f40eea5edd20ac4a1f84bdd0286d8306acf8096ab5e319ca3e
MD5 3763e307e15fda610cecd9e63e98e98d
BLAKE2b-256 adef1496cceede99824935161253ced65cffa76f83a2bea5909c5151ca57ec98

File details

Details for the file fastsketchlsh-0.1.1-cp312-cp312-macosx_14_0_x86_64.whl.

File hashes

Hashes for fastsketchlsh-0.1.1-cp312-cp312-macosx_14_0_x86_64.whl
Algorithm Hash digest
SHA256 053435c553a7f6371834049ac595dc3cc7c8471f9760544e61452175dcb3885d
MD5 48ab677e39801b994a8794bb8b341ae5
BLAKE2b-256 b1fe3124b04ed7cc4e69d301b73bba1399899d408cca47bde9b1a645eeee9167

File details

Details for the file fastsketchlsh-0.1.1-cp312-cp312-macosx_14_0_arm64.whl.

File hashes

Hashes for fastsketchlsh-0.1.1-cp312-cp312-macosx_14_0_arm64.whl
Algorithm Hash digest
SHA256 bf8be3c3f0c0407413c21e354c998fde7b2a166a36baf22cf171b2388380214c
MD5 f771479a98c4ae410fe8d4ce334a7455
BLAKE2b-256 51a79e0248c40d8667d83b471c2ca566ed0cb718a623fde223fc6b86e889e9b4

File details

Details for the file fastsketchlsh-0.1.1-cp311-cp311-win_amd64.whl.

File hashes

Hashes for fastsketchlsh-0.1.1-cp311-cp311-win_amd64.whl
Algorithm Hash digest
SHA256 440f6754a6cc6938834fc58bc6d472796e3d4de6127b4fd949ca81d25bdd8104
MD5 327403157e49883f982bc890d7b8d9a6
BLAKE2b-256 791550e8b9534e64492bc7541d71b8ca186a31f08bd004df99a77ecbafff8e82

File details

Details for the file fastsketchlsh-0.1.1-cp311-cp311-win32.whl.

File metadata

  • Download URL: fastsketchlsh-0.1.1-cp311-cp311-win32.whl
  • Size: 128.3 kB
  • Tags: CPython 3.11, Windows x86
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for fastsketchlsh-0.1.1-cp311-cp311-win32.whl
Algorithm Hash digest
SHA256 d7d4c107de9254d528f057d28e34f8f07bf5f0508312cb5c16988a49aa1e27ce
MD5 2bedd332901cca158c94b068bac45008
BLAKE2b-256 4285221e2d42fcf802253d6cc768eafe29140780605d9ffd175c8cd74cb50ae1

File details

Details for the file fastsketchlsh-0.1.1-cp311-cp311-manylinux_2_26_x86_64.manylinux_2_28_x86_64.whl.

File hashes

Hashes for fastsketchlsh-0.1.1-cp311-cp311-manylinux_2_26_x86_64.manylinux_2_28_x86_64.whl
Algorithm Hash digest
SHA256 718f4c2832fc9fdfb79836126f5e3226482cef1df095bc3bed9390502a795286
MD5 6f7ec9cdf8b7af5db2c521328049a0f2
BLAKE2b-256 e1fd88fc2517e4c26ec613169c2498c61f68ef2baadf1b7996da6a8e3360b9cc

File details

Details for the file fastsketchlsh-0.1.1-cp311-cp311-macosx_14_0_x86_64.whl.

File hashes

Hashes for fastsketchlsh-0.1.1-cp311-cp311-macosx_14_0_x86_64.whl
Algorithm Hash digest
SHA256 f76a4477fa07d203f5f50cddeb576eb80bfbd1914abd04630ac4dba93abfb146
MD5 d4a320d7fb8af753a2e1cccd4af99754
BLAKE2b-256 3e0cb6d8fe94a1a2000865f8a21d55e9b1ebd8a5e50503acf1e671c99b17e9a4

File details

Details for the file fastsketchlsh-0.1.1-cp311-cp311-macosx_14_0_arm64.whl.

File hashes

Hashes for fastsketchlsh-0.1.1-cp311-cp311-macosx_14_0_arm64.whl
Algorithm Hash digest
SHA256 e55a2222b98f85a0ee1afa928122eedbf88716f9060b7a8faed621bc244f58b4
MD5 c7281da20499cd45d84dc1eebd3a5dfb
BLAKE2b-256 ab323d80c9ee7da0c52f37ea6bf542b21745d9004e2a5c806888ed1b7dacf544
