Software Heritage Shard File Format

This module implements the support and tooling to manipulate SWH Shard files, a format based on a perfect hash table and typically used by the Software Heritage object storage.

It is both a Python extension that can be used as a library to manipulate SWH shard files, and a set of command line tools.

Quick Start

This package uses pybind11 to build the wrapper around the cmph minimal perfect hash library. To build the binary extension, you will need cmph, gtest and valgrind in addition to the Python development tools. On a Debian system, you can install them with:

sudo apt install build-essential python3-dev libcmph-dev libgtest-dev valgrind lcov

Command Line Tool

You may use several methods to install swh-shard, e.g. using uv or pip.

For example:

$ uv tool install swh-shard
[...]
Installed 1 executable: swh-shard

$ swh-shard
Usage: swh-shard [OPTIONS] COMMAND [ARGS]...

  Software Heritage Shard tools.

Options:
  -C, --config-file FILE  Configuration file.
  -h, --help              Show this message and exit.

Commands:
  create  Create a shard file from given files
  get     List objects in a shard file
  info    Display shard file information
  ls      List objects in a shard file

Then you can create a shard file from local files:

$ swh-shard create volume.shard *.py
There are 3 entries
Checking files to add  [####################################]  100%
after deduplication: 3 entries
Adding files to the shard  [####################################]  100%
Done

Each file given as an argument is stored under the SHA-256 checksum of its content, which serves as its key in the shard file.
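To predict the key a file will get before adding it to a shard, you can compute the same checksum yourself. The following is a minimal sketch using only the Python standard library; the function name `shard_key` is hypothetical and is not part of the swh-shard API.

```python
import hashlib


def shard_key(path: str) -> str:
    """Return the hex SHA-256 digest of a file's content.

    Hypothetical helper: it mirrors the key derivation described above,
    it does not call into swh-shard.
    """
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # Read in 1 MiB chunks so large files do not need to fit in memory.
        for chunk in iter(lambda: f.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

The returned hex string should match the key printed for that file by `swh-shard ls`.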

Then you can check the header of the shard file:

$ swh-shard info volume.shard
Shard volume.shard
├─version:    1
├─objects:    3
│ ├─position: 512
│ └─size:     5633
├─index
│ ├─position: 6145
│ └─size:     440
└─hash
  └─position: 6585
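The positions and sizes reported above suggest that the sections are laid out back to back: the objects section ends where the index begins, and the index ends where the hash begins. The sketch below only checks that arithmetic on the numbers from this example; the `Section` class is a hypothetical helper, not a swh-shard type.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Section:
    """A byte range in the shard file (hypothetical helper)."""

    position: int
    size: int

    @property
    def end(self) -> int:
        # Offset of the first byte after this section.
        return self.position + self.size


# Values copied from the `swh-shard info volume.shard` output above.
objects = Section(position=512, size=5633)
index = Section(position=6145, size=440)
hash_position = 6585

# 512 + 5633 == 6145 and 6145 + 440 == 6585
assert objects.end == index.position
assert index.end == hash_position
```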

List the contents of a shard:

$ swh-shard ls volume.shard
8bb71bce4885c526bb4114295f5b2b9a23a50e4a8d554c17418d1874b1a233ac: 834 bytes
06340a7a5fa9e18d72a587a69e4dc7e79f4d6a56632ea6900c22575dc207b07f: 4210 bytes
d39790a3af51286d2d10d73e72e2447cf97b149ff2d8e275b200a1ee33e4a3c5: 565 bytes
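If you need the listing in a script, the output format shown above (`<key>: <size> bytes`) is simple to parse. This is a sketch that assumes the format stays exactly as in this example; `parse_ls` is a hypothetical helper, not part of swh-shard.

```python
import re

# One line of `swh-shard ls` output as shown above: 64 hex digits, then the size.
LS_LINE = re.compile(r"^([0-9a-f]{64}): (\d+) bytes$")


def parse_ls(output: str) -> dict[str, int]:
    """Map each object key to its size in bytes (hypothetical helper)."""
    sizes = {}
    for line in output.splitlines():
        match = LS_LINE.match(line.strip())
        if match:
            sizes[match.group(1)] = int(match.group(2))
    return sizes


sample = """\
8bb71bce4885c526bb4114295f5b2b9a23a50e4a8d554c17418d1874b1a233ac: 834 bytes
06340a7a5fa9e18d72a587a69e4dc7e79f4d6a56632ea6900c22575dc207b07f: 4210 bytes
d39790a3af51286d2d10d73e72e2447cf97b149ff2d8e275b200a1ee33e4a3c5: 565 bytes
"""

sizes = parse_ls(sample)
total = sum(sizes.values())  # 834 + 4210 + 565 = 5609
```

The total is 5609 bytes, 24 fewer than the 5633 bytes that `swh-shard info` reports for the objects section. This document does not explain the difference; it may be per-object bookkeeping, but that is a guess.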

Retrieve an object from a shard:

$ swh-shard get volume.shard 06340a7a5fa9e18d72a587a69e4dc7e79f4d6a56632ea6900c22575dc207b07f | sha256sum
06340a7a5fa9e18d72a587a69e4dc7e79f4d6a56632ea6900c22575dc207b07f  -
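The `sha256sum` check above can also be done from Python. The sketch below runs the same `swh-shard get` command through `subprocess` and compares the digest of the returned bytes with the requested key. Both function names are hypothetical, and the code assumes `swh-shard` is on your `PATH`.

```python
import hashlib
import subprocess


def matches_key(data: bytes, key: str) -> bool:
    """Check that the SHA-256 digest of data equals the hex key."""
    return hashlib.sha256(data).hexdigest() == key.lower()


def get_verified(shard: str, key: str) -> bytes:
    """Fetch an object with `swh-shard get` and verify its checksum.

    Hypothetical wrapper around the command line shown above.
    """
    result = subprocess.run(
        ["swh-shard", "get", shard, key],
        check=True,           # raise CalledProcessError on a non-zero exit
        capture_output=True,  # object bytes arrive on stdout
    )
    if not matches_key(result.stdout, key):
        raise ValueError(f"content does not match key {key}")
    return result.stdout
```

For example, `get_verified("volume.shard", "06340a7a5fa9e18d72a587a69e4dc7e79f4d6a56632ea6900c22575dc207b07f")` would return the 4210-byte object from the listing above if the shard is intact.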

Delete one or more objects from a shard:

$ swh-shard delete volume.shard 06340a7a5fa9e18d72a587a69e4dc7e79f4d6a56632ea6900c22575dc207b07f
About to remove these objects from the shard file misc/volume.shard
06340a7a5fa9e18d72a587a69e4dc7e79f4d6a56632ea6900c22575dc207b07f (4210 bytes)
Proceed? [y/N]: y
Deleting objects from the shard  [####################################]  100%
Done
