Software Heritage Datastore Scrubber

Project description

Tools to periodically check data integrity in swh-storage and swh-objstorage, report errors, and (try to) fix them.

This is a work in progress; some of the components described below do not exist yet (Cassandra storage checker, objstorage checker, recovery, and reinjection).

The Scrubber package is made of the following parts:

Checking

Highly parallel processes continuously read objects from a data store, compute their checksums, and record any failure in a database, along with the data of the corrupt object.

There is one “checker” for each datastore package: storage (PostgreSQL and Cassandra), journal (Kafka), and objstorage.
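
The checking step can be illustrated with a minimal sketch. It is not the swh.scrubber API: the `check_objects` and `compute_id` functions, the `corrupt_object` table, and the use of a plain SHA-1 of the object's bytes as its identifier are all assumptions made for illustration, and SQLite stands in for whichever database the real checkers write to.

```python
import hashlib
import sqlite3
from typing import Iterable, Tuple


def compute_id(data: bytes) -> str:
    """Hypothetical identifier scheme: hex SHA-1 of the raw object bytes."""
    return hashlib.sha1(data).hexdigest()


def check_objects(
    objects: Iterable[Tuple[str, bytes]], db: sqlite3.Connection
) -> int:
    """Recompute each object's checksum and record mismatches.

    `objects` yields (expected_id, data) pairs read from a data store.
    Returns the number of corrupt objects found.
    """
    db.execute(
        "CREATE TABLE IF NOT EXISTS corrupt_object "
        "(id TEXT PRIMARY KEY, data BLOB)"
    )
    failures = 0
    for expected_id, data in objects:
        if compute_id(data) != expected_id:
            # Keep the corrupt bytes so a later recovery job can use them.
            db.execute(
                "INSERT OR REPLACE INTO corrupt_object VALUES (?, ?)",
                (expected_id, data),
            )
            failures += 1
    db.commit()
    return failures
```

In this sketch a single process walks the objects sequentially; running several such processes over disjoint ranges of identifiers would be one way to obtain the parallelism described above.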

Recovery

Then, from time to time, jobs go through the list of known corrupt objects and try to recover the original objects through various means:

  • Brute-forcing variations until they match their checksum

  • Recovering from another data store

  • As a last resort, recovering from known origins, if any
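
As an illustration of the first approach, the sketch below brute-forces one kind of variation: single-bit flips. The description does not say which variations are tried, so the choice of bit flips, the `recover_by_bit_flip` function name, and the use of a plain SHA-1 checksum are assumptions, not the project's actual recovery code.

```python
import hashlib
from typing import Optional


def recover_by_bit_flip(corrupt: bytes, expected_sha1: str) -> Optional[bytes]:
    """Try every single-bit flip of `corrupt` until the SHA-1 matches.

    Returns the recovered bytes, or None if no single-bit flip
    produces the expected checksum.
    """
    candidate = bytearray(corrupt)
    for byte_index in range(len(candidate)):
        for bit in range(8):
            mask = 1 << bit
            candidate[byte_index] ^= mask  # flip one bit
            if hashlib.sha1(candidate).hexdigest() == expected_sha1:
                return bytes(candidate)
            candidate[byte_index] ^= mask  # undo the flip
    return None
```

The cost grows linearly with the object size (one hash per bit), so this kind of search is only practical for small objects or narrow classes of corruption; the other recovery means listed above cover the remaining cases.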

Reinjection

Finally, when an original object is recovered, it is reinjected into the original data store, replacing the corrupt one.

Download files

Source Distribution

swh.scrubber-0.0.1.tar.gz (29.1 kB view details)

Uploaded Source

Built Distribution

swh.scrubber-0.0.1-py3-none-any.whl (32.1 kB view details)

Uploaded Python 3

File details

Details for the file swh.scrubber-0.0.1.tar.gz.

File metadata

  • Download URL: swh.scrubber-0.0.1.tar.gz
  • Upload date:
  • Size: 29.1 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.8.0 pkginfo/1.8.2 readme-renderer/34.0 requests/2.27.1 requests-toolbelt/0.9.1 urllib3/1.26.9 tqdm/4.63.1 importlib-metadata/4.11.3 keyring/17.1.1 rfc3986/2.0.0 colorama/0.4.4 CPython/3.7.3

File hashes

Hashes for swh.scrubber-0.0.1.tar.gz
Algorithm Hash digest
SHA256 abacca21f54b34f05605c4f4d3313f4f343ffedf9a681611588b8c16fe4d214f
MD5 1084b61336b412788c5d3530e6c0b369
BLAKE2b-256 0e82bae335a6eb2be2333eb2e346eb38941432cb965da8da7614c9bd0f682253

File details

Details for the file swh.scrubber-0.0.1-py3-none-any.whl.

File metadata

  • Download URL: swh.scrubber-0.0.1-py3-none-any.whl
  • Upload date:
  • Size: 32.1 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.8.0 pkginfo/1.8.2 readme-renderer/34.0 requests/2.27.1 requests-toolbelt/0.9.1 urllib3/1.26.9 tqdm/4.63.1 importlib-metadata/4.11.3 keyring/17.1.1 rfc3986/2.0.0 colorama/0.4.4 CPython/3.7.3

File hashes

Hashes for swh.scrubber-0.0.1-py3-none-any.whl
Algorithm Hash digest
SHA256 7e2bc5e08c6ae520ab32307067f3e315ee6aa49cf6502a2677df12c0f4ccf460
MD5 270a327f58e6df2971931aa3ba0e23b2
BLAKE2b-256 b1168ac6ba9e630bf66bbb890bf94d4c06c831c9f298aecbdcd55d433d06774e
