Remote Archiver: safely collect output files into archives on network filesystem

ReAr

A replacement for open() in scenarios where multiple processes generate many (log) files on a network filesystem. ReAr redirects the writes to Zip files to reduce the stress on the filesystem and to keep things organized. Writes to the archives are chunked and staged to avoid a single point of failure.

# On each worker:
async with rear_fs("/path/to/archive_base"):
    with rear_open("ar.zip/relpath/to/file", 'w+b') as f: # open a read-write buffer ...
    #with rear_pickup("/path/to/temp-file", "ar.zip/relpath/to/file"): # ... or pick up a file created by others
        f.write(b"...")
    # The file is written to a tmp archive on closing.
    # It will then be moved and eventually stored as `relpath/to/file` in zip file `/path/to/archive_base/ar.zip`.
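
The staging step above (buffer the writes, then store the buffer in a temporary archive on close) can be sketched with the standard library alone. This is a minimal illustration and not ReAr's implementation: `staged_open` and the archive name in the demo are hypothetical names introduced here.

```python
import contextlib
import io
import os
import tempfile
import zipfile


@contextlib.contextmanager
def staged_open(tmp_zip_path, arcname):
    """Yield an in-memory buffer and store it in a Zip archive on exit.

    Hypothetical helper for illustration only; it is not part of ReAr.
    """
    buf = io.BytesIO()
    yield buf
    # Reached only when the block exits without an exception:
    # append the whole buffer as a single archive member.
    # Mode "a" creates the archive if it does not exist yet.
    with zipfile.ZipFile(tmp_zip_path, "a") as zf:
        zf.writestr(arcname, buf.getvalue())


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp_dir:
        tmp_zip = os.path.join(tmp_dir, "worker-0.tmp.zip")  # assumed name
        with staged_open(tmp_zip, "relpath/to/file") as f:
            f.write(b"...")
        with zipfile.ZipFile(tmp_zip) as zf:
            print(zf.namelist())
```

Because nothing touches the archive until the buffer is complete, a failure while producing the data leaves the temporary archive unchanged.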

To avoid concurrent writes, each worker writes to its own temporary Zip file and starts a new one every 5 minutes. Run a scavenger to collect the files from the temporary archives into the final archives:

# On your main process:
async with scavengerd("/path/to/archive_base"):
    ...
# ... or run it manually from a shell:
while :; do
    rear-scavenger -d /path/to/archive_base
    sleep 5m
done
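
One way to picture the five-minute rotation of the temporary archives is to derive the archive name from the current time window, so that a worker switches to a new file whenever the window changes. The sketch below assumes a hypothetical naming scheme (`<worker id>-<window>.tmp.zip`); the names ReAr actually uses are not documented here.

```python
import time

ROTATE_SECONDS = 5 * 60  # rotation period stated in the text above


def tmp_archive_name(worker_id, now=None, period=ROTATE_SECONDS):
    """Return the temporary archive name for the time window containing `now`.

    Hypothetical naming scheme for illustration; not ReAr's actual layout.
    """
    if now is None:
        now = time.time()
    window = int(now // period)  # same value for the whole window
    return f"{worker_id}-{window}.tmp.zip"
```

Two writes that fall into the same window go to the same temporary archive; the first write of the next window starts a new one, at which point the previous archive is closed and can be collected.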

FAQ

What happens if a worker instance crashes?

Its current temporary archive will be missing the central directory because it was never closed properly. The scavenger will try to recover as many files as possible (with zip -FF).
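
The failure mode can be reproduced with the standard library. The sketch below simulates a crash by cutting a Zip file off where its central directory would begin; it only demonstrates why a normal reader rejects such a file while the member data is still present, and it does not reimplement the recovery done by zip -FF. The helper names are hypothetical.

```python
import io
import zipfile

CENTRAL_DIR_SIGNATURE = b"PK\x01\x02"  # starts the first central directory entry


def simulate_crash(payload=b"log line\n"):
    """Build a Zip in memory, then drop everything from the central directory on."""
    buf = io.BytesIO()
    # ZIP_STORED keeps the payload uncompressed so it stays visible in the bytes.
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_STORED) as zf:
        zf.writestr("relpath/to/file", payload)
    data = buf.getvalue()
    return data[: data.index(CENTRAL_DIR_SIGNATURE)]


def is_readable(data):
    """Return True if zipfile can open the archive and list its members."""
    try:
        with zipfile.ZipFile(io.BytesIO(data)) as zf:
            zf.namelist()
        return True
    except zipfile.BadZipFile:
        return False
```

Here `is_readable(simulate_crash())` is false, yet the truncated bytes still start with a local file header and contain the payload, which is the information a repair tool can scan for.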

How does the scavenger work?

Multiple processes cannot write to one Zip file at the same time, so each process first deposits its files into its own temporary Zip file and records where those files should eventually be saved. Once a temporary Zip file is closed (when the process exits or after 5 minutes), the scavenger copies all of its files to their destination Zip files. The scavenger does not need to watch for incoming files actively, since it can organize them at any time after they have been saved to the temporary Zip files. It is also safe to run multiple scavenger instances at any time: each one checks whether an action is necessary before performing it.
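
The merge step can be sketched as follows. This is an illustration under stated assumptions, not ReAr's scavenger: it assumes that temporary archives end in `.tmp.zip` and that each member is named `<destination archive>/<path inside it>` (for example `ar.zip/relpath/to/file`), which is a hypothetical way of recording the destination. It also omits the locking and the closed-archive check a real scavenger needs.

```python
import os
import zipfile


def scavenge(base_dir, tmp_suffix=".tmp.zip"):
    """Merge every temporary archive in base_dir into its destination archives.

    Hypothetical sketch; returns the number of members copied.
    """
    moved = 0
    for name in sorted(os.listdir(base_dir)):
        if not name.endswith(tmp_suffix):
            continue  # destination archives and unrelated files are skipped
        tmp_path = os.path.join(base_dir, name)
        with zipfile.ZipFile(tmp_path) as src:
            for member in src.namelist():
                # Assumed convention: "<destination archive>/<inner path>".
                dest_name, _, arcname = member.partition("/")
                if not arcname:
                    continue  # no inner path recorded; nothing to place
                dest_path = os.path.join(base_dir, dest_name)
                with zipfile.ZipFile(dest_path, "a") as dest:
                    if arcname in dest.namelist():
                        continue  # already merged by an earlier run
                    dest.writestr(arcname, src.read(member))
                    moved += 1
        # Every member is now in its destination, so the temp archive can go.
        os.remove(tmp_path)
    return moved
```

Running the function a second time finds no temporary archives and copies nothing, which mirrors the point that the scavenger can run at any time after the files have been deposited.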
