A Python library for reading and writing SAR files.

sarfile

Like tarfile, but streamable.

What is this?

This repository implements a "streaming archive" file format for collecting multiple files into one. It is similar to the TAR format, but it puts the information about all the files in the archive into a contiguous block at the beginning of the file. This has several advantages:

  1. Much faster startup times for large archives (we read the entire header into memory in one go)
  2. Much friendlier to remote file systems (only one network request rather than a bunch), in combination with smart_open
  3. Fast random access

The file size is the same as an uncompressed TAR file.

The downside is that once we've written a SAR file, we can't change it. A future version of the format may support this, but for now, the recommended flow is to first generate a TAR file, then convert it using the built-in sarpack command line tool or the sarfile.pack_tar Python API.

Also, the file format only exists in this repository, although it's very simple to implement (see the _header.py documentation and the sarfile object for how to load items).
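To make the header-first idea concrete, here is a minimal sketch of a toy archive that stores an index of names, offsets and sizes at the front of the file, followed by the raw file data. This is not the actual SAR layout (see _header.py for that). The JSON index, the 8-byte length prefix, and the names pack and ToyArchive are assumptions made up for illustration.

```python
import io
import json
import struct


def pack(files: dict[str, bytes]) -> bytes:
    """Pack files into a toy header-first archive (NOT the real SAR layout)."""
    # Record each file's offset within the data region, and its size.
    index = {}
    offset = 0
    for name, data in files.items():
        index[name] = (offset, len(data))
        offset += len(data)
    header = json.dumps(index).encode("utf-8")
    # Layout: 8-byte little-endian header length, the header, then all data.
    return struct.pack("<Q", len(header)) + header + b"".join(files.values())


class ToyArchive:
    """Reader for the toy format above (hypothetical, for illustration)."""

    def __init__(self, fp):
        self._fp = fp
        # Two small reads at the start of the file load the whole index.
        (header_len,) = struct.unpack("<Q", fp.read(8))
        self._index = json.loads(fp.read(header_len))
        self._data_start = 8 + header_len

    @property
    def names(self):
        return list(self._index)

    def read(self, name):
        # Random access: seek straight to the member, no scanning needed.
        offset, size = self._index[name]
        self._fp.seek(self._data_start + offset)
        return self._fp.read(size)


blob = pack({"a.txt": b"hello", "b.txt": b"world!"})
archive = ToyArchive(io.BytesIO(blob))
print(archive.names)
print(archive.read("b.txt"))
```

Because the index is contiguous and sits at the start, listing the archive never touches the data region, and reading one member is a single seek plus a single read.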

Getting Started

Install the package using pip:

pip install sarfile

Next, simply import the module:

import sarfile

You can convert a tarfile to a sarfile using the Python API:

sarfile.pack_tar(out="myfile.sar", tar="myfile.tar")

Alternatively, you can use the built-in command line tool:

sarpack myfile.sar myfile.tar
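If you do not have a TAR file yet, one way to produce it is with the standard library's tarfile module. The sketch below writes an uncompressed TAR from in-memory data. The helper name build_tar and the file names are made up for illustration, and it assumes an uncompressed TAR is what the converter expects.

```python
import io
import os
import tarfile
import tempfile


def build_tar(tar_path: str, files: dict[str, bytes]) -> None:
    """Write an uncompressed TAR containing the given in-memory files."""
    with tarfile.open(tar_path, "w") as tar:  # mode "w" means no compression
        for name, data in files.items():
            info = tarfile.TarInfo(name=name)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))


workdir = tempfile.mkdtemp()
tar_path = os.path.join(workdir, "myfile.tar")
build_tar(tar_path, {"myfile.txt": b"hello\n", "other.txt": b"world\n"})

# The resulting TAR can then be converted using either method shown above,
# for example:
# sarfile.pack_tar(out=os.path.join(workdir, "myfile.sar"), tar=tar_path)
```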

Finally, the file can be used in your Python script:

f = sarfile.open("myfile.sar")
print(f.names)
with f["myfile.txt"] as myfile:
    print(myfile.read())

If you have installed smart_open, then you can also read from S3 as follows:

# "my-bucket" is a placeholder bucket name
f = sarfile.open("s3://my-bucket/myfile.sar")
print(f.names)
with f["myfile.txt"] as myfile:
    print(myfile.read())

The above code is much faster than reading a TAR file from S3, because we read the entire header into memory in one network request, rather than having to make a network request for each file in the archive. On subsequent accesses we also only download the part of the file we want to read.
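To see why a plain TAR needs many reads, it helps to look at where its member headers live. The sketch below uses only the standard library's tarfile module, not sarfile, and builds a small in-memory TAR to print each member's header and data offsets. The member names and sizes are arbitrary.

```python
import io
import tarfile

# Build a small uncompressed TAR in memory with three 600-byte members.
buf = io.BytesIO()
with tarfile.open(fileobj=buf, mode="w") as tar:
    for name in ("a.bin", "b.bin", "c.bin"):
        info = tarfile.TarInfo(name=name)
        info.size = 600
        tar.addfile(info, io.BytesIO(b"\0" * 600))

# Re-open it for reading and collect where each header and payload starts.
buf.seek(0)
with tarfile.open(fileobj=buf, mode="r:") as tar:
    members = tar.getmembers()

layout = [(m.name, m.offset, m.offset_data) for m in members]
for name, header_at, data_at in layout:
    print(f"{name}: header at byte {header_at}, data at byte {data_at}")
```

Each header sits immediately before its own data, so listing the archive means visiting every header in turn. On a remote store, each of those visits can turn into a separate request, which is the cost that a single up-front header avoids.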

Requirements

This package is tested against Python 3.10. Although not required, it is a good idea to install smart_open to support reading from S3 or other remote file systems, and tqdm to show a progress bar when packing large files.
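A quick way to check which of the optional packages are available in the current environment is sketched below. The helper name optional_extras is made up for illustration. It only tests whether the modules can be found and does not import them.

```python
import importlib.util


def optional_extras(names=("smart_open", "tqdm")):
    """Return a mapping of optional module name -> whether it is installed."""
    return {name: importlib.util.find_spec(name) is not None for name in names}


extras = optional_extras()
for name, installed in extras.items():
    print(f"{name}: {'installed' if installed else 'not installed'}")
```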
