Access zip file content hosted remotely without downloading the full file.

remotezip

This module provides a way to access single members of a zip archive without downloading the full content from a remote web server. For this library to work, the web server hosting the archive must support HTTP range requests (the Range header).
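Before pointing the library at a server, it can help to check whether range requests are honoured. The sketch below is a standalone check and not part of remotezip; `range_supported` and `probe_range` are hypothetical helpers, and it uses only the standard library. A server that honours a range answers `206 Partial Content` with a `Content-Range` header, while one that ignores the header usually answers `200` with the full body.

```python
import urllib.request


def range_supported(status_code, headers):
    # 206 Partial Content plus a Content-Range header means the range was honoured;
    # a plain 200 means the server ignored the Range header and sent everything.
    return status_code == 206 and "Content-Range" in headers


def probe_range(url, byte_range="bytes=0-0", timeout=10):
    """Request a tiny byte range and report whether the server honoured it.

    Note: urlopen raises HTTPError for error statuses such as 416.
    """
    req = urllib.request.Request(url, headers={"Range": byte_range})
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return range_supported(resp.status, resp.headers)
```

For example, `probe_range(url)` tests ordinary ranges and `probe_range(url, "bytes=-1")` tests suffix ranges; if only the first returns True, the support_suffix_range=False option described below is the one to reach for.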

Installation

pip install remotezip

Usage

Initialization

RemoteZip(url, ...)

To download the content, this library relies on the requests module. The constructor interface matches that of the requests.get function.

  • url: URL where the zip file is located (required).
  • auth: authentication credentials.
  • headers: headers to pass to the request.
  • timeout: timeout for the request.
  • verify: enable/disable certificate verification or set a custom certificate location.
  • ...: see the requests documentation for further parameters.
  • initial_buffer_size: how much data (in bytes) to fetch during the first connection, in order to download the zip file central directory. If your zip file contains a lot of files, it is a good idea to increase this value to avoid further remote requests. Default: 64 KB.
  • session: a custom session object to use for the request.
  • support_suffix_range: set this to False if the remote server doesn't support suffix ranges (negative offsets). Note that this option requires one additional HEAD request to fetch the content length.
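To choose a value for initial_buffer_size, the central directory size can be estimated from the zip format itself: each central directory entry has a fixed 46-byte header plus the file name (and any extra field), and the end-of-central-directory record adds 22 bytes. The helper below is a hypothetical sketch under those assumptions; it ignores ZIP64 records and per-entry comments, and the headroom factor in the usage comment is an arbitrary choice.

```python
CENTRAL_DIR_HEADER = 46  # fixed part of each central directory file header
END_OF_CENTRAL_DIR = 22  # end of central directory record, without a comment


def central_directory_size(num_files: int, avg_name_len: int, avg_extra_len: int = 0) -> int:
    """Approximate bytes needed to fetch the whole central directory in one request."""
    per_entry = CENTRAL_DIR_HEADER + avg_name_len + avg_extra_len
    return num_files * per_entry + END_OF_CENTRAL_DIR


# Hypothetical usage, with 2x headroom for extra fields and an archive comment:
# RemoteZip(url, initial_buffer_size=2 * central_directory_size(50_000, 40))
```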

Class Interface

RemoteZip is a subclass of the Python standard library class zipfile.ZipFile, so it supports all of its read methods:

  • RemoteZip.close()
  • RemoteZip.getinfo(name)
  • RemoteZip.extract(member[, path[, pwd]])
  • RemoteZip.extractall([path[, members[, pwd]]])
  • RemoteZip.infolist()
  • RemoteZip.namelist()
  • RemoteZip.open(name[, mode[, pwd]])
  • RemoteZip.printdir()
  • RemoteZip.read(name[, pwd])
  • RemoteZip.testzip()
  • RemoteZip.filename
  • RemoteZip.debug
  • RemoteZip.comment

Please look at the zipfile documentation for usage details.
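Because RemoteZip is a zipfile.ZipFile subclass, helper code written against ZipFile should work unchanged on a remote archive. The sketch below assumes exactly that; `describe` is a hypothetical helper, and a local in-memory archive stands in for a RemoteZip instance so the example runs without a server.

```python
import io
import zipfile


def describe(zf: zipfile.ZipFile, name: str) -> dict:
    """Summarize one member using only standard ZipFile read methods."""
    info = zf.getinfo(name)  # metadata comes from the central directory
    with zf.open(name) as member:  # reads only this member's data
        head = member.read(16)
    return {
        "name": info.filename,
        "compressed": info.compress_size,
        "size": info.file_size,
        "head": head,
    }


# Local stand-in; with remotezip the call would be describe(RemoteZip(url), name).
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w", compression=zipfile.ZIP_DEFLATED) as zf:
    zf.writestr("hello.txt", b"hello remote zip\n" * 10)

with zipfile.ZipFile(buf) as zf:
    summary = describe(zf, "hello.txt")
```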

NOTE:

  • extractall() and testzip() need to access the full content of the archive. If you need these methods, downloading the whole archive is probably more efficient.
  • RemoteZip.open() supports seek operations when reading archive members. However, because the content is streamed and the DEFLATE format doesn't support seeking natively, any negative (backward) seek results in a new remote request from the beginning of the member content. This is very inefficient, so the recommendation is to use RemoteZip.extract() and then open and operate on the extracted file.
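The recommended pattern (extract once, then seek locally) can be sketched as follows. `read_ranges` is a hypothetical helper that only calls ZipFile.extract(), which RemoteZip inherits; a local in-memory archive stands in for the remote one so the example is self-contained.

```python
import io
import tempfile
import zipfile


def read_ranges(zf: zipfile.ZipFile, name: str, ranges):
    """Extract `name` once, then serve (offset, size) reads from the local copy."""
    with tempfile.TemporaryDirectory() as tmp:
        path = zf.extract(name, path=tmp)  # one sequential pass over the member
        with open(path, "rb") as fh:
            chunks = []
            for offset, size in ranges:
                fh.seek(offset)  # backward seeks are cheap on a local file
                chunks.append(fh.read(size))
    return chunks


# Local stand-in for a RemoteZip instance.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w", compression=zipfile.ZIP_DEFLATED) as zf:
    zf.writestr("data.bin", bytes(range(256)))

with zipfile.ZipFile(buf) as zf:
    chunks = read_ranges(zf, "data.bin", [(200, 4), (10, 3)])  # second read seeks backward
```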

Examples

List members in archive

Print all members part of the archive:

from remotezip import RemoteZip

with RemoteZip('http://.../myfile.zip') as zip:
    for zip_info in zip.infolist():
        print(zip_info.filename)

Download a member

The following example will extract the file somefile.txt from the archive stored at the url http://.../myfile.zip.

from remotezip import RemoteZip

with RemoteZip('http://.../myfile.zip') as zip:
    zip.extract('somefile.txt')
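extract() recreates the member's path under the target directory. To pick the destination yourself, a member can be streamed into any writable binary file object with ZipFile.open(), or read into memory with ZipFile.read() when it is small. `copy_member` below is a hypothetical helper, and the in-memory archive is a stand-in for RemoteZip('http://.../myfile.zip').

```python
import io
import shutil
import zipfile


def copy_member(zf: zipfile.ZipFile, name: str, dst, chunk_size: int = 64 * 1024) -> None:
    """Stream one archive member into any writable binary file object."""
    with zf.open(name) as src:
        shutil.copyfileobj(src, dst, chunk_size)


# Local stand-in for the remote archive.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w", compression=zipfile.ZIP_DEFLATED) as zf:
    zf.writestr("somefile.txt", b"line\n" * 1000)

out = io.BytesIO()
with zipfile.ZipFile(buf) as zf:
    copy_member(zf, "somefile.txt", out)
    whole = zf.read("somefile.txt")  # small members can simply be read into memory
```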

S3 example

If you are trying to download a member from a zip archive hosted on S3, you can use the aws-requests-auth library as follows:

from hashlib import sha256

from aws_requests_auth.boto_utils import BotoAWSRequestsAuth
from remotezip import RemoteZip

auth = BotoAWSRequestsAuth(
    aws_host='s3-eu-west-1.amazonaws.com',
    aws_region='eu-west-1',
    aws_service='s3'
)
# S3 expects the SHA-256 of the (empty) request body in this header.
headers = {'x-amz-content-sha256': sha256(b'').hexdigest()}
url = "https://s3-eu-west-1.amazonaws.com/.../file.zip"

with RemoteZip(url, auth=auth, headers=headers) as z:
    z.extract('somefile.txt')

Command line tool

A simple command line tool is included in this distribution.

usage: remotezip [-h] [-l] [-d DIR] url [filename [filename ...]]

Unzip remote files

positional arguments:
  url                Url of the zip archive
  filename           File to extract

optional arguments:
  -h, --help         show this help message and exit
  -l, --list         List files in the archive
  -d DIR, --dir DIR  Extract directory, default current directory

Example

$ remotezip -l "http://thematicmapping.org/downloads/TM_WORLD_BORDERS-0.3.zip"
  Length  DateTime             Name
--------  -------------------  ------------------------
    2962  2008-07-30 13:58:46  Readme.txt
   24740  2008-07-30 12:16:46  TM_WORLD_BORDERS-0.3.dbf
     145  2008-03-12 13:11:54  TM_WORLD_BORDERS-0.3.prj
 6478464  2008-07-30 12:16:46  TM_WORLD_BORDERS-0.3.shp
    2068  2008-07-30 12:16:46  TM_WORLD_BORDERS-0.3.shx
    
$ remotezip "http://thematicmapping.org/downloads/TM_WORLD_BORDERS-0.3.zip" Readme.txt
Extracting Readme.txt...

How it works

This module uses the zipfile.ZipFile class under the hood to decode the zip file format. The ZipFile class is initialized with a file-like object that transparently performs the remote queries.
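The idea of a file-like object that turns reads into range requests can be illustrated with a minimal sketch. This is not remotezip's implementation: `RangeFile` and `fetch` are hypothetical names, the archive size is assumed to be known up front (for example from a Content-Range or Content-Length header), and an in-memory archive stands in for the web server so the request counter can be observed.

```python
import io
import zipfile


class RangeFile(io.RawIOBase):
    """Read-only, seekable file object that fetches byte ranges on demand.

    `fetch(start, end)` is a hypothetical callable returning the bytes in the
    inclusive range [start, end], like an HTTP "Range: bytes=start-end" request.
    """

    def __init__(self, fetch, size, initial_buffer_size=64 * 1024):
        super().__init__()
        self._fetch = fetch
        self._size = size
        self._pos = 0
        # First request: prefetch the tail, where the central directory lives.
        self._buf_start = max(0, size - initial_buffer_size)
        self._buf = fetch(self._buf_start, size - 1)
        self.requests = 1

    def readable(self):
        return True

    def seekable(self):
        return True

    def tell(self):
        return self._pos

    def seek(self, offset, whence=io.SEEK_SET):
        if whence == io.SEEK_SET:
            base = 0
        elif whence == io.SEEK_CUR:
            base = self._pos
        elif whence == io.SEEK_END:
            base = self._size
        else:
            raise ValueError(f"invalid whence: {whence}")
        if base + offset < 0:
            raise OSError("negative seek position")  # zipfile expects OSError here
        self._pos = base + offset
        return self._pos

    def read(self, n=-1):
        if n is None or n < 0:
            n = self._size - self._pos
        end = min(self._pos + n, self._size)
        if end <= self._pos:
            return b""
        if self._pos >= self._buf_start:
            # Served from the prefetched tail: no extra request.
            data = self._buf[self._pos - self._buf_start:end - self._buf_start]
        else:
            data = self._fetch(self._pos, end - 1)  # one "range request"
            self.requests += 1
        self._pos += len(data)
        return data


# Demonstration: an in-memory archive plays the role of the remote server.
payload = io.BytesIO()
with zipfile.ZipFile(payload, "w") as zf:
    zf.writestr("big.bin", bytes(range(256)) * 400)  # member far from the tail
    zf.writestr("small.txt", b"hello")  # member inside the tail
archive = payload.getvalue()


def fetch(start, end):
    return archive[start:end + 1]


remote = RangeFile(fetch, len(archive), initial_buffer_size=1024)
with zipfile.ZipFile(remote) as zf:
    names = zf.namelist()
    small = zf.read("small.txt")
    after_small = remote.requests  # still only the initial tail request
    big = zf.read("big.bin")  # needs further range requests
```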

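A zip archive consists of each member (a local header followed by its compressed data), followed by the central directory at the end of the file. This is why the first request fetches the tail of the archive: it contains the index of all members, after which each member can be downloaded with its own range request.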
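Locating the central directory inside that tail comes down to finding the 22-byte end-of-central-directory record and reading the directory's size and offset from it. The sketch below is simplified and hypothetical (`locate_central_directory` is not a remotezip function): it ignores ZIP64 records and assumes the signature bytes do not also appear inside the archive comment.

```python
import io
import struct
import zipfile

EOCD_SIGNATURE = b"PK\x05\x06"
EOCD_FORMAT = "<4s4H2LH"  # 22 bytes: signature, 4 x uint16, 2 x uint32, comment length


def locate_central_directory(tail: bytes, archive_size: int):
    """Return (cd_offset, cd_size, total_entries, needs_more) from the archive tail."""
    pos = tail.rfind(EOCD_SIGNATURE)
    if pos < 0 or len(tail) - pos < struct.calcsize(EOCD_FORMAT):
        raise ValueError("end of central directory record not in tail; fetch more bytes")
    (_sig, _disk, _cd_disk, _disk_entries, total_entries,
     cd_size, cd_offset, _comment_len) = struct.unpack_from(EOCD_FORMAT, tail, pos)
    # If the directory starts before the fetched tail, another request is needed.
    tail_start = archive_size - len(tail)
    needs_more = cd_offset < tail_start
    return cd_offset, cd_size, total_entries, needs_more


# Demonstration with a small in-memory archive (members f0.txt, f1.txt, f2.txt).
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    for i in range(3):
        zf.writestr(f"f{i}.txt", b"data")
archive = buf.getvalue()

full = locate_central_directory(archive, len(archive))  # tail covers everything
short = locate_central_directory(archive[-64:], len(archive))  # tail too small
```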

How many requests will this module perform to download a member?

  • If the full archive content is smaller than initial_buffer_size, only one request will be needed.
  • Normally two requests are needed, one to download the central directory and one to download the archive member.
  • If the central directory is bigger than initial_buffer_size, a third request will be required.
  • If negative seek operations are used in ZipExtFile, each of them will result in a new request.
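The rules above can be encoded as a rough model for estimating request counts. This is a hypothetical helper, not remotezip code; it simply restates the listed cases, adds the extra HEAD request mentioned for support_suffix_range=False, and assumes one request per downloaded member.

```python
def estimate_requests(archive_size, central_dir_size, initial_buffer_size=64 * 1024,
                      members=1, negative_seeks=0, support_suffix_range=True):
    """Rough request count following the rules listed above (a model, not remotezip code)."""
    # Without suffix-range support, a HEAD request is needed first to learn the size.
    requests = 0 if support_suffix_range else 1
    if archive_size <= initial_buffer_size:
        # The whole archive fits in the first (tail) request.
        return requests + 1
    requests += 1  # tail of the archive, containing the central directory
    if central_dir_size > initial_buffer_size:
        requests += 1  # the central directory did not fit in the initial buffer
    requests += members  # one request per member downloaded
    requests += negative_seeks  # each backward seek restarts the member stream
    return requests
```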

Alternative modules

A similar module is available for Python: pyremotezip.

Download files

Source Distribution

remotezip-0.12.3.tar.gz (7.8 kB)

Built Distribution

remotezip-0.12.3-py3-none-any.whl (8.1 kB)

File details

Details for the file remotezip-0.12.3.tar.gz.

File metadata

  • Download URL: remotezip-0.12.3.tar.gz
  • Size: 7.8 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.2 CPython/3.10.0

File hashes

Hashes for remotezip-0.12.3.tar.gz
Algorithm    Hash digest
SHA256       bf1ebe2be9f07a6e1c14d0e52ecffccd7a3e808dff4f9ba523c5e84d867a3fe3
MD5          dbea84e79da29adb59a387ec7d2e75ba
BLAKE2b-256  fa8d908ad46bff752568a409ee6ac797c3c6817501db06f142989e3208414569

File details

Details for the file remotezip-0.12.3-py3-none-any.whl.

File metadata

  • Download URL: remotezip-0.12.3-py3-none-any.whl
  • Size: 8.1 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.2 CPython/3.10.0

File hashes

Hashes for remotezip-0.12.3-py3-none-any.whl
Algorithm    Hash digest
SHA256       f70a4026879439ecb0a2cf848a7c176ae5ee142bbe51ec69e3344e150b2a52de
MD5          7eff4faf02b40cf1b53e699b0cae02cf
BLAKE2b-256  7b1822316545b712dbed0119c7d3b8683a566c7da26353e344a4188b99f12692