Download and extract (single) files from ZIP archives in Zenodo records.
ZenodoZipDownloader
A Python package to download and extract selected files from ZIP archives in Zenodo records, with pattern matching, CRC32 integrity checks, and robust retry logic.
Features
- Download files from Zenodo records using DOI or record URL
- Filter ZIP files and inner files using glob patterns
- Download only a subset of files if desired
- CRC32 integrity check for downloaded files
- Retry logic for network and file operations
- Command-line interface (CLI)
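The CRC32 check listed above can be illustrated with the standard library alone. The following is a minimal sketch, not the package's actual implementation: the helper name `stream_crc32` is hypothetical, and it assumes the check compares the checksum of the extracted bytes against the `CRC` field that `zipfile` reads from the archive.

```python
import zlib
from typing import BinaryIO


def stream_crc32(handle: BinaryIO, chunk_size: int = 65536) -> int:
    """Compute the CRC32 of a binary stream without loading it all into memory."""
    crc = 0
    while chunk := handle.read(chunk_size):
        # zlib.crc32 accepts the running value, so chunks can be chained.
        crc = zlib.crc32(chunk, crc)
    return crc & 0xFFFFFFFF
```

A caller could then compare `stream_crc32(open(path, "rb"))` with `zipfile.ZipFile(...).getinfo(name).CRC` and treat a mismatch as a corrupted download.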
Installation
pip install zenodozipdownloader
Or from source:
git clone https://github.com/thielec/ZenodoZipDownloader.git
cd ZenodoZipDownloader
pip install .
Requirements
- Python 3.10 or newer
- Network access to https://zenodo.org/
- Optional: tqdm for download progress (see below)
Optional progress bar
The package can show download progress if tqdm is available. Install it alongside the downloader with:
pip install zenodozipdownloader[tqdm]
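For readers curious how an optional dependency like this is commonly handled, here is a rough sketch of the usual import-fallback pattern. It is an illustration only: the function `with_progress` is a hypothetical name and does not describe how the package wires tqdm in internally.

```python
from typing import Iterable, Iterator, Optional

try:
    from tqdm import tqdm  # optional dependency
except ImportError:
    tqdm = None  # fall back to silent downloads


def with_progress(chunks: Iterable[bytes], total_bytes: Optional[int] = None) -> Iterator[bytes]:
    """Yield chunks unchanged, updating a progress bar only when tqdm is installed."""
    if tqdm is None:
        yield from chunks
        return
    with tqdm(total=total_bytes, unit="B", unit_scale=True) as bar:
        for chunk in chunks:
            bar.update(len(chunk))
            yield chunk
```

The data passing through is identical either way, so code that consumes the chunks does not need to know whether tqdm is present.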
Usage
As a Python module
# [optional] set logging level to INFO for more verbose output
import logging
logging.basicConfig(level=logging.INFO)
from zenodozipdownloader import ZenodoZipDownloader
doi = "10.5281/zenodo.5423457"
downloader = ZenodoZipDownloader(doi, download_dir="zenodo_downloads")
downloaded_files = downloader.download(zip_pattern="*.zip", inner_pattern="*tubulin*.mat")
for path in downloaded_files:
    print(f"Saved: {path}")
By default:
- Extracted files stay inside the chosen download_dir; unsafe ZIP paths are skipped.
- All files are downloaded unless you provide inner_pattern.
- Downloads retry up to three times and validate CRC32 checksums.
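Two of these defaults, skipping unsafe ZIP paths and retrying up to three times, can be sketched with the standard library. This is a minimal illustration under assumptions: `safe_target` and `retry` are hypothetical helpers, the one-second delay and the choice to retry only on `OSError` are guesses, and the package may implement both behaviours differently.

```python
import time
from pathlib import Path
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def safe_target(download_dir: str, member_name: str) -> Optional[Path]:
    """Return the extraction path for member_name, or None if it would escape download_dir."""
    root = Path(download_dir).resolve()
    target = (root / member_name).resolve()
    # Absolute names and ".." components resolve outside root and are rejected.
    if target == root or not target.is_relative_to(root):
        return None
    return target


def retry(action: Callable[[], T], attempts: int = 3, delay: float = 1.0) -> T:
    """Call action until it succeeds, re-raising the error after the final attempt."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            return action()
        except OSError:
            if attempt == attempts:
                raise
            time.sleep(delay)
    raise AssertionError("unreachable")
```

With these helpers, a member named `../evil.txt` yields `None` and would be skipped, while a flaky operation gets three chances before the error reaches the caller.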
As a CLI tool
zenodozipdownloader 10.5281/zenodo.5423457 --download_dir zenodo_downloads --zip_pattern "*.zip" --inner_pattern "*tubulin*.mat"
Run zenodozipdownloader --help for the full CLI reference. It shares defaults with the Python API; omit --first_n_zip and --first_n_inner to process every match.
Arguments
- doi: Zenodo DOI or record URL
- --download_dir: Directory to save downloaded files (default: current directory)
- --zip_pattern: Glob pattern for ZIP files in the record (default: *.zip)
- --inner_pattern: Glob pattern for files inside ZIPs (default: all)
- --first_n_zip: Only process the first N matching ZIP files (default: all)
- --first_n_inner: Only download the first N matching files within each ZIP (default: all)
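To make the interplay between the pattern and first-N options concrete, here is a small sketch of glob filtering followed by truncation. It is an assumption-laden illustration: `select_names` is a hypothetical helper, and whether the package matches against the full path inside the archive or only the file name is not stated here.

```python
from fnmatch import fnmatchcase
from typing import Iterable, List, Optional


def select_names(names: Iterable[str], pattern: str = "*", first_n: Optional[int] = None) -> List[str]:
    """Return the names matching a glob pattern, in order, limited to the first first_n."""
    # fnmatchcase is case-sensitive on every platform; "*" also matches "/".
    matches = [name for name in names if fnmatchcase(name, pattern)]
    return matches if first_n is None else matches[:first_n]


names = ["a/tubulin_1.mat", "a/actin.mat", "b/tubulin_2.mat", "readme.txt"]
print(select_names(names, "*tubulin*.mat"))
print(select_names(names, "*tubulin*.mat", first_n=1))
```

Leaving `first_n` as `None` keeps every match, which mirrors the documented "default: all" behaviour of --first_n_zip and --first_n_inner.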
Testing
To run the integration tests and check code coverage:
pip install pytest pytest-cov
pytest --cov=zenodozipdownloader tests/
License
MIT License