zenodo_get: a downloader for Zenodo records

This is a Python3 tool that can mass-download files from Zenodo records.

Source code

The code is hosted on GitHub.

Installation

It is recommended to use uv for managing Python environments and installing this package. zenodo-get requires Python 3.10 or newer.

  1. The simplest way is to run it as a tool; no installation is needed:

    uv tool run zenodo_get RECORD_ID_OR_DOI
    
  2. Install uv (if you haven't already):

    # On macOS and Linux
    curl -LsSf https://astral.sh/uv/install.sh | sh
    # On Windows
    powershell -c "irm https://astral.sh/uv/install.ps1 | iex"
    
  3. Create a virtual environment and install zenodo-get:

    • From PyPI:
      uv venv
      uv pip install zenodo-get
      source .venv/bin/activate # Or .venv\Scripts\activate on Windows
      
    • Or from a local source checkout:
      uv venv
      uv pip install .
      source .venv/bin/activate # Or .venv\Scripts\activate on Windows
      

Traditional pip installation is also supported:

pip install zenodo-get # Ensure pip is for Python 3.10+

Afterwards, you can query the command line options:

zenodo_get -h

but the default settings should work for most use cases:

zenodo_get RECORD_ID_OR_DOI

Running with uv run

As stated above, the simplest way is to use uv tool run or uvx. If you use uv without these: once your project is set up with uv (either by installing dependencies via uv pip install . or simply by having pyproject.toml present), you can use uv run to execute the zenodo_get command directly, without activating the virtual environment in your current shell:

# Example: Show help message
uv run zenodo_get --help

# Example: Download a record (replace YOUR_RECORD_ID)
uv run zenodo_get YOUR_RECORD_ID -o output_directory

# Example: Using a script defined in pyproject.toml (zenodo_get is defined there)
# uv run zenodo_get YOUR_RECORD_ID

Pip and pipx also work.

Documentation

The tool itself is simple, and the help message is reasonable:

zenodo_get -h

but if you need more, open a GitHub issue and explain what is missing.

Basic usage:

zenodo_get RECORD_ID_OR_DOI

Filtering by File Type

You can use the -g or --glob option to specify file patterns. To download multiple specific file types, provide a comma-separated list of glob patterns:

zenodo_get RECORD_ID_OR_DOI -g "*.txt,*.pdf,images/*.png"
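To illustrate what a comma-separated list of glob patterns selects, here is a minimal Python sketch. It assumes shell-style matching as provided by the standard fnmatch module; the tool's own matching rules may differ in detail, and select_files is a hypothetical helper, not part of zenodo_get.

```python
from fnmatch import fnmatch


def select_files(filenames, glob_option):
    """Return the filenames matching any pattern in a comma-separated glob list.

    Hypothetical helper for illustration only; not zenodo_get's implementation.
    """
    # Split the option value into individual patterns, ignoring empty entries.
    patterns = [p.strip() for p in glob_option.split(",") if p.strip()]
    return [
        name
        for name in filenames
        if any(fnmatch(name, pattern) for pattern in patterns)
    ]


files = ["readme.txt", "paper.pdf", "images/fig1.png", "data.csv"]
print(select_files(files, "*.txt,*.pdf,images/*.png"))
# prints ['readme.txt', 'paper.pdf', 'images/fig1.png']
```

Note that with fnmatch the * wildcard also matches path separators, so a pattern such as *.png would match images/fig1.png as well.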

Other special parameters:

  • -m : generate md5sums.txt for verification. Beware: if a file named md5sums.txt is present in the dataset, it will overwrite this generated file. Verification example: md5sum -c md5sums.txt
  • -w FILE : instead of downloading the record files, generate FILE (or print to stdout if FILE is -), which contains direct links to the files on the Zenodo site. These links can be downloaded with any download manager, e.g. with wget: wget -i urls.txt
  • -e : continue on error. Files with errors are skipped, and the tool tries to download the rest of the files.
  • -k : keep files. Files with an invalid md5 checksum are kept. The main purpose is debugging.
  • -R N : retry on error N times.
  • -p N : waiting time in seconds before each retry attempt. Default: 0.5 sec.
  • -n : do not continue. By default, only files that are not yet downloaded or whose checksum does not match are downloaded. This flag disables that behaviour: existing files are downloaded again and saved under a new name (e.g. file(1).ext).
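
Where md5sum is not available (for example on Windows), the check performed by md5sum -c md5sums.txt can be approximated in Python. This is a minimal sketch that assumes md5sums.txt uses the usual md5sum layout of one "HASH  FILENAME" pair per line; file_md5 and verify_md5sums are hypothetical helpers, not part of zenodo_get.

```python
import hashlib
from pathlib import Path


def file_md5(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_md5sums(listing, base_dir="."):
    """Return the names of files that are missing or whose MD5 does not match.

    Assumes each non-empty line of `listing` is "HASH  FILENAME".
    """
    failed = []
    for line in Path(listing).read_text().splitlines():
        if not line.strip():
            continue
        expected, name = line.split(maxsplit=1)
        name = name.lstrip("*")  # md5sum marks binary mode with a leading '*'
        target = Path(base_dir) / name
        if not target.is_file() or file_md5(target) != expected.lower():
            failed.append(name)
    return failed
```

An empty list returned by verify_md5sums("md5sums.txt") means every listed file is present and matches its checksum.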

Remark for batch processing: the program exits with a non-zero exit code if any error has occurred, for instance a checksum mismatch, download error, or time-out. Only fully successful downloads end with exit code 0.
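
A batch script can rely on this exit code to collect the records that need another attempt. The following is a minimal sketch, assuming zenodo_get is on the PATH; download_records and its command parameter are hypothetical names introduced for illustration.

```python
import subprocess


def download_records(record_ids, command=("zenodo_get",)):
    """Run the downloader once per record and return the IDs whose run failed.

    `command` is the program (plus any fixed options) to invoke; the record
    ID or DOI is appended as the last argument.
    """
    failed = []
    for record_id in record_ids:
        result = subprocess.run([*command, str(record_id)])
        # A non-zero exit code signals that something went wrong for this record.
        if result.returncode != 0:
            failed.append(record_id)
    return failed


# Usage (replace the placeholder with real record IDs or DOIs):
# failed = download_records(["RECORD_ID_OR_DOI"])
# if failed:
#     print("Failed records:", failed)
```

Options documented above, such as -e or -R N, could be passed through the command tuple, e.g. command=("zenodo_get", "-e").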

Citation

You don't really need to cite this software unless you use it for an academic publication. E.g. if you download something from Zenodo with zenodo-get, there is no need to cite anything. If you download a lot from Zenodo, you publish about Zenodo, and my tool is an integral part of the methodology, then you could cite it. You can always ask the tool to print the most up-to-date reference, as both plain text and BibTeX:

zenodo_get --cite
