
URL downloader supporting checkpointing and continuous checksumming.

Project description

best-download


NOTE: If local_file already exists, it is overwritten automatically unless a checkpoint file is present, in which case the download resumes from the checkpoint. When the download completes successfully, the checkpoint file is deleted and True is returned. This avoids leaving rubbish in the file system and avoids full checksum calculations on large existing files. If your scripts are re-runnable, you will need to manage existing files yourself: either maintain your own database or done files, or do a manual checksum.
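One way to follow the manual-checksum advice above is to hash an existing file before calling download_file and skip the download when it already matches. This is a minimal sketch that assumes expected_checksum is a SHA-256 hex digest (the 64-character values used on this page suggest so, but confirm against the library); sha256_of_file and needs_download are hypothetical helpers, not part of best-download.

```python
import hashlib
import os


def sha256_of_file(path, chunk_size=1024 * 1024):
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    hasher = hashlib.sha256()
    with open(path, "rb") as f:
        # Fixed-size reads keep memory use flat even for very large files.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            hasher.update(chunk)
    return hasher.hexdigest()


def needs_download(local_file, expected_checksum):
    """Hypothetical helper: False only if local_file exists and matches the checksum.

    Assumes expected_checksum is a SHA-256 hex digest.
    """
    if expected_checksum is None or not os.path.exists(local_file):
        # Nothing to verify against (or nothing on disk), so download it.
        return True
    return sha256_of_file(local_file) != expected_checksum.lower()
```

Usage: call download_file(url, local_file=local_file, expected_checksum=checksum) only when needs_download(local_file, checksum) returns True. A partially downloaded file will not match the checksum, so download_file still gets the chance to resume it from its checkpoint.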

Recent Updates:

  1. Added a multiple-URL option for failover.
  2. Parameter changes to 'download_file':
     • local_file is now optional; if not provided, it defaults to the URL basepath.
     • Added the local_directory option, which is prepended to local_file. This is mainly useful for downloading into a directory while using the automatic local_file.
  3. Improved SIGINT handling. We now raise a KeyboardInterrupt exception after handling the signal safely internally.
  4. Added a decent set of tests, which can be run with:
     pip install -r requirements-dev.txt
     pytest
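The local_file and local_directory rules described above can be sketched as follows. resolve_local_path is a hypothetical helper, not part of best-download, and dropping any query string when taking the URL basepath is an assumption; the library's own handling may differ.

```python
import os
from urllib.parse import urlparse


def resolve_local_path(url, local_file=None, local_directory=None):
    """Hypothetical sketch of the documented local_file/local_directory rules."""
    if local_file is None:
        # Default to the URL basepath: the last component of the URL's path.
        # urlparse separates out any query string or fragment first (assumption).
        local_file = os.path.basename(urlparse(url).path)
    if local_directory is not None:
        # local_directory is prepended to local_file.
        local_file = os.path.join(local_directory, local_file)
    return local_file
```

For example, resolve_local_path("http://ipv4.download.thinkbroadband.com/10MB.zip", local_directory="testing_download") gives the same path the example below builds with os.path.join(local_directory, local_file).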

Install

pip install best-download

Quickstart

from best_download import download_file

url = "http://ipv4.download.thinkbroadband.com/10MB.zip"
checksum = "d076d819249a9827c8a035bb059498bf49f391a989a1f7e166bc70d028025135"
local_file = "10MB.zip"
try:
    success = download_file(url, local_file=local_file, expected_checksum=checksum)
except KeyboardInterrupt:
    print("Ctrl-C (SIGINT) is passed up")
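The Quickstart relies on download_file handling SIGINT internally and then re-raising it as KeyboardInterrupt. The sketch below shows one general way to get that behavior: record Ctrl-C during a critical section (for example, writing a chunk and updating the checkpoint) and raise KeyboardInterrupt only after that section has finished. This is an assumption about the technique, not best-download's actual implementation, and defer_sigint is a hypothetical name.

```python
import signal
from contextlib import contextmanager


@contextmanager
def defer_sigint():
    """Delay Ctrl-C until the wrapped block finishes, then raise KeyboardInterrupt.

    Must be used from the main thread, since signal.signal only works there.
    """
    received = []

    def handler(signum, frame):
        # Record the interrupt instead of raising in the middle of a write.
        received.append(signum)

    previous = signal.signal(signal.SIGINT, handler)
    try:
        yield
    finally:
        # Restore whatever handler was installed before.
        signal.signal(signal.SIGINT, previous)
    if received:
        # The critical section completed, so it is now safe to surface the interrupt.
        raise KeyboardInterrupt
```

In a download loop, each "write chunk, update checkpoint" step would sit inside a with defer_sigint(): block, so an interrupt cannot leave the checkpoint out of sync with the file, and the caller still sees a plain KeyboardInterrupt as in the Quickstart.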

API

There's only one entry point:

def download_file(urls, expected_checksum=None, local_file=None, local_directory=None, max_retries=3)
Parameter          Description
urls               Either a single URL or a list of URLs to try in turn if failover is required.
expected_checksum  (Optional) Checksum to validate against once the download completes. If not provided, no validation is performed.
local_file         (Optional) Output path for the saved file. If not provided, defaults to the URL basepath.
local_directory    (Optional) If provided, prepended to local_file. Mainly useful for downloading into a directory while using the automatic local_file.
max_retries        (Default: 3) Number of retry attempts per URL (applied to each URL in turn when a list is provided for failover).
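The interaction between urls and max_retries can be illustrated with a small sketch. download_with_failover and its fetch callable are hypothetical stand-ins for the real network code, and treating max_retries as the total number of attempts per URL is an assumption; check the library source for its exact counting and for whether it raises or returns False when every URL fails.

```python
import logging

logger = logging.getLogger(__name__)


def download_with_failover(urls, fetch, max_retries=3):
    """Hypothetical sketch of per-url retries with failover to the next url.

    fetch(url) is a caller-supplied callable returning True on success; it may
    return False or raise an exception on failure.
    """
    if isinstance(urls, str):
        # A single url behaves like a one-element failover list.
        urls = [urls]
    for url in urls:
        for attempt in range(1, max_retries + 1):
            try:
                ok = fetch(url)
            except Exception as exc:  # a real downloader would catch narrower errors
                logger.warning("Attempt %d/%d for %s raised %r", attempt, max_retries, url, exc)
                continue
            if ok:
                return True
            logger.warning("Attempt %d/%d for %s failed", attempt, max_retries, url)
        # Every attempt on this url failed: fail over to the next one.
    return False
```

With two mirrors and max_retries=2, a dead first mirror is tried twice before the second mirror is used.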

Examples

The following example can be found in "examples/basic_example.py". The tests array contains some example URLs, including test cases for a server that does not support range requests (GitHub) and a server that defaults to gzip encoding, which we don't use (we request Accept-Encoding: identity). Resuming is demonstrated at the end.
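The resume behavior and the two special-case servers mentioned above come down to HTTP details: resuming asks the server for the remaining bytes with a "Range: bytes=N-" header, and "Accept-Encoding: identity" asks for the raw file rather than a gzip-encoded body, so byte offsets and checksums refer to the file itself. The sketch below uses only the standard library to build such a request and to prime a SHA-256 hasher with the bytes already on disk, so the checksum can continue over the remaining chunks. prepare_resume is hypothetical; best-download's checkpoint format, hash algorithm, and HTTP client may differ.

```python
import hashlib
import os
import urllib.request


def prepare_resume(url, partial_path, chunk_size=1024 * 1024):
    """Hypothetical sketch: build a ranged request and a hasher primed with existing bytes.

    Returns (request, offset, hasher).
    """
    hasher = hashlib.sha256()
    offset = 0
    if os.path.exists(partial_path):
        with open(partial_path, "rb") as f:
            # Re-hash the bytes already on disk so the running checksum
            # covers the whole file once the remaining bytes are appended.
            for chunk in iter(lambda: f.read(chunk_size), b""):
                hasher.update(chunk)
                offset += len(chunk)
    headers = {"Accept-Encoding": "identity"}  # ask for raw bytes, not gzip
    if offset:
        # Request everything from the first missing byte onwards.
        headers["Range"] = f"bytes={offset}-"
    return urllib.request.Request(url, headers=headers), offset, hasher
```

A server that ignores ranges, such as the GitHub archive example, answers with 200 and the full body instead of 206 Partial Content; a client must then discard the partial file and start again from offset 0 with a fresh hasher.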

import os
from best_download import download_file

import logging
logger = logging.getLogger()
console_handler = logging.StreamHandler()
console_handler.setLevel(logging.INFO)
logger.addHandler(console_handler)
logger.setLevel(logging.INFO)

tests = []
tests.append(("http://ipv4.download.thinkbroadband.com/10MB.zip", "10MB.zip",
    "d076d819249a9827c8a035bb059498bf49f391a989a1f7e166bc70d028025135"))

# Larger file used for cancel test
tests.append(("http://ipv4.download.thinkbroadband.com/100MB.zip", "100MB.zip",
    "cc844cac4b2310321d0fd1f9945520e2c08a95cefd6b828d78cdf306b4990b3a"))

# Github example doesn't support resuming
tests.append(("https://github.com/Nealcly/MuTual/archive/master.zip", "master.zip", None))

# Testing Accept-Encoding: identity (no gzip)
tests.append(("https://raw.githubusercontent.com/openai/gpt-3/master/data/two_digit_addition.jsonl",
             "two_digit_addition.jsonl", "75a54b7a3db3b23369df74fe440c23025f3d3c51f664300bd3d56632b2617b3d"))

def main():
    logger.info("Commence Demo")
    url, local_file, checksum = tests[0]

    # local_file provided
    logger.info(f"\nTesting download of file {url} to {local_file}")
    logger.info("-----------------------------------------------------------------------")
    download_file(url, local_file=local_file, expected_checksum=checksum)
    assert os.path.exists(local_file)
    os.remove(local_file)

    # local_file automatically discovered from url basepath    
    logger.info(f"\nTesting download of file {url} to {local_file} without providing local_file")
    logger.info("-----------------------------------------------------------------------")    
    download_file(url, expected_checksum=checksum)
    assert os.path.exists(local_file)
    os.remove(local_file)

    # local_directory provided
    local_directory = "testing_download"
    local_file_path = os.path.join(local_directory, local_file)
    logger.info(f"\nTesting download of file {url} to {local_file_path}")
    logger.info("-----------------------------------------------------------------------")    
    download_file(url, expected_checksum=checksum, local_file=local_file, local_directory=local_directory)
    assert os.path.exists(local_file_path)
    os.remove(local_file_path)
    os.rmdir(local_directory)

    # local_directory provided + local_file automatically discovered from url basepath
    local_directory = "testing_download"
    local_file_path = os.path.join(local_directory, local_file)
    logger.info(f"\nTesting download of file {url} to {local_file_path} without providing local_file")
    logger.info("-----------------------------------------------------------------------")    
    download_file(url, expected_checksum=checksum, local_directory=local_directory)
    assert os.path.exists(local_file_path)
    os.remove(local_file_path)
    os.rmdir(local_directory)

    # Resume Test    
    logger.info("\nResume Test")
    logger.info("-----------------------------------------------------------------------")
    url, local_file, checksum = tests[1]
    logger.info("Press Ctrl-C about halfway through the download; the example will then attempt to resume")
    try:
        download_file(url, local_file=local_file, expected_checksum=checksum)
    except KeyboardInterrupt:
        pass
    logger.info("Attempting resume if you cancelled in time.")
    download_file(url, local_file=local_file, expected_checksum=checksum)
    assert os.path.exists(local_file)
    os.remove(local_file)

if __name__ == '__main__':
    main()


Download files


Source Distribution

best-download-0.1.2.tar.gz (10.4 kB)

Built Distribution

best_download-0.1.2-py3-none-any.whl (6.6 kB)

File details

Details for the file best-download-0.1.2.tar.gz.

File metadata

  • Download URL: best-download-0.1.2.tar.gz
  • Upload date:
  • Size: 10.4 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.2 CPython/3.12.0

File hashes

Hashes for best-download-0.1.2.tar.gz
Algorithm    Hash digest
SHA256       b68c90bc0c947b30a7d2e1d4d69fb6d9fcb56e636ab33c372ea9969db5f27485
MD5          b33f5f9c7771806af73c8236fca9ad95
BLAKE2b-256  3885a519b68560a544bbfbe51e0a288fec9b3c173ec54a10b4c0852357c86a91


File details

Details for the file best_download-0.1.2-py3-none-any.whl.

File metadata

  • Download URL: best_download-0.1.2-py3-none-any.whl
  • Upload date:
  • Size: 6.6 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.2 CPython/3.12.0

File hashes

Hashes for best_download-0.1.2-py3-none-any.whl
Algorithm    Hash digest
SHA256       25f25c14bb356a8dd3455fa3778c56cf2af3b515f2e2b841ebfdcaf42412d9ff
MD5          f6284093d841f1e9759d670b28b9dd19
BLAKE2b-256  6e822a8ad5723cf08f0cfe4f012887185c58945e15e53257f8729d108f8ec2bf
