Download manager for Python

A Download Manager Python Module

A flexible, cache-aware download manager for Python, supporting multiple backends (requests, pycurl), with integrated caching and metadata management.


Features

  • Multiple Backends: Choose between requests and pycurl for downloads.
  • Cache Integration: Seamless integration with cache-manager for efficient file reuse and metadata tracking.
  • Flexible Destinations: Download to disk, in-memory buffer, or cache.
  • Automatic Metadata: Tracks download status, timestamps, HTTP headers, file hashes, and more.
  • Configurable: Supports configuration via Python dict or config file.
  • Pre-commit, Linting, and CI: Ready for robust development workflows.

Installation

pip install git+https://github.com/saezlab/download-manager.git

If you are developing:

git clone https://github.com/saezlab/download-manager.git
cd download-manager
poetry install

Usage

import download_manager as dm

# Basic download to buffer
manager = dm.DownloadManager(backend='requests')
data = manager.download('https://www.google.com', dest=False)
print(data.read())

# Download to a file
manager = dm.DownloadManager(path='/tmp')
filepath = manager.download('https://www.google.com', dest='/tmp/google.html')
print(f"Downloaded to {filepath}")

# Download with cache integration
manager = dm.DownloadManager(path='/tmp')
filepath = manager.download('https://www.google.com')
print(f"Cached at {filepath}")

Architecture and Internals

The package is built around four core components:

  • DownloadManager: orchestrates cache lookup, backend selection, retries, and metadata updates.
  • Descriptor: normalizes request parameters (URL, query, headers, JSON, multipart, TLS CA path).
  • RequestsDownloader and CurlDownloader: backend-specific implementations of the download workflow.
  • cache_manager: optional persistence layer for file reuse and download metadata.
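To make the role of the Descriptor more concrete, here is a minimal standard-library sketch of what normalizing request parameters could look like. RequestSpec and all of its fields and properties are hypothetical names chosen for illustration; they are not the package's Descriptor class, and the real signature may differ.

```python
from dataclasses import dataclass, field
from urllib.parse import urlencode


@dataclass
class RequestSpec:
    """Hypothetical stand-in for a download descriptor (not the real API)."""

    url: str
    query: dict = field(default_factory=dict)
    headers: dict = field(default_factory=dict)
    post: bool = False

    @property
    def method(self) -> str:
        # POST when requested explicitly, GET otherwise.
        return "POST" if self.post else "GET"

    @property
    def full_url(self) -> str:
        # For GET, parameters are appended as a sorted query string so that
        # equivalent requests produce the same URL; for POST the URL is unchanged.
        if self.post or not self.query:
            return self.url
        return f"{self.url}?{urlencode(sorted(self.query.items()))}"

    @property
    def normalized_headers(self) -> dict:
        # Header names are case-insensitive, so lower-case them for comparison.
        return {name.lower(): value for name, value in self.headers.items()}


spec = RequestSpec("https://example.org/data", query={"b": 2, "a": 1})
print(spec.method, spec.full_url)
```

Normalizing in one place means both backends and the cache lookup can work from the same description of a request.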

Component Diagram

flowchart LR
    U[User code] --> M[DownloadManager]
    M --> D[Descriptor]
    M --> C[(cache_manager Cache)]
    M --> B{backend}
    B --> R[RequestsDownloader]
    B --> P[CurlDownloader]
    D --> R
    D --> P
    R --> OUT[Path or BytesIO]
    P --> OUT
    M --> OUT

Runtime Flow

  1. Build or accept a Descriptor.
  2. Resolve backend from config (requests by default).
  3. Resolve destination policy:
    • dest='/path/file': force download to that path.
    • dest=None or dest=True: use cache path if cache is configured, otherwise memory buffer.
    • dest=False: force memory buffer.
  4. If the cache is enabled, look up the best-matching item by URI and the relevant download parameters.
  5. If no valid cached item exists, perform the download and update the cache metadata (status, timestamps, response headers, checksum, size, HTTP code).
  6. Return either path or io.BytesIO.
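The destination policy in step 3 can be sketched as a small pure function. This is an illustration only: resolve_destination and its cache_path parameter are hypothetical names and do not reflect how the package implements the rule internally.

```python
import io
from pathlib import Path
from typing import Optional, Union


def resolve_destination(
    dest: Union[str, bool, None] = None,
    cache_path: Optional[str] = None,
) -> Union[Path, io.BytesIO]:
    """Hypothetical helper mirroring the three destination rules above."""
    if dest is False:
        # dest=False: always use a memory buffer.
        return io.BytesIO()
    if isinstance(dest, str):
        # dest='/path/file': always write to that path.
        return Path(dest)
    # dest=None or dest=True: prefer the cache path when a cache is configured.
    if cache_path is not None:
        return Path(cache_path)
    return io.BytesIO()


print(resolve_destination("/tmp/google.html"))
print(type(resolve_destination(False, cache_path="/tmp/cache/item")).__name__)
```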

sequenceDiagram
    participant U as User
    participant M as DownloadManager
    participant C as Cache
    participant X as Backend Downloader

    U->>M: download(url, dest, kwargs)
    M->>M: Build Descriptor
    M->>C: best_or_new(...) if cache enabled
    alt cache hit
        M-->>U: return cached path
    else cache miss/uninitialized
        M->>X: instantiate(desc, path_or_none)
        M->>X: download()
        X-->>M: headers, status, bytes/file
        M->>C: update metadata
        M-->>U: return path or BytesIO
    end
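The cache-hit and cache-miss branches in the diagram can be sketched with the standard library alone. Everything here is an assumption made for illustration: the cache is a plain dict, the network call is replaced by an injected fetch function, and cache_key, download_with_cache, and the metadata field names are hypothetical rather than taken from cache_manager.

```python
import hashlib
import json
import time
from typing import Callable, Dict, Optional


def cache_key(url: str, params: Optional[dict] = None) -> str:
    """Derive a stable key from the URL and the parameters that affect the response."""
    payload = json.dumps({"url": url, "params": params or {}}, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def download_with_cache(
    url: str,
    fetch: Callable[[str], bytes],
    cache: Dict[str, dict],
    params: Optional[dict] = None,
) -> bytes:
    """Return cached content on a hit; otherwise fetch and record metadata."""
    key = cache_key(url, params)
    item = cache.get(key)
    if item is not None and item["status"] == "ready":
        return item["content"]  # cache hit: no download needed
    content = fetch(url)  # cache miss: delegate to the backend
    cache[key] = {
        "status": "ready",
        "content": content,
        "downloaded_at": time.time(),
        "size": len(content),
        "sha256": hashlib.sha256(content).hexdigest(),
    }
    return content


calls = []


def fake_fetch(url: str) -> bytes:
    # Stand-in for a real backend; records each call so hits are observable.
    calls.append(url)
    return b"hello"


cache: Dict[str, dict] = {}
first = download_with_cache("https://example.org/a", fake_fetch, cache)
second = download_with_cache("https://example.org/a", fake_fetch, cache)
print(len(calls))
```

Including the request parameters in the key keeps two requests to the same URL with different queries from sharing a cache entry.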

Practical Usage Patterns

  • In-memory processing: use dest=False to get io.BytesIO.
  • Forced file output: pass explicit dest='/tmp/file.ext'.
  • Cache-first retrieval: initialize DownloadManager(path='/tmp/cache') and call download(url) without dest.
  • POST/JSON: pass query={...} with post=True or json=True.
  • Multipart uploads: pass multipart={...} with file paths included in the mapping.
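Because download() returns either a path or an io.BytesIO depending on dest, calling code may want one way to read the content regardless of which it received. The sketch below uses only the standard library; read_bytes is a hypothetical helper, not part of the package, and the sample values stand in for real download results.

```python
import io
import os
import tempfile
from pathlib import Path
from typing import Union


def read_bytes(result: Union[str, os.PathLike, io.BytesIO]) -> bytes:
    """Read content from either a file path or an in-memory buffer."""
    if isinstance(result, io.BytesIO):
        # getvalue() returns the whole buffer regardless of the read position.
        return result.getvalue()
    return Path(result).read_bytes()


# Simulated results: one buffer (as with dest=False) and one file on disk.
buffer_result = io.BytesIO(b"in memory")
with tempfile.TemporaryDirectory() as tmp:
    file_result = Path(tmp) / "file.ext"
    file_result.write_bytes(b"on disk")
    from_file = read_bytes(file_result)
    from_str_path = read_bytes(str(file_result))
from_buffer = read_bytes(buffer_result)
print(from_buffer, from_file)
```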

API Overview

  • DownloadManager: Main interface for downloads and cache management.
  • Descriptor: Describes a download (URL, headers, POST/GET, etc).
  • CurlDownloader: PyCurl-based downloader.
  • RequestsDownloader: Requests-based downloader.

Configuration

You can configure the download manager via keyword arguments or a config file:

dm.DownloadManager(
    path='/my/cache/dir',
    backend='curl',  # or 'requests'
    # ...other options
)

Development

  • Linting: poetry run flake8 download_manager
  • Tests: poetry run pytest
  • Coverage: poetry run pytest --cov
  • Pre-commit: Install with pre-commit install

License

GNU General Public License v3.0


Acknowledgements

Developed by the OmniPath team at Heidelberg University Hospital.

Citation

If you use this software, please cite the repository and the OmniPath team.
