Flexible download manager

Table of Contents

  1. Introduction
  2. Installation
  3. Quick Start
  4. Core Data Types & Enums
  5. Download Manager API (DownloadManager)
  6. Advanced Features
  7. Utility Functions
  8. License

1. Introduction

pydown is a flexible Python library for managing file downloads. It provides a unified, high-level API to handle downloads over multiple protocols, including HTTP(S), FTP, and SFTP, with robust support for advanced features like concurrency, download resuming, and speed limiting.

Features

  • Multi-Protocol Support: Natively handles HTTP, HTTPS, FTP, and SFTP URLs.
  • Concurrent Downloads: Download multiple files simultaneously using an efficient asynchronous worker pool.
  • Pause & Resume: Pause downloads and resume them later, even after the application restarts.
  • Error Handling & Retries: Automatically retries failed downloads with configurable exponential backoff.
  • Speed Limiting: Throttle download bandwidth to a specified maximum rate.
  • Real-time Monitoring: Use observers to get live feedback on download progress, speed, status changes, and errors.
  • Duplicate Handling: Configure strategies (skip, overwrite, rename) for handling duplicate download requests.

2. Installation

Dependencies

pydown depends on the following libraries, which will be installed automatically: httpx, validators, humanize, dataclasses-json, and paramiko.

Installation

Install via PyPI:

pip install py-downx

3. Quick Start

This example demonstrates how to download a file using the DownloadManager.

from pydown import DownloadManager, create_download_request

# 1. Initialize the Download Manager
# This will manage a queue of downloads with up to 3 concurrent workers.
manager = DownloadManager(max_concurrent_downloads=3)

# 2. Create a download request for a test file
# The file will be saved as '100MB.zip' in the current directory.
request = create_download_request(
    name="Large Test File",
    url="http://speedtest.tele2.net/100MB.zip",
    file_path="100MB.zip"
)

# 3. Add the request to the manager's queue
manager.add_download(request)
print("Download added to the queue.")

# 4. Start the download workers
manager.start()
print("Download manager started.")

# 5. Wait for all downloads to complete
manager.wait_for_completion()
print("All downloads have finished.")

# 6. Stop the manager and clean up resources
manager.stop()
print("Manager stopped.")

3.1. Command Line Interface

pydown also ships a command-line tool that exposes the library's functionality from the terminal.

Installation with CLI Support

After installing the py-downx package, the pydown command is available in your terminal:

pip install py-downx
pydown --help

Basic Usage

# Download a single file
pydown https://example.com/file.zip

# Download with custom output path
pydown https://example.com/file.zip -o /path/to/save/file.zip

# Download multiple files concurrently
pydown url1 url2 url3 -c 5 -d ./downloads/

# Batch download from a file containing URLs
pydown --batch urls.txt -d ./downloads/

Advanced CLI Features

Concurrent Downloads and Performance

# Set maximum concurrent downloads, segments per download, and a speed limit (bytes per second)
pydown https://example.com/largefile.zip -c 3 -s 8 --speed-limit 1000000
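
Because --speed-limit is given in bytes per second, the value 1000000 above caps the transfer at roughly 1 MB/s. How pydown enforces the cap internally is not described here. A common approach, shown as a minimal sketch, is to sleep whenever the transfer is ahead of schedule. The function name throttle_delay and the treatment of non-positive limits as "unlimited" are assumptions made for illustration only.

```python
def throttle_delay(bytes_sent: int, elapsed: float, limit_bps: int) -> float:
    """Return how many seconds to sleep so the average rate stays <= limit_bps.

    Hypothetical helper, not part of pydown's API.
    """
    if limit_bps <= 0:
        # Assumption: a non-positive limit means "no throttling".
        return 0.0
    # Time the transfer should have taken at the capped rate.
    expected = bytes_sent / limit_bps
    # Sleep only if we are ahead of that schedule.
    return max(0.0, expected - elapsed)
```

A download loop would call this after each chunk and pass the result to time.sleep().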

Authentication and Headers

# Custom HTTP headers (as a JSON string; --cookies takes the same format)
pydown https://api.example.com/data.json --headers '{"Authorization": "Bearer token"}'

# FTP/SFTP with credentials
pydown ftp://user:pass@server/file.txt
pydown sftp://user:pass@server/file.txt
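
The --headers option takes a JSON string, which is easy to get wrong when quoting by hand. When the command line is generated from a script, one way to avoid quoting mistakes is to serialize the dictionary with json.dumps and quote it with shlex.quote. This is a sketch for POSIX shells; build_headers_arg is a hypothetical helper, not something pydown provides.

```python
import json
import shlex


def build_headers_arg(headers: dict) -> str:
    """Serialize headers to JSON and quote the result for a POSIX shell."""
    return "--headers " + shlex.quote(json.dumps(headers))


print(build_headers_arg({"Authorization": "Bearer token"}))
# prints: --headers '{"Authorization": "Bearer token"}'
```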

Session Management

# Save download session for later resuming
pydown https://example.com/file.zip --save-session mysession.json

# Resume previous session
pydown --resume mysession.json

Batch Operations

Create a text file with URLs (one per line):

# my_downloads.txt
https://example.com/file1.zip
https://example.com/file2.pdf
https://cdn.example.com/data.json

Then download all files:

pydown --batch my_downloads.txt -d ./downloads/ -c 5
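
If you want to preprocess a URL list yourself before handing it to the library, a reader along these lines may be enough. It is only a sketch: read_url_list is a hypothetical helper, and skipping blank lines and lines starting with # is an assumption, since the exact parsing rules of --batch are not documented here.

```python
from pathlib import Path
from typing import List


def read_url_list(path: str) -> List[str]:
    """Read one URL per line, ignoring blank lines and '#' comment lines."""
    urls = []
    for line in Path(path).read_text(encoding="utf-8").splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # assumption: these lines carry no URL
        urls.append(line)
    return urls
```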

Output Control

# Quiet mode (no progress bars)
pydown https://example.com/file.zip -q

# Verbose output with detailed logging
pydown https://example.com/file.zip -v

# Log to file
pydown https://example.com/file.zip --log-file downloads.log

Error Handling and Retries

# Configure retry behavior and timeouts
pydown https://example.com/file.zip --retries 5 --timeout 60

# Handle duplicate files (skip, overwrite, or rename)
pydown https://example.com/file.zip --duplicate rename
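
The naming scheme pydown uses for --duplicate rename is not specified here. To illustrate what a rename strategy typically does, the sketch below appends " (1)", " (2)", and so on before the file extension until it finds an unused name. Both the helper next_free_path and the suffix format are assumptions.

```python
from pathlib import Path


def next_free_path(path: str) -> Path:
    """Return `path` if it is unused, else the first 'name (N).ext' that is free.

    Hypothetical helper; the real rename format may differ.
    """
    original = Path(path)
    candidate = original
    counter = 1
    while candidate.exists():
        candidate = original.with_name(f"{original.stem} ({counter}){original.suffix}")
        counter += 1
    return candidate
```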

CLI Options Reference

Option            Description                                   Default
urls              URLs to download (positional arguments)       -
-o, --output      Output file path (for single downloads)       Auto-generated
-d, --directory   Output directory                              Current directory
-c, --concurrent  Maximum concurrent downloads                  3
-s, --segments    Maximum segments per download                 8
--speed-limit     Speed limit in bytes per second               Unlimited
--timeout         Connection timeout in seconds                 30
--retries         Maximum retry attempts                        3
--duplicate       Duplicate handling (skip, overwrite, rename)  skip
--headers         HTTP headers as JSON string                   None
--cookies         HTTP cookies as JSON string                   None
--proxy           Proxy URL                                     None
--batch           File containing URLs to download              None
--save-session    Save session to JSON file                     None
--resume          Resume from saved session file                None
-q, --quiet       Suppress progress output                      False
-v, --verbose     Verbose output                                False
--no-progress     Disable progress bars                         False
--log-file        Log to file                                   None

Examples

  1. Simple Download:

    pydown https://example.com/file.zip
    
  2. Multiple Files with Custom Settings:

    pydown https://site1.com/file1.zip https://site2.com/file2.pdf \
           -d ~/Downloads/ -c 4 -s 6 --verbose
    
  3. Authenticated Download:

    pydown https://api.example.com/data.json \
           --headers '{"Authorization": "Bearer your-token"}' \
           --cookies '{"session": "abc123"}'
    
  4. Batch Download with Session Save:

    pydown --batch large_downloads.txt \
           --save-session backup.json \
           -d ./downloads/ -c 5 --verbose
    
  5. Resume Interrupted Downloads:

    pydown --resume backup.json
    

4. Core Data Types & Enums

DownloadRequest

A dataclass that holds all configuration and state for a single download. It is the central object you create and pass to the DownloadManager.

  • name: str: A human-readable name for the download.
  • url: str: The URL of the file to download.
  • file_path: str: The local path where the file will be saved.
  • status: DownloadStatus: The current status of the download (e.g., PENDING, COMPLETED).
  • priority: int: A numerical priority (higher numbers are processed first).
  • headers: Dict[str, str]: Custom HTTP headers.
  • max_retries: int: Maximum number of times to retry on failure.
  • speed_limit: Optional[int]: Speed limit in bytes per second.
  • checksum: Optional[str]: The expected checksum string for validation.
  • checksum_type: str: The algorithm to use (md5, sha1, sha256).
  • ftp_username: Optional[str]: Username for FTP/SFTP authentication.
  • ftp_password: Optional[str]: Password for FTP/SFTP authentication.
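
To produce the value for checksum, or to double-check a finished file yourself, the standard library's hashlib covers all three algorithms named above. This is an independent sketch, not pydown's own validation code; file_checksum and checksum_matches are hypothetical helper names.

```python
import hashlib


def file_checksum(path: str, checksum_type: str = "sha256", chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large downloads need not fit in memory."""
    digest = hashlib.new(checksum_type)  # accepts "md5", "sha1", "sha256"
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def checksum_matches(path: str, expected: str, checksum_type: str = "sha256") -> bool:
    """Compare case-insensitively, since hex digests are often published in uppercase."""
    return file_checksum(path, checksum_type) == expected.strip().lower()
```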

ProgressInfo

A dataclass passed to observers during progress updates.

  • total_size: int: Total size of the file in bytes.
  • downloaded_size: int: Number of bytes downloaded so far.
  • speed: float: Current download speed in bytes per second.
  • eta: float: Estimated time remaining in seconds.
  • progress_percent: float: Download progress as a percentage (0-100).
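
The fields above are related in the usual way: the percentage is the downloaded share of the total, and the ETA is the remaining bytes divided by the current speed. The sketch below makes that arithmetic explicit with a stand-in dataclass. ProgressSnapshot is hypothetical and is not pydown's ProgressInfo, and the handling of unknown sizes and zero speed is an assumption.

```python
from dataclasses import dataclass


@dataclass
class ProgressSnapshot:
    """Hypothetical stand-in used only to show how the fields relate."""
    total_size: int
    downloaded_size: int
    speed: float  # bytes per second

    @property
    def progress_percent(self) -> float:
        if self.total_size <= 0:
            return 0.0  # assumption: unknown total size reports 0%
        return 100.0 * self.downloaded_size / self.total_size

    @property
    def eta(self) -> float:
        if self.speed <= 0:
            return float("inf")  # assumption: a stalled download has no finite ETA
        remaining = max(0, self.total_size - self.downloaded_size)
        return remaining / self.speed


snapshot = ProgressSnapshot(total_size=1000, downloaded_size=250, speed=50.0)
print(snapshot.progress_percent, snapshot.eta)  # prints: 25.0 15.0
```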

DownloadStatus

An Enum representing the state of a DownloadRequest.

  • PENDING: The download is waiting to be processed.
  • QUEUED: The download is in the queue, ready for a worker.
  • IN_PROGRESS: The download is actively being processed by a worker.
  • PAUSED: The download has been manually paused.
  • COMPLETED: The download finished successfully.
  • FAILED: The download failed after all retries.
  • CANCELLED: The download was cancelled by the user.
  • DUPLICATE: The download was skipped because it was identified as a duplicate.

5. Download Manager API (DownloadManager)

The DownloadManager is the main entry point for orchestrating all download operations.

Initialization & Lifecycle

  • __init__(self, max_concurrent_downloads: int = 3, duplicate_strategy: str = "skip", log_file: Optional[str] = None, quiet: bool = False)
    • Initializes the manager.
    • max_concurrent_downloads: The number of downloads to run in parallel.
    • duplicate_strategy: How to handle duplicates: "skip", "overwrite", "rename".
    • log_file: Path to a file for logging output.
    • quiet: If True, suppresses console logging.

Adding & Managing Downloads

  • add_download(self, request: DownloadRequest) -> str

    • Adds a single DownloadRequest to the queue. Returns the request URL as its unique ID.
  • add_downloads_from_json(self, json_file: str) -> List[str]

    • Loads and adds multiple download requests from a JSON file.
  • pause_download(self, url: str) -> bool

    • Pauses an active or pending download identified by its URL.
  • resume_download(self, url: str) -> bool

    • Resumes a paused download.
  • cancel_download(self, url: str) -> bool

    • Cancels a download. The partial file is not deleted.
  • export_downloads(self, json_file: str)

    • Saves the state of all current downloads to a JSON file.

Controlling the Manager

  • start(self)

    • Starts the worker threads to process the download queue.
  • stop(self)

    • Stops the workers and cleans up resources. This should be called to ensure a graceful exit.
  • wait_for_completion(self)

    • Blocks until the download queue is empty and all active downloads are finished.
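
Since stop() should always be called, it can help to wrap the lifecycle in try/finally so the manager is stopped even if waiting is interrupted (for example by Ctrl+C). The sketch below uses only the methods documented above; run_downloads itself is a hypothetical convenience function, and it accepts any object that exposes those methods.

```python
from typing import Iterable, List


def run_downloads(manager, requests: Iterable) -> List[str]:
    """Queue requests, run them to completion, and always stop the manager."""
    # add_download returns the request URL, which serves as the download ID.
    ids = [manager.add_download(request) for request in requests]
    manager.start()
    try:
        manager.wait_for_completion()
    finally:
        # Runs on success, on exceptions, and on KeyboardInterrupt.
        manager.stop()
    return ids
```

With pydown installed, this would be called as run_downloads(DownloadManager(), [request]).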

Monitoring & Observers

  • add_observer(self, observer: DownloadObserver)

    • Registers a custom observer to receive real-time events.
  • remove_observer(self, observer: DownloadObserver)

    • Unregisters an observer.
  • get_download_status(self, url: str) -> Optional[DownloadRequest]

    • Retrieves the current state of a specific download.
  • get_all_downloads(self) -> Dict[str, DownloadRequest]

    • Returns a dictionary of all downloads managed by the instance.
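
For polling-style monitoring, get_all_downloads() can be reduced to a count per status. The sketch below relies only on what is documented above: the method returns a dictionary of DownloadRequest objects, and each request has a status that is an Enum member. summarize_statuses is a hypothetical helper.

```python
from collections import Counter


def summarize_statuses(manager) -> Counter:
    """Count downloads per status name, e.g. Counter({'COMPLETED': 2, 'FAILED': 1})."""
    return Counter(
        request.status.name  # Enum members expose their name as a string
        for request in manager.get_all_downloads().values()
    )
```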

6. Advanced Features

Monitoring with Observers

Create a custom class that inherits from DownloadObserver to react to download events.

from pydown import DownloadManager, DownloadObserver, DownloadRequest, ProgressInfo, DownloadStatus

class MyCustomObserver(DownloadObserver):
    def on_progress(self, request: DownloadRequest, progress: ProgressInfo):
        print(f"[{request.name}] {progress.progress_percent:.1f}% at {progress.speed / 1024:.1f} KB/s")

    def on_status_change(self, request: DownloadRequest, old_status: DownloadStatus, new_status: DownloadStatus):
        print(f"[{request.name}] Status changed: {old_status.name} -> {new_status.name}")

    def on_error(self, request: DownloadRequest, error: Exception):
        print(f"[{request.name}] An error occurred: {error}")

# Add it to the manager
manager = DownloadManager()
my_observer = MyCustomObserver()
manager.add_observer(my_observer)

Protocol-Specific Configuration

You can specify protocol-specific details, like FTP credentials, directly on the DownloadRequest object.

from pydown import create_download_request

ftp_request = create_download_request(
    name="FTP File",
    url="ftp://speedtest.tele2.net/1MB.zip",
    file_path="1MB.zip",
    ftp_username="anonymous",
    ftp_password="user@example.com"
)

# Reuses the `manager` instance created in the observer example above.
manager.add_download(ftp_request)

Resume, Retry, and Duplicate Handling

  • Resume: Resuming is enabled by default. pydown creates a .partial file and will automatically pick up where it left off if the download is interrupted.
  • Retry: The manager automatically retries downloads on connection errors or server-side issues (HTTP 5xx). Configure this with max_retries on the DownloadRequest.
  • Duplicates: The duplicate_strategy on the DownloadManager controls behavior when a download is added that is identical to a previously completed one (based on URL, size, and checksum).
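
To get a feel for how exponential backoff spaces out retries, the sketch below computes the wait before each attempt. The base delay, growth factor, and cap are illustrative assumptions. pydown's actual defaults and parameter names are not documented here, and backoff_delays is a hypothetical helper.

```python
from typing import List


def backoff_delays(max_retries: int, base: float = 1.0,
                   factor: float = 2.0, cap: float = 60.0) -> List[float]:
    """Return the delay in seconds before each retry: base * factor**attempt, capped."""
    return [min(cap, base * factor ** attempt) for attempt in range(max_retries)]


print(backoff_delays(5))  # prints: [1.0, 2.0, 4.0, 8.0, 16.0]
```

Doubling the delay gives a struggling server progressively more time to recover, while the cap keeps the longest wait bounded.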

7. Utility Functions

pydown provides helper functions to simplify common tasks.

  • create_download_request(name: str, url: str, **kwargs) -> DownloadRequest

    • A convenient factory to create a DownloadRequest object.
  • cookies_from_requests_session(session: 'requests.Session') -> Dict[str, str]

    • Extracts cookies from a requests.Session object to use in a DownloadRequest.
  • headers_from_requests_session(session: 'requests.Session') -> Dict[str, str]

    • Extracts headers from a requests.Session object.

Example:

import requests
from pydown import create_download_request, cookies_from_requests_session

# Log in to a site using the requests library
session = requests.Session()
session.post("https://example.com/login", data={"user": "...", "pass": "..."})

# Create a download request using the session's cookies
request = create_download_request(
    name="Authenticated Download",
    url="https://example.com/file.zip",
    cookies=cookies_from_requests_session(session)
)

8. License

MIT
