Parallel download manager for HTTP/HTTPS/FTP/SFTP protocols.

Pymatris 📂

Parallel file downloader for HTTP/HTTPS, FTP and SFTP protocols, built using Python.

Installation

pip install pymatris

Usage

  • Initialize Downloader
from pymatris import Downloader

dl = Downloader()
  • Enqueue file to download
dl.enqueue_file("https://storage.data.gov.my/pricecatcher/pricecatcher_2022-01.parquet", path="./")
  • Start downloading files
dl.download()
  • View results
result = dl.download()

Under the hood, pymatris.Downloader uses a global queue to manage download tasks. pymatris.Downloader.enqueue_file() adds a URL to the download queue, and pymatris.Downloader.download() downloads the queued files in parallel. Pymatris uses asyncio for concurrency and aiofiles for asynchronous file I/O, which enables faster downloads.
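
The queue-then-download pattern described above can be illustrated with plain asyncio. This is a minimal sketch, not Pymatris's actual implementation: fake_download, download_all and the example.com URLs are hypothetical, and the transfer is simulated with asyncio.sleep so the snippet runs without a network.

```python
import asyncio


async def fake_download(url: str, limit: asyncio.Semaphore) -> str:
    """Hypothetical stand-in for a real transfer; sleeps instead of using the network."""
    async with limit:  # at most `max_parallel` coroutines are inside this block at once
        await asyncio.sleep(0.01)
        return url.rsplit("/", 1)[-1]  # pretend the file name is the result


async def download_all(urls: list[str], max_parallel: int = 5) -> list[str]:
    """Run every queued download concurrently, bounded by a semaphore."""
    limit = asyncio.Semaphore(max_parallel)
    tasks = [fake_download(url, limit) for url in urls]
    # gather() returns results in the same order as the inputs
    return await asyncio.gather(*tasks)


queue = [f"https://example.com/file_{i}.bin" for i in range(8)]
names = asyncio.run(download_all(queue))
print(names)
```

The semaphore is what keeps the number of simultaneous transfers bounded while the event loop switches between them whenever one is waiting on I/O.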

Results and Error Handling

pymatris.Downloader.download() returns a Results object, which is a list of the filenames that have been downloaded. The Results object has two attributes, success and errors.

success is a list of named tuples; each contains .path (the file path) and .url (the URL).

errors is a list of named tuples; each contains .filepath_partial (the intended file path), .url (the URL), and .exception (an Exception or aiohttp.ClientResponse that occurred during the download).
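
As a rough illustration of how these attributes might be consumed, the sketch below formats one line per entry. Success, Error, FakeResults and summarize are hypothetical names introduced here; the stand-in tuples only mirror the attribute names listed above so the snippet runs offline. In real use you would pass the object returned by download() instead.

```python
from collections import namedtuple

# Hypothetical stand-ins mirroring the attributes described above;
# they are not Pymatris classes.
Success = namedtuple("Success", ["path", "url"])
Error = namedtuple("Error", ["filepath_partial", "url", "exception"])
FakeResults = namedtuple("FakeResults", ["success", "errors"])


def summarize(results):
    """Return one human-readable line per finished or failed download."""
    lines = [f"OK    {item.url} -> {item.path}" for item in results.success]
    lines += [f"FAIL  {err.url}: {err.exception!r}" for err in results.errors]
    return lines


results = FakeResults(
    success=[Success("./a.parquet", "https://example.com/a.parquet")],
    errors=[Error("./b.txt", "ftp://example.com/b.txt", ConnectionRefusedError("refused"))],
)
report = summarize(results)
for line in report:
    print(line)
```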

Example Usage

from main.py

from pymatris import Downloader

urls = [
    "https://storage.data.gov.my/pricecatcher/pricecatcher_2022-01.csv",
    "ftp://bob:bob@192.168.1.6:20/tesfile.txt",
    "https://storage.data.gov.my/pricecatcher/pricecatcher_2022-01.parquet",
]

dm = Downloader()

# Queue every URL, saving into the current directory
for url in urls:
    dm.enqueue_file(url, path="./")

# Download everything in the queue and collect the results
results = dm.download()

print(results)
>> Success:
>> pricecatcher_2022-01.csv https://storage.data.gov.my/pricecatcher/pricecatcher_2022-01.csv
>> pricecatcher_2022-01.parquet https://storage.data.gov.my/pricecatcher/pricecatcher_2022-01.parquet

>> Errors:
>> (ftp://bob:bob@192.168.1.6:20/tesfile.txt,
>> ConnectionRefusedError(61, "Connect call failed ('192.168.1.6', 20)"))

Advanced Usage

Visit main.py for advanced usage.

CLI

Pymatris also provides a command-line interface. In your terminal, run one of the following commands to download files in parallel.

# Insert single url as argument
pymatris https://storage.data.gov.my/pricecatcher/pricecatcher_2022-01.parquet 

# Or multiple urls 
pymatris https://storage.data.gov.my/pricecatcher/pricecatcher_2022-01.parquet https://storage.data.gov.my/pricecatcher/pricecatcher_2022-02.parquet https://storage.data.gov.my/pricecatcher/pricecatcher_2022-03.parquet
  $ pymatris --help
  usage: pymatris [-h] [--max-parallel MAX_PARALLEL] [--max-splits MAX_SPLITS]
                [--overwrite] [--quiet] [--dir DIR] [--show-errors SHOW_ERRORS]
                [--timeouts TIMEOUTS] [--max-tries MAX_TRIES]
                URLS [URLS ...]

Arguments

To set the directory where files are saved, use the --dir option. By default, files are saved in the current directory.

pymatris --dir "./" <urls>

To overwrite existing files, use the --overwrite option. By default, existing files are not overwritten.

pymatris --overwrite <urls>

Assuming you have a "pricecatcher_2022-01.parquet" file in your current directory, running the above command overwrites it. During a download, Pymatris writes to a temporary file. If the download is interrupted, your existing files remain intact and the temporary files are deleted.
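
The general pattern behind this guarantee (write to a temporary file, then swap it into place only on success) can be sketched as follows. This is an illustration of the technique, not Pymatris's own code; save_atomically and the ".part" suffix are hypothetical.

```python
import os
import tempfile


def save_atomically(data: bytes, dest: str) -> None:
    """Write data to a temp file beside dest, then swap it into place."""
    directory = os.path.dirname(os.path.abspath(dest))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".part")
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)
        # os.replace overwrites dest in one step when both paths share a filesystem
        os.replace(tmp_path, dest)
    except BaseException:
        # Interrupted or failed: drop the partial file and leave dest untouched
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise


with tempfile.TemporaryDirectory() as workdir:
    target = os.path.join(workdir, "pricecatcher_2022-01.parquet")
    save_atomically(b"old", target)
    save_atomically(b"new", target)  # overwrites the existing file
    with open(target, "rb") as fh:
        content = fh.read()
    leftovers = [name for name in os.listdir(workdir) if name.endswith(".part")]
```

Creating the temporary file in the destination directory keeps the final rename on a single filesystem, which is what makes the swap safe.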

To configure the number of parallel downloads, use the --max-parallel option. By default, 5 parallel downloads are allowed.

pymatris --max-parallel 10 <urls>

This option caps how many files Pymatris downloads at the same time; increase or decrease it as needed.

To configure the number of parts downloaded in parallel per file, use the --max-splits option. By default, each file is downloaded in 5 parallel parts.

pymatris --max-splits 10 <urls>

Multipart downloads are available only for the HTTP/HTTPS and SFTP protocols. The FTP protocol does not currently support them.
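
For HTTP, multipart downloading is commonly done by requesting separate byte ranges of the same file. The sketch below shows how a file of known size might be divided; split_ranges is a hypothetical helper, and whether Pymatris splits files exactly this way is an assumption.

```python
def split_ranges(size: int, max_splits: int = 5) -> list[tuple[int, int]]:
    """Split [0, size) into up to max_splits inclusive (start, end) byte ranges."""
    if size <= 0 or max_splits <= 0:
        return []
    parts = min(max_splits, size)  # never create empty parts
    base, extra = divmod(size, parts)
    ranges, start = [], 0
    for i in range(parts):
        length = base + (1 if i < extra else 0)  # spread the remainder over the first parts
        ranges.append((start, start + length - 1))
        start += length
    return ranges


ranges = split_ranges(1000, 3)
# Each pair maps onto a standard HTTP Range request header
headers = [{"Range": f"bytes={start}-{end}"} for start, end in ranges]
print(ranges)  # [(0, 333), (334, 666), (667, 999)]
```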

To configure the number of retries for failed downloads, use the --max-tries option. By default, 5 retries are allowed.

pymatris --max-tries 10 <urls>
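
A retry limit of this kind can be pictured as a loop around each download attempt. The sketch below is an assumption-laden illustration rather than Pymatris's logic: with_retries and flaky are hypothetical, the limit is treated as the total number of attempts, and only OSError-based failures are retried.

```python
import asyncio


async def with_retries(make_attempt, max_tries: int = 5):
    """Await make_attempt() until it succeeds or max_tries attempts are used up."""
    if max_tries < 1:
        raise ValueError("max_tries must be at least 1")
    last_error = None
    for _ in range(max_tries):
        try:
            return await make_attempt()
        except OSError as exc:  # includes ConnectionRefusedError and similar
            last_error = exc
    raise last_error


counter = {"calls": 0}


async def flaky():
    """Hypothetical download that fails twice before succeeding."""
    counter["calls"] += 1
    if counter["calls"] < 3:
        raise ConnectionRefusedError("simulated failure")
    return "done"


outcome = asyncio.run(with_retries(flaky))
```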

To hide the progress bar, use the --quiet option. By default, the progress bar is shown.

pymatris --quiet <urls>

Requirements

  • Python 3.9 or above
  • aiohttp
  • aioftp
  • asyncssh
  • aiofiles
  • tqdm

TODO

  • Add better concurrency support for FTP protocol.
  • Better error handling and logging for FTP and SFTP protocols.

Acknowledgements

Download files

Source Distribution

pymatris-0.0.6.tar.gz (15.7 kB)

Uploaded Source

Built Distribution

pymatris-0.0.6-py3-none-any.whl (19.3 kB)

Uploaded Python 3

File details

Details for the file pymatris-0.0.6.tar.gz.

File metadata

  • Download URL: pymatris-0.0.6.tar.gz
  • Upload date:
  • Size: 15.7 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/1.8.2 CPython/3.12.3 Linux/6.5.0-1018-azure

File hashes

Hashes for pymatris-0.0.6.tar.gz
Algorithm Hash digest
SHA256 9aec41adc27fcc4fd78f029105f623d6ef7ad24f6f4e7fb0d47018cc022bdd4a
MD5 a5f5e3d0100d0fa306853478a1468e9b
BLAKE2b-256 519659d8b54929644b2fd68d2ab65dcb36a65532610207dcf92c37b707b849ae

File details

Details for the file pymatris-0.0.6-py3-none-any.whl.

File metadata

  • Download URL: pymatris-0.0.6-py3-none-any.whl
  • Upload date:
  • Size: 19.3 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/1.8.2 CPython/3.12.3 Linux/6.5.0-1018-azure

File hashes

Hashes for pymatris-0.0.6-py3-none-any.whl
Algorithm Hash digest
SHA256 ff9e583b398dc687a9a8850ddf90ab40630699069dcea8e4d223e2f7fb394e88
MD5 bf5d3c16be8c2c2d53eeaf9cc03c4c2f
BLAKE2b-256 d61d774d00f275f4e362a580f10b035296dbc953c17c8aa3ae36b7e8071343bd
