The parallel downloader.

Scylla

Scylla downloads files in parallel.
It aims to speed up a batch of concurrent downloads, not to make each individual download as fast as possible.
There are probably a bunch of tools out there that are faster per download, as that is not scylla's goal.
Scylla exists because I repeatedly had to download a bunch of files and it took a while with wget.

It can be used from the command line or as an importable module.
From the command line it is quite easy to use:

kazaamjt@workstation:~$ scylla -f url_list -o artifacts/
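
If the URLs come from a script, the list file and the command above can be prepared in a few lines. This is only a sketch: it assumes the file passed to `-f` holds one URL per line, which this README does not spell out, and the helper names `write_url_list` and `build_command` are made up for illustration.

```python
from pathlib import Path
from typing import List, Sequence


def write_url_list(urls: Sequence[str], list_path: Path) -> Path:
    """Write the URLs to a text file (assumption: one URL per line)."""
    list_path.write_text("\n".join(urls) + "\n", encoding="utf-8")
    return list_path


def build_command(list_path: Path, output_dir: Path) -> List[str]:
    """Build the argument list for the command shown above (-f and -o only)."""
    return ["scylla", "-f", str(list_path), "-o", str(output_dir)]
```

The resulting list can be handed to `subprocess.run(..., check=True)` once scylla is installed and on the PATH.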

Alternatively, you can import the module in a script.

CLI Options

As mentioned, the CLI is quite simple and does not have many options:

kazaamjt@workstation:~$ scylla --help
Usage: scylla [OPTIONS]

Options:
  -f, --file TEXT               Path to a file containing urls.
  -o, --output-dir TEXT         Directory to store the retrieved files.
  -s, --max-concurrent INTEGER  Maximum number of simultanious files to
                                download. Setting this to 0 will make scylla
                                try to download all the files at once.
  --help                        Show this message and exit.
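
To get a feel for what a concurrency cap like `--max-concurrent` does, here is a generic asyncio sketch. It is not scylla's actual implementation: `run_limited` and `demo` are hypothetical names, and the "downloads" are just short sleeps. It only mirrors the documented behaviour that 0 means no cap.

```python
import asyncio
from typing import Awaitable, Callable, List, Optional


async def run_limited(
    jobs: List[Callable[[], Awaitable[str]]], max_concurrent: int
) -> List[str]:
    """Run all jobs, with at most max_concurrent active at once (0 = no cap)."""
    semaphore: Optional[asyncio.Semaphore] = (
        asyncio.Semaphore(max_concurrent) if max_concurrent > 0 else None
    )

    async def run_one(job: Callable[[], Awaitable[str]]) -> str:
        if semaphore is None:
            return await job()
        # Wait here until one of the max_concurrent slots is free.
        async with semaphore:
            return await job()

    return await asyncio.gather(*(run_one(job) for job in jobs))


async def demo(max_concurrent: int, n_jobs: int = 6) -> int:
    """Return the peak number of fake downloads that were active together."""
    active = 0
    peak = 0

    async def fake_download() -> str:
        nonlocal active, peak
        active += 1
        peak = max(peak, active)
        await asyncio.sleep(0.01)  # stands in for network I/O
        active -= 1
        return "done"

    await run_limited([fake_download] * n_jobs, max_concurrent)
    return peak
```

Running `asyncio.run(demo(2))` never has more than two fake downloads in flight, while `asyncio.run(demo(0))` starts all six at once.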

Using as a module

Using it as a module is also not very difficult.
Do note that the module uses asyncio, so it does have some gotchas.
Also note that the module is fully typed, hopefully making its usage easier.

To use it, simply import the module and instantiate the Downloader class:

import asyncio
from pathlib import Path

import scylla

urls = [
    "https://www.kernel.org/pub/linux/kernel/v5.x/linux-5.10.17.tar.xz",
    "https://example.com/",
]

async def main() -> None:
    downloader = scylla.Downloader(urls, Path("."))
    await downloader.start()

if __name__ == "__main__":
    asyncio.run(main())

The most important gotcha here is making sure the Downloader class is instantiated in the same event loop that Downloader.start is called from.
We do this here by wrapping both steps in a single async function.
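
The gotcha can be illustrated without scylla at all. The class below is a hypothetical stand-in, not scylla's Downloader: it simply remembers the event loop it was created on and refuses to start on another one, which is roughly the kind of failure to expect when creation and start are split across two `asyncio.run` calls.

```python
import asyncio


class LoopBoundStandIn:
    """Hypothetical stand-in for an object tied to one event loop."""

    def __init__(self) -> None:
        # Raises RuntimeError when no event loop is running.
        self._loop = asyncio.get_running_loop()

    async def start(self) -> str:
        if asyncio.get_running_loop() is not self._loop:
            raise RuntimeError("started on a different event loop")
        return "ok"


async def create() -> LoopBoundStandIn:
    return LoopBoundStandIn()


async def good() -> str:
    # Create and start inside the same coroutine, so the loop is the same.
    obj = LoopBoundStandIn()
    return await obj.start()


def bad() -> str:
    # Each asyncio.run call creates a fresh event loop.
    obj = asyncio.run(create())
    return asyncio.run(obj.start())  # raises RuntimeError
```

`asyncio.run(good())` completes normally, while `bad()` fails because the second `asyncio.run` starts a new loop.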

And that's it really.

The class has a couple more init parameters that are more for advanced usage:

- max_concurrent: int = 5  
  Same as the CLI option, this changes the maximum number of downloads that run simultaneously.  
- chunk_report_cb: Optional[ChunkReportCallback] = None  
  The function passed to this parameter will be called every time a chunk has been successfully downloaded and saved.  
  Useful for tracking download progress.  
- report_done_cb: Optional[Callable[[str], None]] = None  
  The function passed to this parameter will be called every time a download is completed.  
  The argument passed to it is the name of the file.  
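
As a rough sketch of how `report_done_cb` might be used, the snippet below counts finished files. `DoneTracker` and `download_all` are hypothetical helpers, and the sketch assumes that `Downloader` is reachable as `scylla.Downloader` and accepts the parameters listed above as keyword arguments. `chunk_report_cb` is left out because its callback signature is not described here.

```python
from pathlib import Path
from typing import List


class DoneTracker:
    """Collects the file names reported through report_done_cb."""

    def __init__(self, total: int) -> None:
        self.total = total
        self.done: List[str] = []

    def __call__(self, filename: str) -> None:
        # Matches Callable[[str], None]: receives the finished file's name.
        self.done.append(filename)
        print(f"[{len(self.done)}/{self.total}] finished {filename}")


async def download_all(urls: List[str], target: Path) -> List[str]:
    # Imported here so DoneTracker can be tried without scylla installed.
    import scylla

    tracker = DoneTracker(len(urls))
    # Assumption: the init parameters above can be passed by keyword.
    downloader = scylla.Downloader(
        urls, target, max_concurrent=2, report_done_cb=tracker
    )
    await downloader.start()
    return tracker.done
```

As in the earlier example, `download_all` creates and starts the downloader inside one coroutine, so it can be run with `asyncio.run(download_all(urls, Path(".")))`.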
