Flardl - Adaptive Multi-Site Downloading of Lists

Who would flardls bear?

Features

Flardl adaptively downloads a list of files from a list of federated web servers. Federated, in this case, means that the same file can be downloaded from any server in the list. For lists of a few hundred or more files of ~1 MB each, the download rate can approach Gbit/s line limits, typically 300X higher than a curl-based script.

The main speed-up is obtained by asynchronous I/O; the use of multiple servers provides stability and dynamic adaptability in the face of unknown server loads and net weather.
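As a rough illustration of how asynchronous I/O and multiple servers combine, here is a minimal sketch that is not flardl's actual implementation. One worker per server pulls file names from a shared queue, so a faster server naturally ends up serving more files. The server names, delays, and the `fake_fetch` function are hypothetical stand-ins: `fake_fetch` sleeps instead of touching the network, and in real use it would be replaced by an HTTP GET through an async client such as httpx.

```python
import asyncio
from collections import Counter


async def fake_fetch(server: str, filename: str, delay: float) -> str:
    """Hypothetical stand-in for an HTTP GET: sleeps instead of using the network."""
    await asyncio.sleep(delay)
    return f"https://{server}/{filename}"


async def server_worker(server, delay, queue, results):
    """Pull file names from the shared queue until it is empty."""
    while True:
        try:
            filename = queue.get_nowait()
        except asyncio.QueueEmpty:
            return
        url = await fake_fetch(server, filename, delay)
        results.append((server, url))


async def download_all(files, server_delays):
    """Run one worker per server concurrently over a shared queue of files."""
    queue = asyncio.Queue()
    for name in files:
        queue.put_nowait(name)
    results = []
    workers = [
        server_worker(server, delay, queue, results)
        for server, delay in server_delays.items()
    ]
    await asyncio.gather(*workers)
    return results


def demo():
    """Count how many files each (simulated) server ended up serving."""
    files = [f"file{i:03d}.dat" for i in range(30)]
    # Assumed per-request latencies, in seconds, for two made-up servers.
    server_delays = {"fast.example.org": 0.001, "slow.example.org": 0.05}
    results = asyncio.run(download_all(files, server_delays))
    return Counter(server for server, _ in results)


if __name__ == "__main__":
    print(demo())
```

Because the workers share one queue, no per-server file assignment is fixed in advance; the split between servers follows whatever response times are observed during the run.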

The name flardl could be either an acronym involving downloading, or a nonsense word. You pick.

Theory

Much has been written under the rubric of queueing theory, which we are purposely setting aside here. We take a semi-empirical approach based on chemical rate theory; this is in many ways inadequate, but a full model of the downloading process requires prior knowledge (such as file sizes) that we don't usually possess.

The simplest approach to downloading launches every request as quickly as possible at a single server and lets the server handle the queueing. The server overlaps its responses, which quickly drives the bandwidth to the maximum allowed by intervening network policy and hardware (such as ISP throttling), where it stays until all requests are completed. However, several effects make this picture a bit too simple:

  • Most servers will apply a policy that drops requests once the queue depth gets too high. These requests must be re-queued, and if that is done naively then the available bandwidth gets eaten up by unfulfilled requests. The queue-depth policy is not known in advance, and it may depend on a total queue depth that includes other users.

  • A server may decide you are executing a Denial-of-Service (DoS) attack and respond by severely throttling further requests from your IP address. This throttling can last for hours or days, or even turn into permanent blacklisting. This "death penalty" can sometimes be triggered by the activity of other users at the same institution who sit behind the same public IP address. I have seen practical classes brought to a complete halt by
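
The two effects listed above can be illustrated with a small simulation. This is a sketch under stated assumptions, not flardl's algorithm: `FlakyServer` is a hypothetical server that drops any request arriving when its hidden queue depth is already full, and `polite_get` is a hypothetical client routine that caps its own concurrency with a semaphore and re-queues dropped requests after an exponentially growing, jittered delay. All names, depths, and delays are made up for illustration.

```python
import asyncio
import random


class FlakyServer:
    """Hypothetical server that drops requests beyond a hidden queue depth."""

    def __init__(self, max_depth: int):
        self.max_depth = max_depth
        self.in_flight = 0
        self.rejected = 0

    async def get(self, filename: str) -> str:
        if self.in_flight >= self.max_depth:
            self.rejected += 1
            raise ConnectionError(f"queue full, dropped {filename}")
        self.in_flight += 1
        try:
            await asyncio.sleep(0.002)  # simulated transfer time
            return filename
        finally:
            self.in_flight -= 1


async def polite_get(server, filename, limiter, max_tries=10, base_delay=0.005):
    """Fetch one file, retrying dropped requests with jittered exponential backoff."""
    for attempt in range(max_tries):
        try:
            async with limiter:  # client-side cap on requests in flight
                return await server.get(filename)
        except ConnectionError:
            # Wait longer after each drop so retries do not pile up together.
            await asyncio.sleep(base_delay * 2**attempt * random.uniform(0.5, 1.5))
    raise RuntimeError(f"gave up on {filename}")


async def download(files, server, client_depth):
    limiter = asyncio.Semaphore(client_depth)
    return await asyncio.gather(*(polite_get(server, f, limiter) for f in files))


def demo(client_depth: int):
    """Return (files completed, requests the server dropped) for a given client depth."""
    server = FlakyServer(max_depth=4)
    files = [f"file{i:03d}.dat" for i in range(40)]
    done = asyncio.run(download(files, server, client_depth))
    return len(done), server.rejected


if __name__ == "__main__":
    print("client depth 4: ", demo(4))
    print("client depth 40:", demo(40))
```

With a client depth at or below the server's limit, nothing is dropped; launching everything at once wastes most of the first wave of requests on rejections. In practice the server's limit is unknown, which is why the depth has to be found adaptively rather than set in advance.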

Requirements

Flardl is tested under Python 3.11 on Linux, macOS, and Windows, and under 3.9 and 3.10 on Linux. Under the hood, flardl relies on httpx (https://www.python-httpx.org/) and is supported on whatever platforms that library works under, for both HTTP/1.1 and HTTP/2. HTTP/3 support could easily be added via aioquic (https://github.com/aiortc/aioquic) once enough servers are running HTTP/3 to make that worthwhile.

Installation

You can install Flardl via pip from PyPI:

$ pip install flardl

Usage

Please see the Command-line Reference for details.

Contributing

Contributions are very welcome. To learn more, see the Contributor Guide.

License

Distributed under the terms of the BSD 3-clause license, Flardl is free and open source software.

Issues

If you encounter any problems, please file an issue along with a detailed description.

Credits

Flardl was written by Joel Berendzen.

This project was generated from @cjolowicz's Hypermodern Python Cookiecutter template.

