Parallel Fast Downloader

pfd - Parallel fast downloader

Download a large number of files extremely fast.

A Python package for fast parallel download of multiple files. It is simple, easy to use and fast, because it uses all the cores in your CPU, spinning up a separate process per core to download in parallel. pfd has requests as its only dependency, which is already present in most Python environments.

Python Package Index Install

pip install pfd

Usage:

Basic example:

$ pfd input_url_file.txt
  • Downloads the files listed in the URL file, one URL per line.
  • The downloaded files are stored in the current directory.
  • Uses as many processes as there are CPU cores in the machine.
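
As a rough illustration of the input format described above, the sketch below reads a URL file with one URL per line and picks a default process count. The names read_urls and default_process_count are hypothetical and not part of pfd; skipping blank lines is an assumption, not documented pfd behavior.

```python
import os
from pathlib import Path


def read_urls(path):
    """Return the URLs listed in a text file, one per line."""
    lines = Path(path).read_text(encoding="utf-8").splitlines()
    # Assumption: surrounding whitespace and blank lines are ignored.
    return [line.strip() for line in lines if line.strip()]


def default_process_count():
    """Return the number of CPU cores, falling back to 1 if it is unknown."""
    return os.cpu_count() or 1
```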

Getting help, info, version and example:

$ pfd [-h | -i | -v | -eg]
  • These options just print text and exit.
    • -h Prints the help message.
    • -i Prints information about the package.
    • -v Prints the current version of the package.
    • -eg Prints a few examples of how to use this CLI.

Specify save directory:

$ pfd input_url_file.txt downloads_directory
  • The downloaded files are stored in the downloads_directory directory.
  • You can provide absolute or relative paths for both the URL file and the save directory.

Use N Processes:

$ pfd input_url_file.txt -p 17
  • Uses 17 processes to download the files.
  • The default is the number of CPU cores in the machine.

File names:

$ pfd input_url_file.txt [-u | -n | -a]
  • By default, it looks for a file name in the response and uses it if found. Otherwise it uses a UUID string as the file name. [Yet to be implemented]
    • -u All downloaded files are named with UUID strings [eg: 5b71113f-43be-40f5-b267-9b93919196aa.jpg]
    • -n All downloaded files are named with sequential numbers [eg: 017.jpg]
    • -a All downloaded files are named with sequential lowercase English letters [eg: exy.jpg]
  • If an extension is needed, it has to be provided manually.
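
The three naming schemes can be pictured with the generators below. This is only a sketch of the idea, not pfd's code: the function names, the fixed width of 3 and the starting values are assumptions inferred from the examples 017 and exy.

```python
import itertools
import string
import uuid


def uuid_names():
    """Yield random UUID strings, one per file."""
    while True:
        yield str(uuid.uuid4())


def number_names(width=3):
    """Yield zero-padded sequential numbers: 000, 001, 002, ..."""
    for i in itertools.count():
        yield str(i).zfill(width)


def alphabet_names(width=3):
    """Yield fixed-width lowercase strings: aaa, aab, aac, ..."""
    # Assumption: names have a fixed width, so the sequence is finite
    # (26 ** width names).
    for letters in itertools.product(string.ascii_lowercase, repeat=width):
        yield "".join(letters)
```

Each generator can be advanced once per downloaded file, for example with next(names).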

Specify extension:

$ pfd input_url_file.txt -e png
  • Uses the provided extension in the file names.
  • The leading . (dot) is optional.
  • If an extension is needed, it must be provided when using [-u | -n | -a].
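
One way to treat the dot as optional is to strip any leading dot before joining the name and the extension. The helper below is a hypothetical sketch, not pfd's implementation.

```python
def with_extension(name, extension=None):
    """Append an extension to name; the leading dot is optional."""
    if not extension:
        # No extension requested: keep the bare name.
        return name
    return f"{name}.{extension.lstrip('.')}"
```

With this helper, "png" and ".png" produce the same file name.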

Description

Do you want to download thousands of files at once but can't wait for a sequential download?

Today's machines have multiple CPU cores. Most entry-level machines have 4 cores, higher-end machines have around 8, and some desktop processors have 16 to 32. Using just one core to download files is not the best approach if you have hundreds or thousands of files to fetch.

The rapid shift towards cloud technologies provides massive processing power, gigabit networks and faster disk writes. By making proper use of this processing power, bandwidth, memory and I/O, we can make our lives a bit easier.

pfd is one such package. It is a CLI tool for downloading thousands of files in a short time. It achieves this by spinning up a separate process per CPU core and downloading in parallel.
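
The process-per-core idea can be sketched with the standard library's multiprocessing.Pool. This is a minimal illustration under stated assumptions, not pfd's source: it uses urllib.request instead of requests so the sketch has no third-party dependency, and the names download_one, build_jobs and download_all, the 30-second timeout and the numbered file names are all hypothetical.

```python
import urllib.request
from multiprocessing import Pool
from pathlib import Path


def download_one(job):
    """Fetch one URL and write the response body to a file."""
    url, target = job
    with urllib.request.urlopen(url, timeout=30) as response:
        Path(target).write_bytes(response.read())
    return target


def build_jobs(urls, save_dir="."):
    """Pair each URL with a numbered target path inside save_dir."""
    return [(url, str(Path(save_dir) / f"{i:03d}")) for i, url in enumerate(urls)]


def download_all(urls, save_dir=".", processes=None):
    """Download every URL using a pool of worker processes."""
    Path(save_dir).mkdir(parents=True, exist_ok=True)
    jobs = build_jobs(urls, save_dir)
    # processes=None lets Pool choose a worker count from the CPU count.
    with Pool(processes=processes) as pool:
        return pool.map(download_one, jobs)
```

On platforms that start worker processes with spawn (Windows, and macOS by default), download_all should be called from under an if __name__ == "__main__": guard.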

Imagine a machine with a 24-core CPU and a gigabit network that has to download 1000 files, where each file takes 1 second to fetch.

Sequential download

A single Python interpreter performs the following steps for each file:

Make request ➜ open a file ➜ save the content to the file

The interpreter repeats these steps 1000 times, one file after another.

It takes 1000 files x 1 second = 1000 seconds, or about 17 minutes, to download all files.

➜➜➜ .. ➜➜➜ .. ➜➜➜ .. ➜➜➜ .. ➜➜➜ .. ➜➜➜ .. ➜➜➜ .. ➜➜➜

Parallel download

24 Python interpreters are spun up, one on each of the 24 cores (one per core by default).

Each Python interpreter performs the following steps for each of its files:

Make request ➜ open a file ➜ save the content to the file

Each Python interpreter repeats these steps about 42 times (1000 files / 24 processes ≈ 41.7, rounded up).

It takes about (1000 files / 24 processes) x 1 second ≈ 42 seconds, or under 1 minute, to download all files.

➜➜➜ .. ➜➜➜ .. ➜➜➜ .. ➜➜➜ .. ➜➜➜ .. ➜➜➜ .. ➜➜➜ .. ➜➜➜
➜➜➜ .. ➜➜➜ .. ➜➜➜ .. ➜➜➜ .. ➜➜➜ .. ➜➜➜ .. ➜➜➜ .. ➜➜➜
➜➜➜ .. ➜➜➜ .. ➜➜➜ .. ➜➜➜ .. ➜➜➜ .. ➜➜➜ .. ➜➜➜ .. ➜➜➜
[.. 19 more processes ..]
➜➜➜ .. ➜➜➜ .. ➜➜➜ .. ➜➜➜ .. ➜➜➜ .. ➜➜➜ .. ➜➜➜ .. ➜➜➜
➜➜➜ .. ➜➜➜ .. ➜➜➜ .. ➜➜➜ .. ➜➜➜ .. ➜➜➜ .. ➜➜➜ .. ➜➜➜

Note: This is just a logical comparison. The timings are not a real-world measurement, and many factors affect download speed. There is also a slight overhead in setting up the processes and collecting the results, so the real download will take longer than this estimate. The overhead should be negligible for larger inputs.
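
The back-of-the-envelope figures in this comparison can be reproduced with a few lines of arithmetic. The function name estimated_seconds is hypothetical, and the model deliberately ignores process start-up cost and network limits.

```python
import math


def estimated_seconds(files, processes, seconds_per_file=1.0):
    """Idealized wall-clock time, ignoring start-up overhead and network limits."""
    # Each process handles an equal share of the files, rounded up.
    files_per_process = math.ceil(files / processes)
    return files_per_process * seconds_per_file


print(estimated_seconds(1000, 1))   # sequential: one process
print(estimated_seconds(1000, 24))  # parallel: 24 processes
```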

Download files

Source Distribution

pfd-0.5.2.tar.gz (8.9 kB)

Uploaded Source

Built Distribution

pfd-0.5.2-py3-none-any.whl (9.6 kB)

Uploaded Python 3

File details

Details for the file pfd-0.5.2.tar.gz.

File metadata

  • Download URL: pfd-0.5.2.tar.gz
  • Upload date:
  • Size: 8.9 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/1.2.1 CPython/3.10.4 Windows/10

File hashes

Hashes for pfd-0.5.2.tar.gz
Algorithm Hash digest
SHA256 221328c649feead9e62db26c835d2258850daa808cbdee95f9051513ddac48f0
MD5 1339f9855a297656c83e80622daf6ad8
BLAKE2b-256 b2f66eea7335d36d7a460583cac7addb657931e14e4a2c681bfaf31b0a74be69

File details

Details for the file pfd-0.5.2-py3-none-any.whl.

File metadata

  • Download URL: pfd-0.5.2-py3-none-any.whl
  • Upload date:
  • Size: 9.6 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/1.2.1 CPython/3.10.4 Windows/10

File hashes

Hashes for pfd-0.5.2-py3-none-any.whl
Algorithm Hash digest
SHA256 28a31f3a6b94f7f8c81a234f0cdccba4b663b918551a53a897b230899bc7f7b4
MD5 cb175cf91b89bb710f08fe0a7a6f6e17
BLAKE2b-256 c4369ceb009ca4920ee2b860c5a7287e295e48b28939b50837749bdbb06722b6
