Parallel Fast Downloader
Paralleldownload
Download a large number of files extremely fast.
A Python package for fast parallel download of multiple files.
It is simple, easy to use, and extremely fast because it uses all the cores in your CPU, spinning up a separate process per core for parallel download.
Paralleldownload uses requests as its only dependency, which is already present in most Python environments.
Python Package Index Install
pip install paralleldownload
Usage:
Basic example:
$ paralleldownload input_url_file.txt
- It downloads the files using the URLs in the file, one URL per line.
- The downloaded files are stored in the current directory.
- Uses a number of processes equal to the number of CPU cores in the machine.
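The input format above (one URL per line) can be read with a few lines of Python. This is only a minimal sketch, not the package's own code: the helper names `parse_url_lines` and `read_urls` are hypothetical, and skipping blank lines is an assumption made here for robustness.

```python
def parse_url_lines(lines):
    """Return the non-empty lines, stripped of surrounding whitespace.

    Assumption: blank lines are ignored. The package may treat them differently.
    """
    return [line.strip() for line in lines if line.strip()]


def read_urls(path):
    """Read a URL file (one URL per line) and return the URLs as a list."""
    with open(path, encoding="utf-8") as handle:
        return parse_url_lines(handle)
```

For example, `read_urls("input_url_file.txt")` would return the list of URLs that the basic example passes to the downloader.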
Getting help, info, version and example:
$ paralleldownload [-h | -i | -v | -eg]
- These options just print text and exit.
-h
Prints the help message.
-i
Prints information about the package.
-v
Prints the current version of the package.
-eg
Prints a few examples of how to use this CLI.
Specify save directory:
$ paralleldownload input_url_file.txt downloads_directory
- The downloaded files are stored in the downloads_directory directory.
- You can provide absolute or relative paths for both the URL file and the save directory.
Use N Processes:
$ paralleldownload input_url_file.txt -p 17
- Uses 17 processes to download files.
- The default is the number of CPU cores in the machine.
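The positional arguments and the -p option described above could be modelled with argparse roughly as follows. This is a hypothetical re-creation for illustration only: the function `build_parser` and the attribute names `url_file`, `save_dir`, and `processes` are assumptions, and the package's real argument handling may differ.

```python
import argparse
import os


def build_parser():
    """Build a parser that mirrors the documented arguments (illustrative only)."""
    parser = argparse.ArgumentParser(prog="paralleldownload")
    parser.add_argument("url_file", help="text file with one URL per line")
    # The save directory is optional and defaults to the current directory.
    parser.add_argument("save_dir", nargs="?", default=".",
                        help="directory in which downloaded files are stored")
    # os.cpu_count() can return None, so fall back to a single process.
    parser.add_argument("-p", dest="processes", type=int,
                        default=os.cpu_count() or 1,
                        help="number of download processes")
    return parser
```

Parsing `["input_url_file.txt", "-p", "17"]` with this parser yields `processes == 17` and `save_dir == "."`.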
File names:
$ paralleldownload input_url_file.txt [-u | -n | -a]
- By default, it searches for the file name in the response. If found, it uses that name; otherwise it uses a UUID string as the file name. [Yet to be implemented]
-u
All downloaded files will be named as UUID strings [e.g. 5b71113f-43be-40f5-b267-9b93919196aa.jpg]
-n
All downloaded files will be named as sequential numbers [e.g. 017.jpg]
-a
All downloaded files will be named as sequential lowercase English letters [e.g. exy.jpg]
- If an extension is needed, it has to be provided manually.
Specify extension:
$ paralleldownload input_url_file.txt -e png
- Uses the provided extension in file names. The . (dot) is optional.
- If an extension is needed, it must be provided when using [-u | -n | -a].
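The three naming schemes and the optional leading dot of the extension can be sketched as below. This is an illustration under stated assumptions, not the package's implementation: `normalize_ext` and `make_names` are hypothetical helpers, and the zero-padding width and the fixed-width letter sequence (aaa, aab, ...) are guesses based on the examples 017.jpg and exy.jpg.

```python
import itertools
import string
import uuid


def normalize_ext(ext):
    """Return '' for no extension, otherwise the extension with exactly one leading dot."""
    if not ext:
        return ""
    return "." + ext.lstrip(".")


def make_names(count, scheme, ext=""):
    """Generate `count` names with scheme 'u' (UUID), 'n' (numbers) or 'a' (letters)."""
    suffix = normalize_ext(ext)
    if scheme == "u":
        stems = [str(uuid.uuid4()) for _ in range(count)]
    elif scheme == "n":
        # Assumption: pad with zeros so every name has the same width.
        width = len(str(max(count - 1, 0)))
        stems = [str(i).zfill(width) for i in range(count)]
    elif scheme == "a":
        # Assumption: use the shortest fixed width that can hold `count` names.
        width = 1
        while 26 ** width < count:
            width += 1
        letters = itertools.product(string.ascii_lowercase, repeat=width)
        stems = ["".join(combo) for combo in itertools.islice(letters, count)]
    else:
        raise ValueError(f"unknown scheme: {scheme!r}")
    return [stem + suffix for stem in stems]
```

With these assumptions, `make_names(1000, "n", "jpg")[17]` is `"017.jpg"`, and passing `"png"` or `".png"` as the extension gives the same result.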
Description
Do you want to download thousands of files at once but can't wait for a sequential download?
Today's machines have multiple CPU cores. Most entry-level machines have 4 cores, higher-end machines have around 8 cores, and some desktop processors have 16 to 32 cores. Using just one core to download files is not the best approach if you have hundreds or thousands of files to download.
The rapid shift towards cloud technologies provides massive processing power, Gigabit networks, and faster writes to disk. By making proper use of this processing power, bandwidth, memory, and IO, we can make our lives a bit easier.
paralleldownload is one such package. It is a CLI tool used to download thousands of files in a short time. It achieves this by spinning up a separate process per CPU core and downloading in parallel.
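The process-per-core idea can be sketched with the standard library's multiprocessing.Pool. This is a minimal illustration, not the package's code: the function names are hypothetical, files are simply named by sequence number, and urllib.request is swapped in for requests so that the sketch has no third-party dependency.

```python
import multiprocessing
import os
import urllib.request


def download_one(job):
    """Fetch one URL and write the response body to the target path."""
    url, target = job
    with urllib.request.urlopen(url) as response:
        data = response.read()
    with open(target, "wb") as handle:
        handle.write(data)
    return target


def build_jobs(urls, save_dir):
    """Pair each URL with a zero-padded, numbered target path inside save_dir."""
    width = len(str(max(len(urls) - 1, 0)))
    return [(url, os.path.join(save_dir, str(index).zfill(width)))
            for index, url in enumerate(urls)]


def download_all(urls, save_dir=".", processes=None):
    """Download all URLs with a pool of worker processes and return the saved paths."""
    os.makedirs(save_dir, exist_ok=True)
    jobs = build_jobs(urls, save_dir)
    # processes=None lets Pool use os.cpu_count() workers, i.e. one per core.
    with multiprocessing.Pool(processes=processes) as pool:
        return pool.map(download_one, jobs)
```

When calling `download_all` from a script, place the call under an `if __name__ == "__main__":` guard, which multiprocessing requires on platforms that start worker processes by spawning a fresh interpreter (such as Windows).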
Imagine a machine with a 24-core CPU and a Gigabit network that has to download 1000 files, where each file takes 1 second to fetch.
Sequential download
The Python interpreter performs the following steps:
Make request ➜ open a file ➜ save the content to the file
The interpreter repeats these steps 1000 times, one file after another.
It takes 1000 files x 1 second = 1000 seconds, or ~17 minutes, to download all files.
➜➜➜ .. ➜➜➜ .. ➜➜➜ .. ➜➜➜ .. ➜➜➜ .. ➜➜➜ .. ➜➜➜ .. ➜➜➜
Parallel download
24 Python interpreters are spun up, one on each of the 24 cores (one per core by default).
Each Python interpreter performs the following steps:
Make request ➜ open a file ➜ save the content to the file
Each Python interpreter repeats these steps about 42 times (1000 files / 24 processes ≈ 41.7, rounded up).
➜➜➜ .. ➜➜➜ .. ➜➜➜ .. ➜➜➜ .. ➜➜➜ .. ➜➜➜ .. ➜➜➜ .. ➜➜➜
➜➜➜ .. ➜➜➜ .. ➜➜➜ .. ➜➜➜ .. ➜➜➜ .. ➜➜➜ .. ➜➜➜ .. ➜➜➜
[.. 21 more processes ..]
➜➜➜ .. ➜➜➜ .. ➜➜➜ .. ➜➜➜ .. ➜➜➜ .. ➜➜➜ .. ➜➜➜ .. ➜➜➜
It takes (1000 files / 24 processes) x 1 second ≈ 42 seconds, or under 1 minute, to download all files.
Note: This is just a logical comparison. The timings are not real-world measurements, and many factors affect download speed. There is a slight overhead in setting up the processes and collecting the results, so the download will not necessarily finish in under a minute. The overhead should be negligible for larger inputs.
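The idealized arithmetic behind the two estimates can be reproduced with a short calculation. This is a sketch of the logical comparison only: the helper `estimate_seconds` is hypothetical and deliberately ignores process start-up, network, and disk overhead.

```python
import math


def estimate_seconds(files, processes, seconds_per_file=1.0):
    """Idealized wall-clock time: the busiest process handles ceil(files / processes) files."""
    if files < 0 or processes < 1:
        raise ValueError("files must be >= 0 and processes must be >= 1")
    files_per_process = math.ceil(files / processes)
    return files_per_process * seconds_per_file


sequential = estimate_seconds(1000, 1)   # 1000.0 seconds, about 17 minutes
parallel = estimate_seconds(1000, 24)    # 42.0 seconds
print(f"sequential: {sequential:.0f} s, parallel: {parallel:.0f} s")
```

The rounding up reflects that some processes must handle 42 files while others handle 41, so the slowest process determines the total time.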
File details
Details for the file paralleldownload-0.5.2.tar.gz.
File metadata
- Download URL: paralleldownload-0.5.2.tar.gz
- Upload date:
- Size: 9.3 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: poetry/1.2.1 CPython/3.10.4 Windows/10
File hashes
Algorithm | Hash digest
---|---
SHA256 | 56c5dcacb2ab95817bf804dcc483486faa0a5dea9e05ed328e11b212c9eb4025
MD5 | 6c4bc3420425ca7181337cc3c3058a48
BLAKE2b-256 | e36978039a4daf0f93a78154fdd8c517c8268b535ef242d1931cf107fbfeb309
File details
Details for the file paralleldownload-0.5.2-py3-none-any.whl.
File metadata
- Download URL: paralleldownload-0.5.2-py3-none-any.whl
- Upload date:
- Size: 9.9 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: poetry/1.2.1 CPython/3.10.4 Windows/10
File hashes
Algorithm | Hash digest
---|---
SHA256 | ee20d3403744f1dde512da03765c4c137823b6130540d33ac44bb1b64899a9fb
MD5 | 686605ce437c9cd1e30cf13ee4d24501
BLAKE2b-256 | ceb86409c54c5cc1a0a331bd8475300584e72653d360e29dd42b8ad1ed6e5758