bdownload
A multi-threaded and multi-source aria2-like batch file downloading library for Python
Installation
- via PyPI:

  pip install bdownload

- from within the source directory locally:

  pip install .

  Note that you should git clone or download the source tarball (and unpack it, of course) from the repository first.
Usage: as a Python package
Importing
from bdownload import BDownloader
or
import bdownload
Signatures
class bdownload.BDownloader(max_workers=None, min_split_size=1024*1024, chunk_size=1024*100, proxy=None, cookies=None, user_agent=None, logger=None, progress='mill', num_pools=20, pool_maxsize=20)
Create and initialize a BDownloader object for executing download jobs.

- The max_workers parameter specifies the number of parallel download threads; when set to None, it defaults to the number of processors on the machine multiplied by 5.
- min_split_size denotes the size in bytes of the file pieces split to be downloaded in parallel, which defaults to 1024*1024 bytes (i.e. 1 MB).
- The chunk_size parameter specifies the chunk size in bytes of every HTTP range request, which takes a default value of 1024*100 (i.e. 100 KB) if not provided.
- proxy supports both HTTP and SOCKS proxies, in the forms http://[user:pass@]host:port and socks5://[user:pass@]host:port, respectively.
- If cookies needs to be set, it must take the form cookie_key=cookie_value, with multiple pairs separated by a space character if applicable, e.g. 'key1=val1 key2=val2'.
- When user_agent is not given, it defaults to 'bdownload/VERSION', with VERSION replaced by the package's version number.
- The logger parameter specifies an event logger. If logger is not None, it must be an instance of logging.Logger or of a customized subclass of it; otherwise, a default module-level logger returned by logging.getLogger(__name__) is used.
- progress determines the style of the progress indicator displayed while downloading. Possible values are 'mill' and 'bar', with 'mill' being the default.
- The num_pools parameter has the same meaning as num_pools in urllib3.PoolManager and will eventually be passed to it; specifically, num_pools sets the number of connection pools to cache.
- pool_maxsize will be passed to the underlying requests.adapters.HTTPAdapter; it specifies the maximum number of connections to save that can be reused in the urllib3 connection pool.
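To make the expected cookies format concrete, here is a small plain-Python sketch (the cookie names and values are made up for illustration) showing how a space-separated 'key=value' string decomposes into individual pairs:

```python
# Hypothetical cookies string in the space-separated
# "cookie_key=cookie_value" form described above.
cookies = "sessionid=abc123 lang=en"

# Decompose it into (key, value) pairs to see the structure;
# split("=", 1) keeps any '=' inside a cookie value intact.
pairs = dict(kv.split("=", 1) for kv in cookies.split())
print(pairs)  # {'sessionid': 'abc123', 'lang': 'en'}
```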
BDownloader.downloads(path_urls)
Submit multiple downloading jobs at a time.
path_urls accepts a list of tuples of the form (path, url), where path should be a pathname, possibly prefixed with an absolute or relative path, and url should be a URL string, which may consist of multiple TAB-separated URLs pointing to the same file. A valid path_urls, for example, could be:

[('/opt/files/bar.tar.bz2', 'https://foo.cc/bar.tar.bz2'),
 ('./sanguoshuowen.pdf', 'https://bar.cc/sanguoshuowen.pdf\thttps://foo.cc/sanguoshuowen.pdf'),
 ('/to/be/created/', 'https://flash.jiefang.rmy/lc-cl/gaozhuang/chelsia/rockspeaker.tar.gz'),
 ('/path/to/existing-dir', 'https://ghosthat.bar/foo/puretonecone81.xz\thttps://tpot.horn/foo/puretonecone81.xz\thttps://hawkhill.bar/foo/puretonecone81.xz')]
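Since a multi-source entry is just a TAB-joined URL string, a path_urls list can also be assembled programmatically. A minimal sketch (all hosts and file names below are placeholders, not real mirrors):

```python
# Mirror URLs pointing at the same file are joined with TABs ('\t'),
# matching the (path, url) form accepted for download jobs.
mirrors = [
    "https://mirror1.example.com/foo/puretonecone81.xz",
    "https://mirror2.example.com/foo/puretonecone81.xz",
]

path_urls = [
    ("/opt/files/bar.tar.bz2", "https://foo.cc/bar.tar.bz2"),
    ("./puretonecone81.xz", "\t".join(mirrors)),
]
print(path_urls[1][1].count("\t") + 1)  # number of mirrors: 2
```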
BDownloader.download(path, url)
Submit a single downloading job.
- Similar to BDownloader.downloads(); in fact it is just a special case of it, with [(path, url)] composed of the specified parameters as the input.
BDownloader.wait_for_all()
Wait for all the download jobs to complete. Returns a 2-tuple of lists (succeeded, failed): the first list, succeeded, contains the originally passed (path, url) tuples that completed successfully, while the second list, failed, contains those that raised an exception or were cancelled.
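A sketch of consuming that 2-tuple; the _StubDownloader class below is a stand-in used only so the partition-handling pattern can be shown without performing real downloads (it merely mimics the documented return shape of wait_for_all()):

```python
# Stand-in for a BDownloader whose jobs have finished; it only
# reproduces the (succeeded, failed) 2-tuple of wait_for_all().
class _StubDownloader:
    def wait_for_all(self):
        succeeded = [("./ok.bin", "https://foo.cc/ok.bin")]
        failed = [("./bad.bin", "https://foo.cc/bad.bin")]
        return succeeded, failed

downloader = _StubDownloader()
succeeded, failed = downloader.wait_for_all()

# The failed (path, url) tuples are natural retry candidates.
retry_list = [(path, url) for path, url in failed]
print(len(succeeded), len(retry_list))  # 1 1
```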
BDownloader.close()
Shut down the downloader and perform cleanup. As the examples below show, a BDownloader instance can also be used as a context manager, in which case the cleanup is performed automatically on exit.
Examples
- Single file downloading
import unittest
import tempfile
import os
import hashlib

from bdownload import BDownloader


class TestBDownloader(unittest.TestCase):
    def setUp(self):
        self.tmp_dir = tempfile.TemporaryDirectory()

    def tearDown(self):
        self.tmp_dir.cleanup()

    def test_bdownloader_download(self):
        file_path = os.path.join(self.tmp_dir.name, "aria2-x86_64-win.zip")
        file_url = "https://github.com/Jesseatgao/aria2-patched-static-build/releases/download/1.35.0-win-linux/aria2-x86_64-win.zip"
        file_sha1_exp = "16835c5329450de7a172412b09464d36c549b493"

        with BDownloader(max_workers=20, progress='mill') as downloader:
            downloader.download(file_path, file_url)
            downloader.wait_for_all()

        hashf = hashlib.sha1()
        with open(file_path, mode='rb') as f:
            hashf.update(f.read())
        file_sha1 = hashf.hexdigest()

        self.assertEqual(file_sha1_exp, file_sha1)


if __name__ == '__main__':
    unittest.main()
- Batch file downloading
import unittest
import tempfile
import os
import hashlib

from bdownload import BDownloader


class TestBDownloader(unittest.TestCase):
    def setUp(self):
        self.tmp_dir = tempfile.TemporaryDirectory()

    def tearDown(self):
        self.tmp_dir.cleanup()

    def test_bdownloader_downloads(self):
        files = [
            {
                "file": os.path.join(self.tmp_dir.name, "aria2-x86_64-linux.tar.xz"),
                "url": "https://github.com/Jesseatgao/aria2-patched-static-build/releases/download/1.35.0-win-linux/aria2-x86_64-linux.tar.xz",
                "sha1": "d02dfdab7517e78a257f4403e502f1acc2a795e4"
            },
            {
                "file": os.path.join(self.tmp_dir.name, "mkvtoolnix-x86_64-linux.tar.xz"),
                "url": "https://github.com/Jesseatgao/MKVToolNix-static-builds/releases/download/v47.0.0-mingw-w64-win32v1.0/mkvtoolnix-x86_64-linux.tar.xz",
                "sha1": "19b0c7fc20839693cc0929f092f74820783a9750"
            }
        ]
        file_urls = [(f["file"], f["url"]) for f in files]

        with BDownloader(max_workers=20, progress='mill') as downloader:
            downloader.downloads(file_urls)
            downloader.wait_for_all()

        for f in files:
            hashf = hashlib.sha1()
            with open(f["file"], mode='rb') as fd:
                hashf.update(fd.read())
            file_sha1 = hashf.hexdigest()

            self.assertEqual(f["sha1"], file_sha1)


if __name__ == '__main__':
    unittest.main()
Usage: as a command-line script
Synopsis
bdownload [-h] [-o OUTPUT [OUTPUT ...]] -L URLS [URLS ...] [-D DIR]
[-p PROXY] [-n MAX_WORKERS] [-k MIN_SPLIT_SIZE]
[-s CHUNK_SIZE] [-e COOKIE] [--user-agent USER_AGENT]
[-P {mill,bar}] [--num-pools NUM_POOLS]
[--pool-size POOL_SIZE]
[-l {debug,info,warning,error,critical}]
Description
-h, --help
show help message and exit
-o OUTPUT [OUTPUT ...], --output OUTPUT [OUTPUT ...]
one or more file names (optionally prefixed with relative (to -D DIR
) or absolute paths), e.g. -o file1.zip ~/file2.tgz
, paired with URLs specified by --url
or -L
-L URLS [URLS ...], --url URLS [URLS ...]
URL(s) for the files to be downloaded, which might be TAB-separated URLs pointing to the same file, e.g. -L https://yoursite.net/yourfile.7z
, -L "https://yoursite01.net/thefile.7z\thttps://yoursite02.com/thefile.7z"
, or --url "http://foo.cc/file1.zip" "http://bar.cc/file2.tgz\thttp://bar2.cc/file2.tgz"
-D DIR, --dir DIR
directory in which to save the downloaded files
-p PROXY, --proxy PROXY
proxy in the form of "http://[user:pass@]host:port" or "socks5://[user:pass@]host:port"
-n MAX_WORKERS, --max-workers MAX_WORKERS
number of worker threads [default: 20]
-k MIN_SPLIT_SIZE, --min-split-size MIN_SPLIT_SIZE
file split size in bytes, "1048576, 1024K or 2M" for example [default: 1M]
-s CHUNK_SIZE, --chunk-size CHUNK_SIZE
every request range size in bytes, "10240, 10K or 1M" for example [default: 100K]
-e COOKIE, --cookie COOKIE
cookies in the form of "cookie_key=cookie_value cookie_key2=cookie_value2"
--user-agent USER_AGENT
custom user agent
-P {mill,bar}, --progress {mill,bar}
progress indicator [default: mill]
--num-pools NUM_POOLS
number of connection pools [default: 20]
--pool-size POOL_SIZE
max number of connections in the pool [default: 20]
-l {debug,info,warning,error,critical}, --log-level {debug,info,warning,error,critical}
logger level [default: warning]
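Putting several of the options above together, an illustrative invocation might look like the following (the mirror URLs and target directory are placeholders; as in the -L examples above, TAB-separated mirrors are written with \t):

```
bdownload -o aria2-x86_64-win.zip \
          -L "https://mirror1.example.com/aria2-x86_64-win.zip\thttps://mirror2.example.com/aria2-x86_64-win.zip" \
          -D ./downloads -n 32 -k 2M -s 100K -P bar -l info
```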