
hget


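hget is a command-line tool for downloading files. It supports the HTTP and FTP download protocols (http/https/ftp) and is optimized for downloading data from Amazon S3 object storage (as an alternative to aws s3 cp). It downloads concurrently with asynchronous coroutines, which avoids thread overhead and allows higher concurrency, and downloads can be interrupted and resumed at any time. On poor networks, it can download 100 to 200 times faster, or more, than wget, axel, or aws s3 cp.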

1. Dependencies

1.1 Runtime environment
  • linux64
  • python >=3.8
1.2 Python module dependencies
  • Cython
  • requests
  • aiohttp
  • boto3
  • tqdm

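2. Features

  • Concurrency through Python async coroutines, which reduces thread overhead and supports higher concurrency
  • Each file is split into chunks that are downloaded concurrently, making full use of network I/O; interrupted downloads can be resumed
  • A single intermediate *.ht file records the progress and status of each chunk; no other temporary files are created
  • Auto-reload: when a network or program error occurs, hget automatically resets its environment and continues the download
  • Handling of multiple exception types and signals ensures the download state is saved accurately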
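The chunked download and resume features above can be illustrated with a minimal sketch. It assumes a server that honours HTTP `Range` requests and uses a hypothetical JSON progress file; hget's real *.ht format is not documented here, and `split_ranges`, `range_header`, `save_progress`, `load_progress` and `pending_ranges` are illustrative names, not hget's API.

```python
import json
import os
import tempfile


def split_ranges(total_size, chunk_size):
    """Split a file of total_size bytes into inclusive (start, end) byte ranges."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    ranges = []
    start = 0
    while start < total_size:
        end = min(start + chunk_size, total_size) - 1
        ranges.append((start, end))
        start = end + 1
    return ranges


def range_header(start, end):
    """HTTP Range header for one chunk; both offsets are inclusive."""
    return {"Range": f"bytes={start}-{end}"}


def save_progress(path, done):
    """Hypothetical progress-file format: a JSON list of finished [start, end] ranges."""
    with open(path, "w") as fh:
        json.dump(sorted(done), fh)


def load_progress(path):
    """Return the set of finished ranges, or an empty set if no progress file exists yet."""
    if not os.path.exists(path):
        return set()
    with open(path) as fh:
        return {tuple(r) for r in json.load(fh)}


def pending_ranges(all_ranges, done):
    """Ranges that still need downloading after a restart."""
    return [r for r in all_ranges if r not in done]


if __name__ == "__main__":
    state = os.path.join(tempfile.mkdtemp(), "demo.ht")  # hypothetical file name
    chunks = split_ranges(10, 4)             # [(0, 3), (4, 7), (8, 9)]
    save_progress(state, {chunks[0]})        # pretend the first chunk finished
    print(pending_ranges(chunks, load_progress(state)))  # [(4, 7), (8, 9)]
```

Because finished ranges are persisted, a restarted process only requests the ranges that are still missing, which is what lets a download resume instead of starting over.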

3. Installation

Install from the git repository (recommended)

pip3 install git+https://github.com/yodeng/hget.git

Install from the official PyPI index

pip3 install hget -U

4. Usage

hget can be run from the command line or imported as a Python module.

4.1 command-line usage
$ hget -h 
usage: hget [-h] [-o <file>] [--dir <dir>] [-n <int>] [-c <int>] [-t <int>] [-s <str>] [-d] [-q] [-v] [--access-key <str>] [--secrets-key <str>] [--noreload] <url>

An interruptable and resumable download accelerator.

positional arguments:
  <url>                 download url, http/https/s3/ftp support

optional arguments:
  -h, --help            show this help message and exit
  -o <file>, --output <file>
                        output download file
  --dir <dir>           output download directory
  -n <int>, --num <int>
                        the max number of async concurrency (not thread or process), default: auto
  -c <int>, --connections <int>
                        the max number of tcp connections for http/https. more tcp connections can speedup, but might be forbidden by url server, default: auto
  -t <int>, --timeout <int>
                        timeout for download, 30s by default
  -s <str>, --max-speed <str>
                        specify maximum speed per second, case-insensitive unit support (K[b], M[b]...), no-limited by default
  -d, --debug           logging debug
  -q, --quiet           suppress all output except error or download success
  -v, --version         show program's version number and exit
  --noreload            tells hget to NOT use the auto-reloader

aws arguments:
  --access-key <str>    access key if necessary
  --secrets-key <str>   secrets key if necessary

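The commands and arguments are explained below:

Argument           Description
<url>              Positional argument: the URL to download, starting with http, https, ftp, or s3
-o/--output        File name to save the download as; defaults to the downloaded file's name in the current directory
--dir              Directory to save the download in
-n/--num           Maximum download concurrency (async coroutines, not threads or processes); set automatically from the file size by default
-c/--connections   Maximum number of TCP connections for HTTP downloads; set automatically from the file size by default. More connections can speed up the download, but the server may refuse connections if there are too many
-t/--timeout       Maximum timeout for download connections; 30 seconds by default
-s/--max-speed     Maximum amount of data downloaded per second (bytes); unlimited by default. Case-insensitive units such as K[B] and M[B] are supported
-d/--debug         Debug mode, with more logging output
-q/--quiet         Suppress all screen output except errors
-v/--version       Print the version and exit
--access-key       Amazon S3 access key; only used for s3 URLs and can be omitted if not needed
--secrets-key      Amazon S3 secret key; only used for s3 URLs and can be omitted if not needed
--noreload         Disable auto-reload: if the download is interrupted by a network or program error, hget does not reset and continue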
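  • -c/--connections: maximum number of TCP connections. The automatic setting is usually sufficient; if you set it manually, keep it at 500 or below.
  • -n/--num: maximum concurrency. The automatic setting is usually sufficient; if you set it manually, keep it at 1000 or below, otherwise the process may exceed the system ulimit and be killed.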
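-n/--num caps async concurrency (coroutines, not threads or processes). One common way to impose such a cap in asyncio is a semaphore. The sketch below shows the idea with a hypothetical `fake_chunk` coroutine standing in for a chunk download; `bounded_gather` is an illustrative helper, not code taken from hget.

```python
import asyncio


async def bounded_gather(coros, limit):
    """Await all coroutines, letting at most `limit` of them run at the same time."""
    sem = asyncio.Semaphore(limit)

    async def run(coro):
        # A coroutine object does nothing until awaited, so awaiting it only
        # after acquiring the semaphore caps how many are in flight.
        async with sem:
            return await coro

    # gather() returns results in the same order as the input coroutines
    return await asyncio.gather(*(run(c) for c in coros))


async def fake_chunk(index, stats):
    """Hypothetical stand-in for downloading one chunk; tracks peak concurrency."""
    stats["running"] += 1
    stats["peak"] = max(stats["peak"], stats["running"])
    await asyncio.sleep(0.01)  # simulated network I/O
    stats["running"] -= 1
    return index


if __name__ == "__main__":
    stats = {"running": 0, "peak": 0}
    results = asyncio.run(bounded_gather([fake_chunk(i, stats) for i in range(20)], limit=5))
    print(results[:3], "peak concurrency:", stats["peak"])
```

All coroutines share one event loop thread, so raising the limit adds open connections rather than threads.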
4.2 python module import usage

Simple usage

from hget import hget

url="https://hgdownload.soe.ucsc.edu/goldenPath/hg19/bigZips/hg19.fa.gz"
outfile="./hg19.fa.gz"

hget(url=url, outfile=outfile, quiet=False)

Parallel usage

from joblib import Parallel, delayed
from hget import hget

urls = ["https://hgdownload.soe.ucsc.edu/goldenPath/hg19/bigZips/hg19.fa.gz",
        "https://hgdownload.soe.ucsc.edu/goldenPath/hg38/bigZips/hg38.fa.gz"]

Parallel(n_jobs=2)(delayed(hget)(url) for url in urls)
  • Auto-reload is not supported when hget is imported and called as a module.
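Because the import usage has no auto-reload, a caller may want a simple retry loop. This sketch assumes that `hget(...)` raises an exception when a download fails and that calling it again with the same output file resumes from the *.ht progress file; neither behaviour is confirmed above, and `with_retries` is a hypothetical helper.

```python
import time


def with_retries(func, *args, attempts=3, delay=5.0, **kwargs):
    """Call func(*args, **kwargs), retrying on any exception up to `attempts` times."""
    for attempt in range(1, attempts + 1):
        try:
            return func(*args, **kwargs)
        except Exception as exc:
            if attempt == attempts:
                raise  # give up and surface the last error
            print(f"attempt {attempt} failed: {exc!r}; retrying in {delay}s")
            time.sleep(delay)
```

Under those assumptions it would be used as `with_retries(hget, url=url, outfile=outfile, quiet=False, attempts=5, delay=30)`.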

5. Speed test

Downloading the UCSC hg19 genome FASTA file as an example:

hget https://hgdownload.soe.ucsc.edu/goldenPath/hg19/bigZips/hg19.fa.gz (single thread, asynchronous; download speed 5Mb/s)


wget https://hgdownload.soe.ucsc.edu/goldenPath/hg19/bigZips/hg19.fa.gz (download speed < 20kb/s)


axel -n 40 -a https://hgdownload.soe.ucsc.edu/goldenPath/hg19/bigZips/hg19.fa.gz (40 threads; download speed about 200kb/s)


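6. Notes

  • hget downloads asynchronously. Its concurrency optimizations apply to HTTP downloads; FTP downloads only support resuming.
  • Because of the high concurrency, some servers may refuse connections; this usually recovers after a few minutes. You can avoid it by lowering the number of TCP connections (-c) and the concurrency (-n), or by setting a maximum download speed (-s).
  • hget is intended for ordinary downloads only. Do not use it for web crawling or malicious network connections; users bear full responsibility for any consequences.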
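The -s/--max-speed option accepts case-insensitive units (K[b], M[b], ...). The sketch below shows one way to parse such values and to throttle a download to an average rate. It assumes binary multiples (1 K = 1024 bytes), which the help text does not specify; `parse_speed` and `Throttle` are hypothetical names, and the throttle is a simple average-rate limiter rather than a full token bucket.

```python
import re
import time

# Assumption: binary multiples; hget may use decimal ones instead.
_UNITS = {"": 1, "K": 1024, "M": 1024 ** 2, "G": 1024 ** 3}


def parse_speed(text):
    """Parse strings such as '500', '200k', '2Mb' or '1.5M' into bytes per second."""
    m = re.fullmatch(r"\s*(\d+(?:\.\d+)?)\s*([kmg]?)b?\s*", text, re.IGNORECASE)
    if not m:
        raise ValueError(f"invalid speed: {text!r}")
    number, unit = m.groups()
    return int(float(number) * _UNITS[unit.upper()])


class Throttle:
    """Report how long to pause so the average rate stays at or below max_bps."""

    def __init__(self, max_bps):
        self.max_bps = max_bps
        self.start = time.monotonic()
        self.received = 0

    def wait_time(self, nbytes):
        # Time the bytes so far *should* have taken, minus the time actually elapsed
        self.received += nbytes
        expected = self.received / self.max_bps
        elapsed = time.monotonic() - self.start
        return max(0.0, expected - elapsed)
```

Inside an async chunk loop, one would call `await asyncio.sleep(throttle.wait_time(len(data)))` after each received block, so fast bursts are followed by proportionally longer pauses.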

7. License

MIT license
