Download images from Baidu Images

BaiduImagesDownload

BaiduImagesDownload is a fast, simple crawler for downloading images from Baidu Images.

from BaiduImagesDownload import Crawler

net, num, urls = Crawler.get_images_url('二次元', 20)
Crawler.download_images(urls)

Installation

pip install BaiduImagesDownload

Usage

Basic

from BaiduImagesDownload import Crawler

# original=True means the original (full-size) image is preferred
net, num, urls = Crawler.get_images_url('二次元', 20, original=True)
Crawler.download_images(urls)

Download settings

from BaiduImagesDownload import Crawler

# rule sets the allowed image formats; the default is ('.png', '.jpg')
# timeout is the request timeout in seconds; the default is 60
net, num, urls = Crawler.get_images_url('二次元', 20)
Crawler.download_images(urls, rule=('.png', '.jpg'), timeout=60)

Documentation

get_images_url

class Crawler:

    @staticmethod
    def get_images_url(word: str, num: int, original: bool = True,
                       timeout: int = __CONCURRENT_TIMEOUT) -> (bool, bool, list):

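Parameters

  • word: str: the search keyword
  • num: int: the number of images to search for
  • original: bool, optional: whether to prefer the original image; defaults to True
  • timeout: int, optional: the request timeout in seconds; defaults to 60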

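Returns

  • net: bool: whether the network connection succeeded; True on success, False on failure
  • num: bool: whether enough images were found; True if the requested number was met, False if not
  • urls: list: the URLs found; each item is a dict with two keys, obj_url and from_url, where obj_url is the URL of the image and from_url is its Referer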
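
As an illustration of the return shape described above, the sketch below turns one entry of urls into HTTP request headers. The helper name build_headers and the sample URLs are hypothetical and not part of BaiduImagesDownload; the sketch only assumes that each item is a dict with the obj_url and from_url keys.

```python
def build_headers(item: dict) -> dict:
    """Return request headers for one entry of `urls` (hypothetical helper)."""
    # from_url is documented as the Referer for the image at obj_url.
    return {"Referer": item["from_url"]}


# Sample entry in the documented shape; both URLs are placeholders.
sample = {
    "obj_url": "https://example.com/images/1.jpg",
    "from_url": "https://example.com/page/1",
}
headers = build_headers(sample)
print(sample["obj_url"], headers)
```

Crawler.download_images accepts the list in this same format, so a helper like this is only needed when fetching an image by hand.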

download_images

class Crawler:

    @staticmethod
    def download_images(urls: list, rule: tuple = ('.png', '.jpg'),
                        path: str = 'download', timeout: int = __CONCURRENT_TIMEOUT,
                        concurrent: int = __CONCURRENT_NUM, command: bool = True) -> (int, int):

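Parameters

  • urls: list: the list of images to download, in the same format as returned by get_images_url
  • rule: tuple, optional: the formats allowed for download; defaults to ('.png', '.jpg')
  • path: str, optional: the directory images are downloaded to; defaults to 'download'
  • timeout: int, optional: the request timeout in seconds; defaults to 60
  • concurrent: int, optional: the number of parallel downloads; defaults to 100
  • command: bool, optional: whether to show a progress bar in the console; defaults to True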

Returns

  • success: int: the number of images downloaded successfully
  • failed: int: the number of images that failed to download
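
The two calls can be combined into one step that checks each documented flag before downloading. The following is a minimal sketch, not part of the package: the function name crawl and its get_urls and download parameters are hypothetical, chosen so the flow can be exercised without network access.

```python
from typing import Callable, Tuple


def crawl(word: str, num: int,
          get_urls: Callable[..., Tuple[bool, bool, list]],
          download: Callable[..., Tuple[int, int]]) -> Tuple[int, int]:
    """Fetch image URLs, then download them (hypothetical wrapper)."""
    net, enough, urls = get_urls(word, num)
    if not net:
        # net is False when the network connection failed.
        raise ConnectionError("could not reach Baidu Images")
    if not enough:
        # Fewer images than requested were found; download what there is.
        print(f"only {len(urls)} of {num} images found")
    # download returns (success, failed) counts, as documented above.
    success, failed = download(urls)
    return success, failed
```

With the package installed, this would presumably be called as crawl('二次元', 20, Crawler.get_images_url, Crawler.download_images), since both methods are static.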

Logging

You can set the log level and the log output; see Python's logging module for details.

import logging
import os
from BaiduImagesDownload import logger

# Set the log level to DEBUG
# The default is INFO
logger.setLevel(logging.DEBUG)

# Write log output to a file
# FileHandler does not expand '~', so expand it explicitly
file_handler = logging.FileHandler(
    os.path.expanduser('~/BaiduImagesDownload.log'))
file_handler.setFormatter(logging.Formatter(
    '[%(asctime)s] [%(levelname)s] %(message)s'))  # Set the output format
logger.addHandler(file_handler)

License

License: MIT
