A simple scrapy library for Python.

shiertier_scrapy

Introduction

shiertier_scrapy is a Python library that simplifies downloading images from the web. It handles HTTP status codes, retries, and image validation, which makes it useful for web scraping tasks that involve image downloads.

Installation

You can install shiertier_scrapy via pip:

pip install git+https://github.com/shiertier/shiertier_scrapy.git

Please note that this project is still under development.

Environment Variables and Storage Location

Environment Variables

  • SCRAPY_SAVE_DIR: The directory where downloaded images are saved. If it is not set, the current working directory is used.

Setting the Storage Location

You can specify the storage location by setting the SCRAPY_SAVE_DIR environment variable:

export SCRAPY_SAVE_DIR=/path/to/save_directory

Alternatively, you can pass the save_dir parameter when initializing the ScrapyClientBase class:

from shiertier_scrapy import ScrapyClientBase

# Initialize with a custom save directory
scrapy_client = ScrapyClientBase(save_dir='/path/to/save_directory')
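
The lookup described above (explicit directory, then environment variable, then current working directory) can be sketched in plain Python. This is only an illustration: resolve_save_dir is a hypothetical helper and not part of shiertier_scrapy, and the precedence of the save_dir argument over SCRAPY_SAVE_DIR is an assumption, since the README does not state which one wins.

```python
import os
from pathlib import Path
from typing import Optional


def resolve_save_dir(save_dir: Optional[str] = None) -> Path:
    """Hypothetical helper: choose the directory images are saved into."""
    # Assumed precedence: explicit argument, then SCRAPY_SAVE_DIR, then cwd.
    chosen = save_dir or os.environ.get("SCRAPY_SAVE_DIR") or os.getcwd()
    path = Path(chosen).expanduser()
    # Create the directory up front so later writes do not fail.
    path.mkdir(parents=True, exist_ok=True)
    return path
```

Calling resolve_save_dir() with no argument and no SCRAPY_SAVE_DIR set returns the current working directory, which matches the fallback described under Environment Variables.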

Usage

Downloading a Single Image

You can download a single image using the download_one method. It requires the image URL and the file name to save it under. It also supports retries, with a sleep time between attempts.

from shiertier_scrapy import easy_scrapy_client

# Download a single image
easy_scrapy_client.download_one(url='http://example.com/image.jpg', save_name='image.jpg')
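
To make the retry behaviour concrete, here is a minimal sketch of a retry loop with a pause between attempts. It does not show how download_one is implemented: download_with_retries, its retries and sleep_time parameters, and the injected fetch callable are all assumptions made for illustration, and the library's real parameter names may differ.

```python
import time
from typing import Callable, Optional


def download_with_retries(
    fetch: Callable[[str], bytes],
    url: str,
    retries: int = 3,
    sleep_time: float = 1.0,
) -> bytes:
    """Call fetch(url) up to `retries` times, sleeping between failed attempts."""
    if retries < 1:
        raise ValueError("retries must be at least 1")
    last_error: Optional[Exception] = None
    for attempt in range(1, retries + 1):
        try:
            return fetch(url)
        except Exception as exc:  # a real client would catch narrower errors
            last_error = exc
            # Do not sleep after the final attempt; just give up.
            if attempt < retries:
                time.sleep(sleep_time)
    raise RuntimeError(f"giving up on {url} after {retries} attempts") from last_error
```

Passing the fetch function in as an argument keeps the sketch independent of any HTTP library; in practice it could wrap a requests call that raises on a bad status code.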

Downloading Multiple Images

You can download multiple images concurrently using the download_images method. It requires a list of URLs and a matching list of save names. It also supports retries, with a sleep time between attempts.

from shiertier_scrapy import easy_scrapy_client

# URLs and save names
urls = ['http://example.com/image1.jpg', 'http://example.com/image2.jpg']
save_names = ['image1.jpg', 'image2.jpg']

# Download multiple images
easy_scrapy_client.download_images(urls=urls, save_names=save_names)
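
As a rough picture of what downloading concurrently can look like, the sketch below pairs each URL with its save name and runs the downloads on a thread pool. It is an assumption-laden illustration, not the library's code: download_all, max_workers, and the injected download callable are hypothetical names, and download_images may use a different concurrency mechanism.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List


def download_all(
    download: Callable[[str, str], bool],
    urls: List[str],
    save_names: List[str],
    max_workers: int = 4,
) -> Dict[str, bool]:
    """Run download(url, save_name) for each pair on a thread pool."""
    if len(urls) != len(save_names):
        raise ValueError("urls and save_names must have the same length")
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map() preserves input order, so results line up with save_names.
        results = list(pool.map(download, urls, save_names))
    return dict(zip(save_names, results))
```

Checking the two list lengths before starting avoids silently dropping URLs, which zip would otherwise do when one list is shorter than the other.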

Dependencies

  • requests
  • Pillow
  • shiertier_logger
  • tqdm

License

This project is released under the MIT License. See the LICENSE file for details.
