A simple scrapy library for Python.

shiertier_scrapy

Introduction

shiertier_scrapy is a Python library that simplifies downloading images from the web. It handles HTTP status codes, retries, and image validation, which makes it useful for web scraping tasks that involve image downloads.

Installation

You can install shiertier_scrapy via pip:

pip install git+https://github.com/shiertier-utils/shiertier_scrapy.git

Please note that this project is still under development.

Environment Variables and Storage Location

Environment Variables

  • SCRAPY_SAVE_DIR: The directory where downloaded images are saved. If it is not set, the current working directory is used.

Setting the Storage Location

You can specify the storage location by setting the SCRAPY_SAVE_DIR environment variable:

export SCRAPY_SAVE_DIR=/path/to/save_directory

Alternatively, you can pass the save_dir parameter when initializing the ScrapyClientBase class:

from shiertier_scrapy import ScrapyClientBase

# Initialize with a custom save directory
scrapy_client = ScrapyClientBase(save_dir='/path/to/save_directory')
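
The lookup order described above can be sketched with the standard library alone. This is only an illustration: resolve_save_dir is a hypothetical helper, not part of shiertier_scrapy, and the assumption that an explicit save_dir takes priority over SCRAPY_SAVE_DIR is not confirmed by the project.

```python
import os
from pathlib import Path
from typing import Optional


def resolve_save_dir(save_dir: Optional[str] = None) -> Path:
    """Hypothetical helper: pick the directory downloads would be saved to."""
    # Assumption: an explicit argument wins over the environment variable.
    if save_dir:
        return Path(save_dir)
    # Fall back to SCRAPY_SAVE_DIR, then to the current working directory.
    return Path(os.environ.get("SCRAPY_SAVE_DIR") or os.getcwd())
```

Calling resolve_save_dir() with no argument and no SCRAPY_SAVE_DIR set returns the current working directory, matching the default described under Environment Variables.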

Usage

Downloading a Single Image

You can download a single image using the download_one method. It takes the URL of the image and the file name to save it under. Failed downloads can be retried, with a sleep time between attempts.

from shiertier_scrapy import easy_scrapy_client

# Download a single image
easy_scrapy_client.download_one(url='http://example.com/image.jpg', save_name='image.jpg')
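
The retry-with-sleep behaviour mentioned above can be illustrated in isolation. The following is a minimal sketch, not the library's implementation: fetch_with_retries and its retries and sleep_time parameters are hypothetical names, and the fetch callable stands in for whatever actually performs the HTTP request.

```python
import time
from typing import Callable


def fetch_with_retries(
    fetch: Callable[[str], bytes],
    url: str,
    retries: int = 3,
    sleep_time: float = 1.0,
) -> bytes:
    """Call fetch(url) up to `retries` times, sleeping between failed attempts."""
    if retries < 1:
        raise ValueError("retries must be at least 1")
    last_error = None
    for attempt in range(1, retries + 1):
        try:
            return fetch(url)
        except Exception as exc:  # a real client would catch narrower errors
            last_error = exc
            # Sleep only if another attempt is still to come.
            if attempt < retries:
                time.sleep(sleep_time)
    raise RuntimeError(f"giving up on {url} after {retries} attempts") from last_error
```

Passing the fetch function as an argument keeps the sketch independent of any HTTP library and makes the retry logic easy to exercise with a stub.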

Downloading Multiple Images

You can download multiple images concurrently using the download_images method. It takes a list of URLs and a list of corresponding save names, one per URL. As with download_one, failed downloads can be retried, with a sleep time between attempts.

from shiertier_scrapy import easy_scrapy_client

# URLs and save names
urls = ['http://example.com/image1.jpg', 'http://example.com/image2.jpg']
save_names = ['image1.jpg', 'image2.jpg']

# Download multiple images
easy_scrapy_client.download_images(urls=urls, save_names=save_names)
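
To show what downloading concurrently with paired URLs and save names can look like, here is a minimal sketch built on the standard library's thread pool. It makes no claim about how download_images works internally: download_many, its max_workers parameter, and the download callable are hypothetical names introduced for illustration.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List


def download_many(
    download: Callable[[str, str], bool],
    urls: List[str],
    save_names: List[str],
    max_workers: int = 4,
) -> Dict[str, bool]:
    """Run download(url, save_name) for each pair on a thread pool."""
    if len(urls) != len(save_names):
        raise ValueError("urls and save_names must have the same length")
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map preserves input order, so results line up with save_names.
        results = list(pool.map(download, urls, save_names))
    return dict(zip(save_names, results))
```

Checking the list lengths up front avoids silently dropping a URL that has no matching save name.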

Dependencies

  • requests
  • Pillow
  • shiertier_logger
  • tqdm

License

This project is released under the MIT License. See the LICENSE file for details.
