A simple scrapy library for Python.

shiertier_scrapy

Introduction

shiertier_scrapy is a Python library that simplifies downloading images from the web. It handles HTTP status codes, retries, and image validation, which makes it useful for web scraping tasks that need to download images.

Installation

You can install shiertier_scrapy via pip:

pip install git+https://github.com/shiertier/shiertier_scrapy.git

Please note that this project is still under development.

Environment Variables and Storage Location

Environment Variables

  • SCRAPY_SAVE_DIR: The directory where downloaded images will be saved. If not provided, the current working directory will be used.

Setting the Storage Location

You can specify the storage location by setting the SCRAPY_SAVE_DIR environment variable:

export SCRAPY_SAVE_DIR=/path/to/save_directory

Alternatively, you can pass the save_dir parameter when initializing the ScrapyClientBase class:

from shiertier_scrapy import ScrapyClientBase

# Initialize with a custom save directory
scrapy_client = ScrapyClientBase(save_dir='/path/to/save_directory')
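The precedence between these two settings is not spelled out above. The following is a minimal sketch of one plausible resolution order, assuming an explicit save_dir argument takes priority over SCRAPY_SAVE_DIR, which in turn takes priority over the current working directory. The helper resolve_save_dir is hypothetical and is not part of shiertier_scrapy.

```python
import os
from pathlib import Path
from typing import Optional


def resolve_save_dir(save_dir: Optional[str] = None) -> Path:
    """Hypothetical helper: pick the directory to save images into.

    Assumed precedence (not confirmed by the library):
    explicit argument, then SCRAPY_SAVE_DIR, then the current
    working directory.
    """
    chosen = save_dir or os.environ.get("SCRAPY_SAVE_DIR") or os.getcwd()
    return Path(chosen)
```

Under this assumption, calling resolve_save_dir() with no argument and no environment variable set returns the current working directory, which matches the fallback described for SCRAPY_SAVE_DIR.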

Usage

Downloading a Single Image

You can download a single image with the download_one method. It takes the URL of the image and the name to save it under. It also supports retries, with a sleep time between attempts.

from shiertier_scrapy import easy_scrapy_client

# Download a single image
easy_scrapy_client.download_one(url='http://example.com/image.jpg', save_name='image.jpg')
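The parameter names for retries and sleep time are not shown here, so the following is only a generic sketch of the retry-with-sleep pattern, not the library's implementation. The function fetch_with_retries and its parameters retries and sleep_time are hypothetical names chosen for illustration. The fetch argument stands in for whatever performs the actual HTTP request.

```python
import time
from typing import Callable, Optional


def fetch_with_retries(
    fetch: Callable[[], bytes],
    retries: int = 3,
    sleep_time: float = 1.0,
) -> bytes:
    """Hypothetical helper: call fetch() until it succeeds.

    Makes at most `retries` attempts and sleeps `sleep_time`
    seconds between failed attempts.
    """
    last_error: Optional[Exception] = None
    for attempt in range(1, retries + 1):
        try:
            return fetch()
        except Exception as exc:
            last_error = exc
            # Sleep only if another attempt will follow.
            if attempt < retries:
                time.sleep(sleep_time)
    raise RuntimeError(f"failed after {retries} attempts") from last_error
```

Passing the request as a callable keeps the retry logic independent of the HTTP client, so the same loop can wrap a requests call or a test double.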

Downloading Multiple Images

You can download multiple images concurrently with the download_images method. It takes a list of URLs and a list of corresponding save names. It also supports retries, with a sleep time between attempts.

from shiertier_scrapy import easy_scrapy_client

# URLs and save names
urls = ['http://example.com/image1.jpg', 'http://example.com/image2.jpg']
save_names = ['image1.jpg', 'image2.jpg']

# Download multiple images
easy_scrapy_client.download_images(urls=urls, save_names=save_names)
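How download_images schedules its work is not described here. As a rough illustration of concurrent downloading in general, the sketch below pairs each URL with its save name and runs the downloads in a thread pool. The names download_all, download_one (as a callable parameter), and max_workers are assumptions made for this example and do not describe the library's internals.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List


def download_all(
    urls: List[str],
    save_names: List[str],
    download_one: Callable[[str, str], bool],
    max_workers: int = 4,
) -> Dict[str, bool]:
    """Hypothetical helper: run download_one(url, save_name) concurrently.

    Returns a mapping from save name to the success flag
    returned by download_one.
    """
    if len(urls) != len(save_names):
        raise ValueError("urls and save_names must have the same length")
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map() preserves input order, so results line up with save_names.
        results = list(pool.map(download_one, urls, save_names))
    return dict(zip(save_names, results))
```

Threads suit this kind of work because each download spends most of its time waiting on the network. Checking the list lengths up front avoids silently dropping URLs that have no matching save name.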

Dependencies

  • requests
  • Pillow
  • shiertier_logger
  • tqdm

License

This project is released under the MIT License. See the LICENSE file for details.
