scrapy-pagestorage

Scrapy extension to store info in storage service

Project description

A Scrapy extension that stores request and response information in a storage service.

Installation

You can install scrapy-pagestorage using pip:

pip install scrapy-pagestorage

You can then enable the middleware in your settings.py:

SPIDER_MIDDLEWARES = {
    # ... your other spider middlewares ...
    'scrapy_pagestorage.PageStorageMiddleware': 900,
}
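If the project already defines spider middlewares, the entry can be added alongside them. The following is a minimal sketch; `myproject.middlewares.ExampleSpiderMiddleware` and its order value are hypothetical placeholders, not something this package provides.

```python
# settings.py (sketch)
SPIDER_MIDDLEWARES = {
    # Hypothetical existing project middleware; replace with your own entries.
    "myproject.middlewares.ExampleSpiderMiddleware": 543,
}

# Register the page storage middleware with the order value shown above.
SPIDER_MIDDLEWARES["scrapy_pagestorage.PageStorageMiddleware"] = 900

# Scrapy sorts spider middlewares by this number: lower values sit closer
# to the engine, higher values closer to the spider.
ordered = sorted(SPIDER_MIDDLEWARES, key=SPIDER_MIDDLEWARES.get)
```

With these values, the page storage middleware ends up last in `ordered`, that is, closest to the spider.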

How to use it

Enable the extension through settings.py:

PAGE_STORAGE_ENABLED = True
PAGE_STORAGE_ON_ERROR_ENABLED = True

Configure the extension through settings.py:

PAGE_STORAGE_MODE = "VERSIONED_CACHE"
PAGE_STORAGE_LIMIT = 100
PAGE_STORAGE_ON_ERROR_LIMIT = 100
PAGE_STORAGE_TRIM_HTML = True
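A typo in one of these values can go unnoticed, so it may help to sanity-check them before a crawl. The helper below is a hypothetical sketch, not part of scrapy-pagestorage: it only checks the value types implied by the Settings section (booleans, integers, and a string or None for the mode). It is stricter than Scrapy itself, which also accepts string values passed on the command line.

```python
def check_page_storage_settings(settings):
    """Return a list of problems found; an empty list means nothing looked wrong."""
    expected_types = {
        "PAGE_STORAGE_ENABLED": bool,
        "PAGE_STORAGE_ON_ERROR_ENABLED": bool,
        "PAGE_STORAGE_LIMIT": int,
        "PAGE_STORAGE_ON_ERROR_LIMIT": int,
        "PAGE_STORAGE_TRIM_HTML": bool,
    }
    problems = []
    for name, expected in expected_types.items():
        if name not in settings:
            continue  # unset values are left to the extension
        value = settings[name]
        is_bool = isinstance(value, bool)
        # bool is a subclass of int, so True/False must not pass as a limit.
        ok = is_bool if expected is bool else (isinstance(value, int) and not is_bool)
        if not ok:
            problems.append(f"{name} should be {expected.__name__}, got {value!r}")
    mode = settings.get("PAGE_STORAGE_MODE")
    if mode is not None and not isinstance(mode, str):
        problems.append(f"PAGE_STORAGE_MODE should be a string or None, got {mode!r}")
    return problems


# The configuration shown above, expressed as a plain dict.
example = {
    "PAGE_STORAGE_ENABLED": True,
    "PAGE_STORAGE_ON_ERROR_ENABLED": True,
    "PAGE_STORAGE_MODE": "VERSIONED_CACHE",
    "PAGE_STORAGE_LIMIT": 100,
    "PAGE_STORAGE_ON_ERROR_LIMIT": 100,
    "PAGE_STORAGE_TRIM_HTML": True,
}
problems = check_page_storage_settings(example)
```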

The extension is auto-enabled for Portia spiders (SHUB_SPIDER_TYPE=portia).
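The enabling rule described above can be pictured as a simple "either/or" check. This is a sketch under assumptions rather than the extension's actual code: it assumes `SHUB_SPIDER_TYPE` is read from the process environment, which the text above does not state, and `page_storage_requested` is a hypothetical name.

```python
import os


def page_storage_requested(settings, environ=None):
    """Return True if page storage would be on under the rule described above."""
    environ = os.environ if environ is None else environ
    # Explicit opt-in through settings.py.
    explicitly_enabled = bool(settings.get("PAGE_STORAGE_ENABLED", False))
    # Assumed: the Portia marker arrives as an environment variable.
    portia_spider = environ.get("SHUB_SPIDER_TYPE") == "portia"
    return explicitly_enabled or portia_spider
```

For example, `page_storage_requested({}, {"SHUB_SPIDER_TYPE": "portia"})` is true even though the setting is absent.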

Settings

PAGE_STORAGE_MODE

Default: None

A string that selects whether the extension stores information in the cache store or in the versioned cache store. Set PAGE_STORAGE_MODE = "VERSIONED_CACHE" to use the versioned one.

PAGE_STORAGE_LIMIT

An integer that limits the number of visited pages to store.

PAGE_STORAGE_ON_ERROR_LIMIT

An integer that limits the number of pages stored on error.
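To illustrate what such a limit means in practice, here is a hypothetical counter that refuses further stores once its cap is reached. `StorageBudget` is an illustrative name and not part of the extension, and the sketch assumes the two limits are counted independently, which the descriptions above do not confirm.

```python
class StorageBudget:
    """Hypothetical counter that allows at most `limit` stores."""

    def __init__(self, limit):
        self.limit = limit
        self.stored = 0

    def try_store(self):
        """Count one store and return True, or return False once the cap is hit."""
        if self.stored >= self.limit:
            return False
        self.stored += 1
        return True


pages = StorageBudget(limit=100)   # mirrors PAGE_STORAGE_LIMIT = 100
errors = StorageBudget(limit=100)  # mirrors PAGE_STORAGE_ON_ERROR_LIMIT = 100

# Simulate 150 visited pages: only the first 100 are accepted.
accepted = sum(pages.try_store() for _ in range(150))
```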

PAGE_STORAGE_TRIM_HTML

Default: False

Remove whitespace from the start and end of the HTML to reduce file size.
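As a rough illustration of the effect, the sketch below assumes the trimming behaves like Python's `str.strip()`; the extension's exact implementation may differ, and `trim_html` is a hypothetical helper.

```python
def trim_html(html):
    """Strip leading and trailing whitespace; whitespace inside the markup is kept."""
    return html.strip()


raw = "\n\n   <html><body> <p>Hi</p> </body></html>  \n"
trimmed = trim_html(raw)

# Number of characters saved by trimming this page.
saved = len(raw) - len(trimmed)
```

Only the padding around the document is removed; spaces between tags are left alone.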

Project details

Release history

0.4.0 (this version): Mar 11, 2022
0.3.1: Oct 16, 2019
0.3.0: Aug 6, 2019
0.2.2: Oct 3, 2018
0.2.1: Aug 16, 2017
0.2.0: Feb 7, 2017
0.1.0: Apr 27, 2016
0.0.1: Apr 27, 2016

Download files

Source Distribution

scrapy-pagestorage-0.4.0.tar.gz (4.7 kB), uploaded Mar 11, 2022

Built Distribution

scrapy_pagestorage-0.4.0-py2.py3-none-any.whl (4.9 kB), uploaded Mar 11, 2022, for Python 2 and Python 3

File details

Details for the file scrapy-pagestorage-0.4.0.tar.gz.

File metadata

Download URL: scrapy-pagestorage-0.4.0.tar.gz
Upload date: Mar 11, 2022
Size: 4.7 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/3.8.0 pkginfo/1.8.2 readme-renderer/33.0 requests/2.27.1 requests-toolbelt/0.9.1 urllib3/1.26.8 tqdm/4.63.0 importlib-metadata/4.11.2 keyring/23.5.0 rfc3986/2.0.0 colorama/0.4.4 CPython/3.9.10

File hashes

Hashes for scrapy-pagestorage-0.4.0.tar.gz
Algorithm	Hash digest
SHA256	`344b906ddc2e5ec1dcddbaf202c00534a0d17b58d66343dda41b0a171554ba78`
MD5	`11f1a6fe477c08039300def795fe3fdd`
BLAKE2b-256	`d4a1bb6d774bdd5d5eb911deb2ed33fa29dd4dad4ae3efdab8991dfe6eaf0d14`


File details

Details for the file scrapy_pagestorage-0.4.0-py2.py3-none-any.whl.

File metadata

Download URL: scrapy_pagestorage-0.4.0-py2.py3-none-any.whl
Upload date: Mar 11, 2022
Size: 4.9 kB
Tags: Python 2, Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/3.8.0 pkginfo/1.8.2 readme-renderer/33.0 requests/2.27.1 requests-toolbelt/0.9.1 urllib3/1.26.8 tqdm/4.63.0 importlib-metadata/4.11.2 keyring/23.5.0 rfc3986/2.0.0 colorama/0.4.4 CPython/3.9.10

File hashes

Hashes for scrapy_pagestorage-0.4.0-py2.py3-none-any.whl
Algorithm	Hash digest
SHA256	`3e5d1472bbc4623ee8c985b6968ede64c03dd18b15abad08c0a146cd00947c73`
MD5	`d7483559011ab6fb4e1c7734ed728107`
BLAKE2b-256	`93236abe5290e9451234c14accfd1086453a591178f00197e03cf07afd1c3fb3`
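The digests listed above can be compared against a downloaded file before installing it. The following sketch uses only the standard library; the helper names are hypothetical, and it assumes the files were saved under the names given in the file metadata.

```python
import hashlib


def sha256_hex(chunks):
    """Return the SHA256 hex digest of an iterable of byte chunks."""
    digest = hashlib.sha256()
    for chunk in chunks:
        digest.update(chunk)
    return digest.hexdigest()


def read_chunks(path, size=65536):
    """Yield a file's contents in fixed-size blocks to keep memory use low."""
    with open(path, "rb") as handle:
        while True:
            block = handle.read(size)
            if not block:
                break
            yield block


def file_matches(path, expected_sha256):
    """Return True if the file at `path` has the expected SHA256 digest."""
    return sha256_hex(read_chunks(path)) == expected_sha256.strip().lower()


# SHA256 digests copied from the tables above.
SDIST_SHA256 = "344b906ddc2e5ec1dcddbaf202c00534a0d17b58d66343dda41b0a171554ba78"
WHEEL_SHA256 = "3e5d1472bbc4623ee8c985b6968ede64c03dd18b15abad08c0a146cd00947c73"
```

After downloading the source distribution into the current directory, `file_matches("scrapy-pagestorage-0.4.0.tar.gz", SDIST_SHA256)` should return `True` for an intact file.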
