Scrapy spider middleware to split an item into multiple items on a multi-valued key
Project description
SplitVariantsMiddleware is a Scrapy spider middleware used to split single items into multiple items when they have a “variants” key with multiple values.
Example usage
Let’s assume your spider outputs an item with different size options (from an ecommerce website for example):
item = {"id": 12, "name": "Big chair", "variants": [{"size": "XL", "price": 200, "currency": "USD"}, {"size": "L", "price": 100, "currency": "USD"}]}
With SplitVariantsMiddleware enabled, this single item becomes two items, one per variant, each combining the item's common fields with that variant's values:
{"id": 12, "name": "Big chair", "size": "XL", "price": 200, "currency": "USD"}
{"id": 12, "name": "Big chair", "size": "L", "price": 100, "currency": "USD"}
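The transformation above can be sketched in plain Python. This is only an illustration of the behaviour described here, not the middleware's actual source: the function name split_variants and its key parameter are hypothetical, and passing items without variants through unchanged is an assumption.

```python
import copy


def split_variants(item, key="variants"):
    """Yield one item per variant (hypothetical helper, not the library API).

    Items without a non-empty `key` entry are yielded unchanged (assumption).
    """
    variants = item.get(key)
    if not variants:
        yield item
        return
    # Fields shared by every variant: everything except the variants list.
    base = {k: v for k, v in item.items() if k != key}
    for variant in variants:
        new_item = copy.deepcopy(base)  # avoid sharing mutable values
        new_item.update(variant)        # variant fields are merged on top
        yield new_item


item = {
    "id": 12,
    "name": "Big chair",
    "variants": [
        {"size": "XL", "price": 200, "currency": "USD"},
        {"size": "L", "price": 100, "currency": "USD"},
    ],
}
results = list(split_variants(item))
for result in results:
    print(result)
```

Each yielded dictionary carries the common id and name fields plus the fields of exactly one variant, and the original item is left unmodified.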
Installation
Install scrapy-splitvariants using pip:
$ pip install scrapy-splitvariants
Configuration
Add SplitVariantsMiddleware by including it in SPIDER_MIDDLEWARES in your settings.py file:
SPIDER_MIDDLEWARES = {
    'scrapy_splitvariants.SplitVariantsMiddleware': 100,
}
Here, priority 100 is just an example. Set its value depending on other middlewares you may have enabled already.
Then enable the middleware by setting SPLITVARIANTS_ENABLED to True in your settings.py.
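Taken together, the two settings might look like this in settings.py. This is a minimal fragment: the priority 100 is only the example value used above, and should be adjusted to fit any other spider middlewares in your project.

```python
# settings.py (fragment)

# Register the middleware; 100 is an example priority, not a required value.
SPIDER_MIDDLEWARES = {
    'scrapy_splitvariants.SplitVariantsMiddleware': 100,
}

# Switch the middleware on.
SPLITVARIANTS_ENABLED = True
```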
Source Distribution
Hashes for scrapy-splitvariants-1.1.0.tar.gz
Algorithm | Hash digest
---|---
SHA256 | 43f393f0380e59e0f8a80771102054f6a9660c5e8cb0f716c517c07b6e2bfdcd
MD5 | 82879a91f37d2b0d4590668b9846d207
BLAKE2b-256 | ec8c457f1f79de762e2afa8f8cfd163744edd3b586c17b7556eaef9586ffed5d
Built Distribution
Hashes for scrapy_splitvariants-1.1.0-py2.py3-none-any.whl
Algorithm | Hash digest
---|---
SHA256 | da425cdaa03101406f076bd9600429d997ed395288e47bd4032e0a0b23d9f478
MD5 | f0ea6737148ae0e8b3834d648f6c96fe
BLAKE2b-256 | 33fdb5a0d2d0c8a4ba9636c09f6d47cc5d1ef56561a076c5f9aae89b6e5b68f2