Base library for Scrapy's ItemLoader
itemloaders is a library that helps you collect data from HTML and XML sources.
It comes in handy for extracting data from web pages, since it supports both CSS and XPath selectors.
It’s especially useful when you need to standardize data from many sources. For example, it lets you keep all your casting and parsing rules in a single place.
Here is an example to get you started:
from itemloaders import ItemLoader
from parsel import Selector
html_data = '''
<!DOCTYPE html>
<html>
<head>
<title>Some random product page</title>
</head>
<body>
<div class="product_name">Some random product page</div>
<p id="price">$ 100.12</p>
</body>
</html>
'''
# Build a loader that reads from the parsed HTML
loader = ItemLoader(selector=Selector(html_data))
loader.add_xpath('name', '//div[@class="product_name"]/text()')
# A second source for the same field; it matches nothing in this page, so it adds no value
loader.add_xpath('name', '//div[@class="product_title"]/text()')
loader.add_css('price', '#price::text')
loader.add_value('last_updated', 'today')  # you can also use literal values
item = loader.load_item()
print(item)
# {'name': ['Some random product page'], 'price': ['$ 100.12'], 'last_updated': ['today']}
For more information, check out the documentation.
Contributing
All contributions are welcome!
- If you want to review some code, check the open Pull Requests.
- If you want to submit a code change:
  - File an issue, if there isn’t one yet
  - Fork this repository
  - Create a branch to work on your changes
  - Run pre-commit install to install pre-commit hooks
  - Push your local branch and submit a Pull Request
File details
Details for the file itemloaders-1.4.0.tar.gz.
File metadata
- Download URL: itemloaders-1.4.0.tar.gz
- Upload date:
- Size: 29.7 kB
- Tags: Source
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.7
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | b5338308a819098f43525b7afc5f7d46ba338ba4710f5ebe7a21b3b47bb29929 |
| MD5 | d202bdce0b5fd068614f110e88f3715e |
| BLAKE2b-256 | 05bd916f4fd26e14e6ad292b69693ccca4f192bcaf9f817ba7d6f7162dbbd835 |
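The digests above can be checked against a downloaded copy before installing. This is a minimal sketch using only the standard library; the helper names `file_sha256` and `matches_published_digest`, and the local file path, are assumptions made for illustration.

```python
import hashlib
from pathlib import Path

# SHA256 digest listed above for itemloaders-1.4.0.tar.gz
EXPECTED_SHA256 = "b5338308a819098f43525b7afc5f7d46ba338ba4710f5ebe7a21b3b47bb29929"


def file_sha256(path, chunk_size=65536):
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_digest(path, expected=EXPECTED_SHA256):
    """Compare a file's SHA256 digest with the expected hex string."""
    return file_sha256(path) == expected.lower()


if __name__ == "__main__":
    sdist = Path("itemloaders-1.4.0.tar.gz")  # assumed local download location
    if sdist.exists():
        print("digest OK" if matches_published_digest(sdist) else "digest MISMATCH")
```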
Provenance
The following attestation bundles were made for itemloaders-1.4.0.tar.gz:
Publisher: publish.yml on scrapy/itemloaders
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: itemloaders-1.4.0.tar.gz
- Subject digest: b5338308a819098f43525b7afc5f7d46ba338ba4710f5ebe7a21b3b47bb29929
- Sigstore transparency entry: 869959897
- Sigstore integration time:
- Permalink: scrapy/itemloaders@ad624efc0c7d590ed115a67e0b127dba614b9142
- Branch / Tag: refs/tags/v1.4.0
- Owner: https://github.com/scrapy
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@ad624efc0c7d590ed115a67e0b127dba614b9142
- Trigger Event: release
File details
Details for the file itemloaders-1.4.0-py3-none-any.whl.
File metadata
- Download URL: itemloaders-1.4.0-py3-none-any.whl
- Upload date:
- Size: 12.2 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.7
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 202b6f855299b4cadfdf78bb93a6cf977899e3c40c4c54524e120a444e65b5ac |
| MD5 | d0ede87892d84794f484745caa3eae5c |
| BLAKE2b-256 | ac71d9cd0e4c6a4aace991009fc47362ce9251be0fbcf2b6c533f918b31854d5 |
Provenance
The following attestation bundles were made for itemloaders-1.4.0-py3-none-any.whl:
Publisher: publish.yml on scrapy/itemloaders
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: itemloaders-1.4.0-py3-none-any.whl
- Subject digest: 202b6f855299b4cadfdf78bb93a6cf977899e3c40c4c54524e120a444e65b5ac
- Sigstore transparency entry: 869959904
- Sigstore integration time:
- Permalink: scrapy/itemloaders@ad624efc0c7d590ed115a67e0b127dba614b9142
- Branch / Tag: refs/tags/v1.4.0
- Owner: https://github.com/scrapy
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@ad624efc0c7d590ed115a67e0b127dba614b9142
- Trigger Event: release