
Base library for Scrapy's ItemLoader

Project description


itemloaders is a library that helps you collect data from HTML and XML sources.

It is handy for extracting data from web pages, since it supports both CSS and XPath selectors.

It's especially useful when you need to standardize data coming from many sources. For example, it lets you keep all your casting and parsing rules in a single place.

Here is an example to get you started:

from itemloaders import ItemLoader
from parsel import Selector

html_data = '''
<!DOCTYPE html>
<html>
    <head>
        <title>Some random product page</title>
    </head>
    <body>
        <div class="product_name">Some random product page</div>
        <p id="price">$ 100.12</p>
    </body>
</html>
'''
loader = ItemLoader(selector=Selector(html_data))
loader.add_xpath('name', '//div[@class="product_name"]/text()')
loader.add_xpath('name', '//div[@class="product_title"]/text()')  # no match, so nothing is added
loader.add_css('price', '#price::text')
loader.add_value('last_updated', 'today')  # you can also use literal values
item = loader.load_item()
print(item)
# {'name': ['Some random product page'], 'price': ['$ 100.12'], 'last_updated': ['today']}

For more information, check out the documentation.

Contributing

All contributions are welcome!

  • If you want to review code, check the open pull requests in the scrapy/itemloaders repository on GitHub

  • If you want to submit a code change

    • File an issue in the scrapy/itemloaders repository, if there isn't one yet

    • Fork this repository

    • Create a branch to work on your changes

    • Run pre-commit install to install pre-commit hooks

    • Push your local branch and submit a Pull Request


Download files


Source Distribution

itemloaders-1.4.0.tar.gz (29.7 kB)

Uploaded Source

Built Distribution


itemloaders-1.4.0-py3-none-any.whl (12.2 kB)

Uploaded Python 3

File details

Details for the file itemloaders-1.4.0.tar.gz.

File metadata

  • Download URL: itemloaders-1.4.0.tar.gz
  • Upload date:
  • Size: 29.7 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for itemloaders-1.4.0.tar.gz
  • SHA256: b5338308a819098f43525b7afc5f7d46ba338ba4710f5ebe7a21b3b47bb29929
  • MD5: d202bdce0b5fd068614f110e88f3715e
  • BLAKE2b-256: 05bd916f4fd26e14e6ad292b69693ccca4f192bcaf9f817ba7d6f7162dbbd835


Provenance

The following attestation bundles were made for itemloaders-1.4.0.tar.gz:

Publisher: publish.yml on scrapy/itemloaders

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file itemloaders-1.4.0-py3-none-any.whl.

File metadata

  • Download URL: itemloaders-1.4.0-py3-none-any.whl
  • Upload date:
  • Size: 12.2 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for itemloaders-1.4.0-py3-none-any.whl
  • SHA256: 202b6f855299b4cadfdf78bb93a6cf977899e3c40c4c54524e120a444e65b5ac
  • MD5: d0ede87892d84794f484745caa3eae5c
  • BLAKE2b-256: ac71d9cd0e4c6a4aace991009fc47362ce9251be0fbcf2b6c533f918b31854d5
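The published SHA256 digests can be checked locally before installing a downloaded file. This is a minimal sketch using only hashlib from the standard library. The EXPECTED_SHA256 mapping copies the two SHA256 values listed for the 1.4.0 release files, and sha256_of and verify_download are hypothetical helper names, not part of itemloaders.

```python
import hashlib
import sys
from pathlib import Path

# SHA256 digests published for the itemloaders 1.4.0 release files
EXPECTED_SHA256 = {
    "itemloaders-1.4.0.tar.gz": "b5338308a819098f43525b7afc5f7d46ba338ba4710f5ebe7a21b3b47bb29929",
    "itemloaders-1.4.0-py3-none-any.whl": "202b6f855299b4cadfdf78bb93a6cf977899e3c40c4c54524e120a444e65b5ac",
}


def sha256_of(path: Path, chunk_size: int = 65536) -> str:
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path: Path) -> bool:
    """Compare a file against the digest published for its file name."""
    expected = EXPECTED_SHA256.get(path.name)
    if expected is None:
        raise KeyError(f"no published digest for {path.name}")
    return sha256_of(path) == expected


if __name__ == "__main__":
    # Usage: python verify.py itemloaders-1.4.0.tar.gz
    for arg in sys.argv[1:]:
        p = Path(arg)
        print(p.name, "OK" if verify_download(p) else "MISMATCH")
```

Reading in fixed-size chunks keeps memory use constant regardless of file size. The lookup is keyed on the file name, so a renamed download will raise KeyError rather than silently pass.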


Provenance

The following attestation bundles were made for itemloaders-1.4.0-py3-none-any.whl:

Publisher: publish.yml on scrapy/itemloaders

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.
