
A modern Python library for writing maintainable web scrapers.

Project description

Overview

spatula is a modern Python library for writing maintainable web scrapers.

Please note: the official repository has moved to Codeberg; GitHub is now used only as a mirror.

Source: https://codeberg.org/jpt/spatula/

Documentation: https://jamesturk.github.io/spatula/

Issues: https://codeberg.org/jpt/spatula/issues


Features

  • Page-oriented design: Encourages writing understandable & maintainable scrapers.
  • Not Just HTML: Provides built-in handlers for common data formats, including CSV, JSON, XML, PDF, and Excel, or you can write your own.
  • Fast HTML parsing: Uses lxml.html for fast, consistent, and reliable parsing of HTML.
  • Flexible Data Model Support: Works with dataclasses, attrs, pydantic, or your own data model classes for storing & validating scraped data.
  • CLI Tools: Offers several CLI utilities that help streamline the development & testing cycle.
  • Fully Typed: Makes full use of Python 3 type annotations.
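The page-oriented design and data-model support described above can be illustrated with a small, library-free sketch. It does not use spatula's actual classes: `Employee`, `JsonEmployeePage`, `CsvEmployeePage`, and their `process_page()` method are hypothetical names chosen for illustration, and only the standard library (`json`, `csv`, `dataclasses`) is used. Each page class knows how to turn one source format into records, and a plain dataclass stores and validates them; consult the documentation for spatula's real API.

```python
import csv
import io
import json
from dataclasses import dataclass
from typing import Iterator


@dataclass
class Employee:
    """Plain dataclass that stores one scraped record (hypothetical model)."""
    name: str
    position: str

    def __post_init__(self) -> None:
        # Minimal validation: reject records with an empty name.
        if not self.name:
            raise ValueError("name must not be empty")


class JsonEmployeePage:
    """Hypothetical page type: turns a JSON document into Employee records."""

    def __init__(self, text: str) -> None:
        self.data = json.loads(text)

    def process_page(self) -> Iterator[Employee]:
        for row in self.data:
            yield Employee(name=row["name"], position=row["position"])


class CsvEmployeePage:
    """Hypothetical page type: same records, different source format."""

    def __init__(self, text: str) -> None:
        # DictReader maps each CSV row to a dict keyed by the header line.
        self.rows = csv.DictReader(io.StringIO(text))

    def process_page(self) -> Iterator[Employee]:
        for row in self.rows:
            yield Employee(name=row["name"], position=row["position"])


json_page = JsonEmployeePage('[{"name": "Ada", "position": "Engineer"}]')
csv_page = CsvEmployeePage("name,position\nGrace,Admiral\n")
records = list(json_page.process_page()) + list(csv_page.process_page())
print(records)
```

Because both page types yield the same `Employee` records, code downstream of the pages does not need to know which format the data came from; that separation of "how to read this page" from "what a record looks like" is the kind of structure a page-oriented design aims for.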


Download files

Download the file for your platform.

Source Distribution

spatula-1.0.0.tar.gz (38.9 kB)

Uploaded Source

Built Distribution


spatula-1.0.0-py3-none-any.whl (16.6 kB)

Uploaded Python 3

File details

Details for the file spatula-1.0.0.tar.gz.

File metadata

  • Download URL: spatula-1.0.0.tar.gz
  • Upload date:
  • Size: 38.9 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: uv/0.7.22

File hashes

Hashes for spatula-1.0.0.tar.gz
Algorithm Hash digest
SHA256 2cec1a736a168e8e0876714be37183160e356a33715a9ddd87837c1303b37f3f
MD5 dee42f76a1cf6be54c7dcbe78dc6674d
BLAKE2b-256 bba129678c9dc9fa9f64372c7ebe33f602d2237cfcfb2eceb8bc44adc5bc91f2


File details

Details for the file spatula-1.0.0-py3-none-any.whl.

File metadata

  • Download URL: spatula-1.0.0-py3-none-any.whl
  • Upload date:
  • Size: 16.6 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: uv/0.7.22

File hashes

Hashes for spatula-1.0.0-py3-none-any.whl
Algorithm Hash digest
SHA256 28c766de4e177ddc514070d2cd2a0235164aeb814abec007abc68c489ad16ca5
MD5 e05ebd1a5025fcce880dce91b69e5859
BLAKE2b-256 64d61e77c4967470a7df768ab0298fca22bad105557c687077817caad4b2b826
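The published SHA256 digests can be checked locally before installing. The sketch below uses Python's standard `hashlib` to hash a downloaded file in chunks and compare the result with the values listed above; the helper names `sha256_of` and `verify` are illustrative, and the downloads are assumed to keep their published file names.

```python
import hashlib
from pathlib import Path

# SHA256 digests copied from the file details above.
EXPECTED_SHA256 = {
    "spatula-1.0.0.tar.gz": "2cec1a736a168e8e0876714be37183160e356a33715a9ddd87837c1303b37f3f",
    "spatula-1.0.0-py3-none-any.whl": "28c766de4e177ddc514070d2cd2a0235164aeb814abec007abc68c489ad16ca5",
}


def sha256_of(path: Path, chunk_size: int = 65536) -> str:
    """Hash the file in fixed-size chunks so large files need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path: Path) -> bool:
    """Return True if the file's SHA256 matches the published digest for its name."""
    expected = EXPECTED_SHA256.get(path.name)
    if expected is None:
        raise KeyError(f"no published digest for {path.name}")
    return sha256_of(path) == expected
```

For example, `verify(Path("spatula-1.0.0-py3-none-any.whl"))` should return `True` for an untampered download. pip can also enforce digests itself in `--require-hashes` mode, with a `--hash=sha256:...` option on each requirement line.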

