A modern Python library for writing maintainable web scrapers.

Project description

Overview

spatula is a modern Python library for writing maintainable web scrapers.

Source: https://github.com/jamesturk/spatula

Documentation: https://jamesturk.github.io/spatula/

Issues: https://github.com/jamesturk/spatula/issues

Features

  • Page-oriented design: Encourages writing understandable & maintainable scrapers.
  • Not Just HTML: Provides built-in handlers for common data formats including CSV, JSON, XML, PDF, and Excel, or lets you write your own.
  • Fast HTML parsing: Uses lxml.html for fast, consistent, and reliable parsing of HTML.
  • Flexible Data Model Support: Compatible with dataclasses, attrs, and pydantic, or you can bring your own data model classes for storing & validating your scraped data.
  • CLI Tools: Offers several CLI utilities that help streamline the development & testing cycle.
  • Fully Typed: Makes full use of Python 3 type annotations.
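
As a rough illustration of the data model point above, the following sketch uses a standard-library dataclass to store and validate one scraped record. It does not use spatula's own API; the `Employee` class, its fields, and the `row_to_employee` helper are hypothetical names chosen for this example.

```python
from dataclasses import dataclass, asdict


@dataclass
class Employee:
    """Hypothetical record for one scraped table row."""

    first: str
    last: str
    position: str

    def __post_init__(self):
        # Scraped text often carries stray whitespace: strip it,
        # and reject fields that end up empty.
        for name in ("first", "last", "position"):
            value = getattr(self, name).strip()
            if not value:
                raise ValueError(f"{name} must not be empty")
            setattr(self, name, value)


def row_to_employee(cells):
    """Build an Employee from a list of exactly three cell strings."""
    first, last, position = cells
    return Employee(first=first, last=last, position=position)


if __name__ == "__main__":
    record = row_to_employee([" Ada ", "Lovelace", "Engineer\n"])
    print(asdict(record))
```

Keeping validation inside the model class means a malformed row fails at the point where the record is built, before it reaches any output file.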

Download files

Source Distribution

spatula-0.9.1.tar.gz (14.9 kB view details)

Uploaded Source

Built Distribution

spatula-0.9.1-py3-none-any.whl (16.6 kB view details)

Uploaded Python 3

File details

Details for the file spatula-0.9.1.tar.gz.

File metadata

  • Download URL: spatula-0.9.1.tar.gz
  • Size: 14.9 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/1.8.2 CPython/3.12.4 Darwin/23.3.0

File hashes

Hashes for spatula-0.9.1.tar.gz

  • SHA256: 245a71e46f01c2bd4ba8f67f979cfbf116caeaa3b17bf8b3110d807dab51a329
  • MD5: ad026bf4453f6783e1ee99398bdfec96
  • BLAKE2b-256: b736ed4463a40ee0c2e48c71603fb204f74bf7dd5ee2e57de93e9fb1c8bfc7aa
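
A downloaded file can be checked against the SHA256 digest listed above. The following is a minimal sketch using only the Python standard library; the helper names `sha256_of_file` and `matches_digest` are made up for this example, and the file is assumed to have been saved locally as spatula-0.9.1.tar.gz.

```python
import hashlib
import hmac
from pathlib import Path

# SHA256 digest listed above for spatula-0.9.1.tar.gz
EXPECTED_SHA256 = "245a71e46f01c2bd4ba8f67f979cfbf116caeaa3b17bf8b3110d807dab51a329"


def sha256_of_file(path, chunk_size=65536):
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with Path(path).open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_digest(path, expected_hex):
    """Return True if the file's SHA256 equals the expected hex digest."""
    return hmac.compare_digest(sha256_of_file(path), expected_hex.lower())


if __name__ == "__main__":
    # Assumed local filename; adjust to wherever the download was saved.
    local_file = Path("spatula-0.9.1.tar.gz")
    if local_file.exists():
        print("digest matches:", matches_digest(local_file, EXPECTED_SHA256))
    else:
        print("file not found:", local_file)
```

pip can also enforce digests at install time through its hash-checking mode (`--require-hashes` with `--hash=sha256:...` entries in a requirements file).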

File details

Details for the file spatula-0.9.1-py3-none-any.whl.

File metadata

  • Download URL: spatula-0.9.1-py3-none-any.whl
  • Size: 16.6 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/1.8.2 CPython/3.12.4 Darwin/23.3.0

File hashes

Hashes for spatula-0.9.1-py3-none-any.whl

  • SHA256: adf6474504090943a78e1507c7b00e38ee0fd761cf4c136696975d840ac8c798
  • MD5: fea7d831d36eda47419a9579c4f3ff3c
  • BLAKE2b-256: 0c4bf8650ff2003220b6edd166f188c32e67c64f58f9a5c259f39c61f9a29355
