spatula is a modern Python library for writing maintainable web scrapers.
- Page-oriented design: Encourages writing understandable & maintainable scrapers.
- Not Just HTML: Provides built-in handlers for common data formats including CSV, JSON, XML, PDF, and Excel. Or write your own.
- Fast HTML parsing: Uses lxml.html for fast, consistent, and reliable parsing of HTML.
- Flexible Data Model Support: Compatible with pydantic, or bring your own data model classes for storing & validating your scraped data.
- CLI Tools: Offers several CLI utilities that can help streamline the development & testing cycle.
- Fully Typed: Makes full use of Python 3 type annotations.
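
To make the page-oriented idea concrete, here is a minimal sketch using only the standard library. It is not spatula's API: the names `Employee`, `EmployeeListPage`, `process_page`, and `SAMPLE_HTML` are hypothetical and chosen for illustration, and `html.parser` stands in for lxml.html so the example runs without extra dependencies. The point is only the shape: one class per kind of page, returning typed records.

```python
from dataclasses import dataclass
from html.parser import HTMLParser


@dataclass
class Employee:
    # Hypothetical data model for the scraped records.
    name: str
    title: str


class _CellCollector(HTMLParser):
    """Collects the text of every <td> cell, grouped by <tr> row."""

    def __init__(self) -> None:
        super().__init__()
        self.rows = []  # list of rows, each a list of cell strings
        self._in_cell = False

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self.rows.append([])
        elif tag == "td":
            if not self.rows:  # tolerate a <td> that appears before any <tr>
                self.rows.append([])
            self._in_cell = True
            self.rows[-1].append("")

    def handle_endtag(self, tag):
        if tag == "td":
            self._in_cell = False

    def handle_data(self, data):
        # Only text inside a <td> is kept; whitespace between tags is ignored.
        if self._in_cell:
            self.rows[-1][-1] += data


class EmployeeListPage:
    """One class per page type: it knows this page's markup and nothing else."""

    def __init__(self, html: str) -> None:
        self.html = html

    def process_page(self) -> list:
        parser = _CellCollector()
        parser.feed(self.html)
        parser.close()
        # Keep only well-formed two-cell rows and convert them to typed records.
        return [
            Employee(name=row[0].strip(), title=row[1].strip())
            for row in parser.rows
            if len(row) == 2
        ]


# Hypothetical sample input; a real scraper would fetch this over HTTP.
SAMPLE_HTML = """
<table>
  <tr><td>Ada Lovelace</td><td>Engineer</td></tr>
  <tr><td>Grace Hopper</td><td>Admiral</td></tr>
</table>
"""

employees = EmployeeListPage(SAMPLE_HTML).process_page()
for employee in employees:
    print(employee)
```

Keeping the parsing logic for each page type in its own class means a markup change on one page touches only that class, which is the maintainability benefit the list above describes.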