Core library for collecting parliamentary data

Scraper-core

Core library for collectors/scrapers of PaZuFa, providing shared functionality and base classes.

Status: Work in Progress (WIP). For a detailed status overview, see the status page in the wiki (German).

The detailed documentation can be found in the wiki (German).

Requests

If you have a request for Scraper-core, please open a Codeberg issue and add the label external-request to it.

If the request is a bug report, you can use the label Bug instead.

You can also reach us on Mattermost with requests and questions.

Structure

The library consists of three parts:

  1. CoreLib: Shared classes and utilities used by all scrapers regardless of implementation approach. This includes Pydantic validation models, API client helpers, common data transformation functions, standardised phrases and tag mappings that normalise, for example, committee names, document types, and keywords (Schlagworte) across parliaments, and reusable components for tasks such as LLM enrichment.
  2. Scrapy-based: An opinionated implementation of CoreLib as Scrapy-based classes.
  3. Collector-based: Our project's own scaffolding for scrapers, an opinionated implementation built on CoreLib.
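
To illustrate what a tag mapping of the kind listed under CoreLib might look like, here is a minimal sketch using only the standard library. It is not CoreLib's actual API: the names COMMITTEE_ALIASES and normalise_committee, as well as the sample committee names, are hypothetical and chosen only for illustration.

```python
from typing import Optional

# Hypothetical alias table: raw committee names as different parliament
# sites might spell them, mapped to one canonical label.
# These entries are illustrative and not taken from CoreLib.
COMMITTEE_ALIASES: dict[str, str] = {
    "innenausschuss": "Ausschuss für Inneres",
    "ausschuss für inneres": "Ausschuss für Inneres",
    "ausschuss für inneres und sport": "Ausschuss für Inneres",
}


def normalise_committee(raw: str) -> Optional[str]:
    """Return the canonical committee name, or None if the name is unknown."""
    # Collapse runs of whitespace and compare case-insensitively.
    key = " ".join(raw.split()).casefold()
    return COMMITTEE_ALIASES.get(key)
```

Returning None for unknown names, rather than passing the raw string through, keeps unmapped values visible so that they can be added to the table later. A real implementation may well choose differently.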

Requirements

  • Python 3.12+
  • Poetry 2.x

For the full dependency list, see pyproject.toml.
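
The Python requirement can be checked from within an environment. The following is a small sketch, not part of the library. The helper names python_ok and installed_version are hypothetical, and the distribution name pazufa_corelib is assumed from the published file names.

```python
import sys
from importlib.metadata import PackageNotFoundError, version
from typing import Optional

# Minimum interpreter version stated in the requirements above.
MIN_PYTHON = (3, 12)


def python_ok(version_info=sys.version_info) -> bool:
    """Return True if the given interpreter version is at least 3.12."""
    return tuple(version_info[:2]) >= MIN_PYTHON


def installed_version(dist_name: str = "pazufa_corelib") -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    print("Python OK:", python_ok())
    print("Installed version:", installed_version())
```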

Setup

See SETUP.md for the full setup guide, which is versioned alongside the code.

Contribution

See CONTRIBUTING.md for development setup, git workflow, code generation, documentation and project context.

License

GPL-3.0

Download files

Source Distribution

pazufa_corelib-0.1.1.tar.gz (103.2 kB)

Built Distribution

pazufa_corelib-0.1.1-py3-none-any.whl (164.5 kB, Python 3)

File details

Details for the file pazufa_corelib-0.1.1.tar.gz.

File metadata

  • Download URL: pazufa_corelib-0.1.1.tar.gz
  • Upload date:
  • Size: 103.2 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.12.13

File hashes

Hashes for pazufa_corelib-0.1.1.tar.gz
Algorithm    Hash digest
SHA256       9fa47dd082d5ae1c125f78a3b5b3681fc9f1d71222add687f52f90e7330105ad
MD5          fa50e3adc4abc29b0ed2a9248f984498
BLAKE2b-256  011369c10a1b8415b4e28b16bd236572d10be7b95c8fd553c5ae121188077ef9
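
A downloaded file can be checked against its published SHA256 digest before installation. The following is a minimal sketch using only the standard library's hashlib. The function names sha256_of_file and verify_sha256 are illustrative and not part of pazufa_corelib, and the local file path is an assumption about where the download was saved.

```python
import hashlib
from pathlib import Path


def sha256_of_file(path: Path, chunk_size: int = 65536) -> str:
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_sha256(path: Path, expected_hex: str) -> bool:
    """Compare the file's digest with the expected value, ignoring case."""
    return sha256_of_file(path) == expected_hex.strip().lower()


# SHA256 digest published for pazufa_corelib-0.1.1.tar.gz.
EXPECTED_SDIST_SHA256 = (
    "9fa47dd082d5ae1c125f78a3b5b3681fc9f1d71222add687f52f90e7330105ad"
)

if __name__ == "__main__":
    sdist = Path("pazufa_corelib-0.1.1.tar.gz")  # assumed download location
    if sdist.exists():
        ok = verify_sha256(sdist, EXPECTED_SDIST_SHA256)
        print("match" if ok else "MISMATCH")
```

The same function works for the wheel by passing its file name and its own SHA256 digest.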

File details

Details for the file pazufa_corelib-0.1.1-py3-none-any.whl.

File metadata

  • Download URL: pazufa_corelib-0.1.1-py3-none-any.whl
  • Upload date:
  • Size: 164.5 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.12.13

File hashes

Hashes for pazufa_corelib-0.1.1-py3-none-any.whl
Algorithm    Hash digest
SHA256       742fe4eb80a39646bd924e8c20aa7e526c275cb77420914acbf0c4bef1a790f5
MD5          d662e8e6dcf5eac9463d477e737249d4
BLAKE2b-256  d0f536d5b9e6adaaa49c3139db5407a4af06ed44fa68ab84101a287ed24ec2e3
