
Core library for collecting parliamentary data

Project description

Scraper-core

Core library for collectors/scrapers of PaZuFa, providing shared functionality and base classes.

Status: Work in Progress (WIP). For a detailed status overview, see the status page in the wiki (German).

The detailed documentation can be found in the wiki (German).

Requests

If you have a request for Scraper-core, the best way to raise it is to open an issue on Codeberg with the label external-request.

If it is a bug, you can use the label Bug instead.

For requests and questions, you can, of course, contact us on Mattermost.

Structure

The library consists of three parts:

  1. CoreLib: shared classes and utilities used by all scrapers regardless of implementation approach. This includes Pydantic validation models, API client helpers, common data transformation functions, standardised phrases and tag mappings (e.g. normalising committee names, document types, and Schlagworte (subject keywords) across parliaments), and reusable components for tasks like LLM enrichment.
  2. Scrapy-based: an opinionated implementation of CoreLib as Scrapy-based classes.
  3. Collector-based: our project's own scaffolding for scrapers, implemented in an opinionated manner on top of CoreLib.
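CoreLib's tag mappings normalise labels such as committee names so that data from different parliaments can be compared. The sketch below illustrates that idea using only the standard library. It is not the library's code: the names `normalise_committee` and `COMMITTEE_ALIASES`, and the alias and canonical labels themselves, are hypothetical examples.

```python
import re
import unicodedata

# Hypothetical alias table: raw label -> canonical label.
# The real CoreLib mappings are not reproduced here.
COMMITTEE_ALIASES: dict[str, str] = {
    "Innenausschuss": "Ausschuss für Inneres",
    "Ausschuss für Inneres": "Ausschuss für Inneres",
    "Ausschuss für Haushalt und Finanzen": "Haushaltsausschuss",
    "Haushaltsausschuss": "Haushaltsausschuss",
}


def _clean(raw: str) -> str:
    """Unicode-normalise (NFC) and collapse runs of whitespace."""
    return re.sub(r"\s+", " ", unicodedata.normalize("NFC", raw)).strip()


def _key(raw: str) -> str:
    """Lookup key: cleaned and casefolded (casefold also maps 'ß' to 'ss')."""
    return _clean(raw).casefold()


# Index the alias table by normalised key once, at import time.
_LOOKUP = {_key(alias): canonical for alias, canonical in COMMITTEE_ALIASES.items()}


def normalise_committee(raw: str) -> str:
    """Map a parliament-specific committee name to a canonical label.

    Unknown names are returned cleaned but otherwise unchanged,
    so no information is lost for labels the table does not cover.
    """
    return _LOOKUP.get(_key(raw), _clean(raw))
```

Keying the table on a casefolded, whitespace-collapsed form means spelling variants such as "AUSSCHUSS  FÜR INNERES" or the older "Ausschuß für Inneres" resolve to the same entry without listing each variant explicitly.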

Requirements

  • Python 3.12+
  • Poetry 2.x

For the full dependency list, see pyproject.toml.

Setup

See SETUP.md for the full setup guide, which is versioned alongside the code.

Contribution

See CONTRIBUTING.md for development setup, git workflow, code generation, documentation and project context.

License

GPL-3.0

Download files

Source Distribution

pazufa_corelib-0.1.0.tar.gz (85.4 kB)

Built Distribution

pazufa_corelib-0.1.0-py3-none-any.whl (143.2 kB)

File details

Details for the file pazufa_corelib-0.1.0.tar.gz.

File metadata

  • Download URL: pazufa_corelib-0.1.0.tar.gz
  • Upload date:
  • Size: 85.4 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.12.13

File hashes

Hashes for pazufa_corelib-0.1.0.tar.gz:

  • SHA256: fa673066939c7609a130f06e1a7bc5de2b0dbc8f386f8003523c42f96c2acdc0
  • MD5: a776fa37ffc7f8a54ef5c504de1d00dc
  • BLAKE2b-256: f79d2da477769fd1bdec029e62579921500281ef88e777389340fd7c7c1931df

File details

Details for the file pazufa_corelib-0.1.0-py3-none-any.whl.

File metadata

  • Download URL: pazufa_corelib-0.1.0-py3-none-any.whl
  • Upload date:
  • Size: 143.2 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.12.13

File hashes

Hashes for pazufa_corelib-0.1.0-py3-none-any.whl:

  • SHA256: 99b89f84aaef6e49774496bbf225c3ed630219dc399765714411d5d1309afda6
  • MD5: 85b9ff03252960d143239292cd3bae79
  • BLAKE2b-256: 41943083b6af28197abcb237e6527e7774c5646ea51f2154b4e5bccfcf9eb2bf