Core library for collecting parliamentary data
Scraper-core
Core library for collectors/scrapers of PaZuFa, providing shared functionality and base classes.
Status: Work in Progress (WIP). For a detailed status overview, see the status page in the wiki (German).
The detailed documentation can be found in the wiki (German).
Requests
If you have a request for Scraper-core, the best way to raise it is to open a Codeberg issue and add the label external-request.
For bug reports, you can use the label Bug instead.
For requests and questions, you can also contact us on Mattermost.
Structure
The library consists of three parts:
- CoreLib: shared classes and utilities used by all scrapers, regardless of implementation approach. This includes Pydantic validation models, API client helpers, common data transformation functions, standardised phrases and tag mappings (e.g. normalising committee names, document types, and Schlagworte (keywords) across parliaments), and reusable components for tasks such as LLM enrichment.
- Scrapy-based: an opinionated implementation of CoreLib in Scrapy-based classes.
- Collector-based: our project's scaffolding for scrapers, an opinionated implementation built on CoreLib.
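To illustrate what a tag mapping for committee names might look like, here is a minimal sketch using only the standard library. It is not CoreLib's actual API: the names `COMMITTEE_ALIASES` and `normalise_committee`, as well as the example committee labels, are hypothetical and chosen purely for illustration.

```python
import re

# Hypothetical alias table: keys are casefolded, whitespace-collapsed variants,
# values are the canonical label. The real mappings live in CoreLib and may differ.
COMMITTEE_ALIASES: dict[str, str] = {
    "innenausschuss": "Ausschuss für Inneres",
    "ausschuss für inneres": "Ausschuss für Inneres",
    "ausschuss f. inneres": "Ausschuss für Inneres",
}


def normalise_committee(raw: str) -> str:
    """Return the canonical committee name, or the cleaned input if unknown."""
    # Collapse runs of whitespace and trim the ends before looking up the alias.
    cleaned = re.sub(r"\s+", " ", raw).strip()
    return COMMITTEE_ALIASES.get(cleaned.casefold(), cleaned)
```

Unknown names are passed through after whitespace cleanup rather than rejected, so a scraper built this way would keep working when a parliament introduces a new committee; whether CoreLib takes the same approach is not stated in this README.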
Requirements
- Python 3.12+
- Poetry 2.x
For the full dependency list, see pyproject.toml.
Setup
See SETUP.md for the full setup guide, which is versioned alongside the code.
Contribution
See CONTRIBUTING.md for development setup, git workflow, code generation, documentation and project context.
License
GPL-3.0
File details
Details for the file pazufa_corelib-0.1.1.tar.gz.
File metadata
- Download URL: pazufa_corelib-0.1.1.tar.gz
- Upload date:
- Size: 103.2 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.12.13
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 9fa47dd082d5ae1c125f78a3b5b3681fc9f1d71222add687f52f90e7330105ad |
| MD5 | fa50e3adc4abc29b0ed2a9248f984498 |
| BLAKE2b-256 | 011369c10a1b8415b4e28b16bd236572d10be7b95c8fd553c5ae121188077ef9 |
File details
Details for the file pazufa_corelib-0.1.1-py3-none-any.whl.
File metadata
- Download URL: pazufa_corelib-0.1.1-py3-none-any.whl
- Upload date:
- Size: 164.5 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.12.13
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 742fe4eb80a39646bd924e8c20aa7e526c275cb77420914acbf0c4bef1a790f5 |
| MD5 | d662e8e6dcf5eac9463d477e737249d4 |
| BLAKE2b-256 | d0f536d5b9e6adaaa49c3139db5407a4af06ed44fa68ab84101a287ed24ec2e3 |
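A downloaded file can be checked against the SHA256 digests listed above. The following is a minimal sketch using Python's standard `hashlib` module; the helper names `sha256_of` and `matches_published_digest` are hypothetical and not part of this package.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 65536) -> str:
    """Compute the hex-encoded SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        # iter() with a sentinel stops once read() returns an empty bytes object.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_digest(path: Path, expected_hex: str) -> bool:
    """Compare a file's SHA256 digest with a published hex digest."""
    return sha256_of(path) == expected_hex.strip().lower()
```

For example, `matches_published_digest(Path("pazufa_corelib-0.1.1.tar.gz"), "9fa47dd0...")` with the full digest from the table should return `True` for an unmodified source distribution, assuming the file sits in the current directory.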