Command Line Interface tool for scraping Matricula Online https://data.matricula-online.eu.
Matricula Online Scraper
⚠️ This tool is still under development and is NOT yet feature-complete. Expect breaking changes and bugs. Please report any issues.
Matricula Online is a website that hosts parish registers from various regions across Europe. This CLI tool allows you to fetch data from it and save the data to a file.
Our GitHub Workflow automatically scrapes a list of all parishes once a week and pushes it to cache/parishes. Download parishes.csv ⚡️
Note that this tool will not format or clean the data in any way. Instead, the data is saved as-is to a file. I mention this because the original data is especially poorly formatted and contains a lot of inconsistencies. It is up to the user to process the data further.
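Since cleaning is left to the user, a first pass might look like the sketch below. It only collapses stray whitespace in string values and leaves everything else untouched. The record and its field names (name, region, count) are hypothetical and are not taken from the tool's actual output.

```python
import re

# Matches any run of whitespace, including newlines and tabs.
_WHITESPACE = re.compile(r"\s+")


def normalize_value(value):
    """Collapse whitespace runs and trim string values; pass other types through."""
    if isinstance(value, str):
        return _WHITESPACE.sub(" ", value).strip()
    return value


def normalize_record(record):
    """Return a copy of a flat record with every string value normalized."""
    return {key: normalize_value(value) for key, value in record.items()}


# Hypothetical record; the real field names depend on the scraped resource.
raw = {"name": "  St.   Martini \n", "region": "Münster", "count": 3}
print(normalize_record(raw))
# {'name': 'St. Martini', 'region': 'Münster', 'count': 3}
```

Real inconsistencies in the data will likely need rules specific to each field; this only shows where such processing would sit.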
🔧 Installation
Make sure to have a recent version of Python installed. You can then install this script via pip:
$ pip install --user matricula-online-scraper
Alternatively, you can clone this repository and run the script with Poetry.
💡 How To Use
$ matricula-online-scraper --help
prints the available commands and options, including documentation. The same goes for each subcommand, e.g. matricula-online-scraper fetch --help.
The fetch command is the primary command for fetching resources from Matricula Online. Its subcommands scrape different resources; run matricula-online-scraper fetch --help to see the available subcommands.
Example 1:
Fetch all available locations and save them to a .jsonl file:
$ matricula-online-scraper fetch locations ./output.jsonl
⚠️ This will fetch all parishes from Matricula Online, which may take a few minutes. This data rarely changes, and frequent scraping puts unnecessary load on the server. Our GitHub Workflow therefore caches this data once a week and pushes it to cache/parishes. ⚡️ Download CSV ⚡️
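Once a .jsonl file such as ./output.jsonl exists, it can be read back one record at a time. The helper below is a minimal sketch that assumes the file follows the usual JSON Lines convention of one JSON object per line. The name read_jsonl is ours and is not part of the tool.

```python
import json
from pathlib import Path


def read_jsonl(path):
    """Yield one parsed JSON value per non-empty line of a .jsonl file."""
    with Path(path).open(encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if line:  # tolerate blank lines, e.g. a trailing newline
                yield json.loads(line)


# Usage, assuming ./output.jsonl was produced by the command above:
#     for record in read_jsonl("./output.jsonl"):
#         print(record)
```

Reading lazily with a generator keeps memory use flat even when the file contains every parish.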
Example 2:
Fetch all available registers from one parish in Münster, Germany and save them to a .jsonl file:
$ matricula-online-scraper fetch parish ./output.jsonl --urls https://data.matricula-online.eu/en/deutschland/muenster/muenster-st-martini/
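To run the same command from a Python script, one option is to shell out to the installed CLI. This is a sketch that assumes matricula-online-scraper is on your PATH and uses only the arguments shown in Example 2. The helper names are hypothetical, and the tool may well offer other options not covered here.

```python
import subprocess


def build_fetch_parish_command(output_path, parish_url):
    """Build the argument list for the 'fetch parish' call shown in Example 2."""
    return [
        "matricula-online-scraper",
        "fetch",
        "parish",
        output_path,
        "--urls",
        parish_url,
    ]


def fetch_parish(output_path, parish_url):
    """Run the CLI and raise CalledProcessError if it exits with a non-zero status."""
    subprocess.run(build_fetch_parish_command(output_path, parish_url), check=True)


# Usage (performs a real scrape, so call it sparingly):
#     fetch_parish(
#         "./output.jsonl",
#         "https://data.matricula-online.eu/en/deutschland/muenster/muenster-st-martini/",
#     )
```

Passing the arguments as a list avoids shell quoting problems with URLs.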
License & Contributing
This project is licensed under the MIT License - see the LICENSE file for details.
Contributions are welcome! Feel free to open an issue or submit a pull request if you have suggestions, especially bug fixes. Please make sure to follow the Contributing Guidelines.
File details
Details for the file matricula_online_scraper-0.5.0.tar.gz.
File metadata
- Download URL: matricula_online_scraper-0.5.0.tar.gz
- Upload date:
- Size: 11.2 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: poetry/1.8.3 CPython/3.12.1 Linux/6.5.0-1021-azure
File hashes
Algorithm | Hash digest
---|---
SHA256 | e31ab30e158c10954a3a98925a1df743ceb6126f66ae01d4536367c5a7a7a184
MD5 | 4d665c76c1f1bc786a49314841659bb8
BLAKE2b-256 | 1c8bcb2658fbf14950935935709f6065afb4f395960d786edfd69ac81d27eec4
File details
Details for the file matricula_online_scraper-0.5.0-py3-none-any.whl.
File metadata
- Download URL: matricula_online_scraper-0.5.0-py3-none-any.whl
- Upload date:
- Size: 14.7 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: poetry/1.8.3 CPython/3.12.1 Linux/6.5.0-1021-azure
File hashes
Algorithm | Hash digest
---|---
SHA256 | 2d376628569ad5a07201d4824b02fa66e3685199f26718009c90536c7feff7b5
MD5 | bc9887866182b14c781df3cd03067fa2
BLAKE2b-256 | 14159dc86b2c0530e67aed091e7cf9253b41f6dbada70c42b2554cb37a2e6b0d
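If you download one of these files by hand, you can check it against the listed SHA256 digest. The sketch below uses Python's standard hashlib module. The function names and the local file path in the usage comment are assumptions; the expected digest is the one listed for matricula_online_scraper-0.5.0.tar.gz.

```python
import hashlib

# SHA256 digest listed for matricula_online_scraper-0.5.0.tar.gz.
EXPECTED_SDIST_SHA256 = (
    "e31ab30e158c10954a3a98925a1df743ceb6126f66ae01d4536367c5a7a7a184"
)


def sha256_of(path, chunk_size=65536):
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_digest(path, expected_hex):
    """Return True if the file's SHA256 digest equals the expected hex string."""
    return sha256_of(path) == expected_hex.strip().lower()


# Usage, assuming the archive was saved to the current directory:
#     matches_digest("matricula_online_scraper-0.5.0.tar.gz", EXPECTED_SDIST_SHA256)
```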