Command Line Interface tool for scraping Matricula Online https://data.matricula-online.eu.

Matricula Online Scraper

:warning: This tool is still under development and is NOT yet feature-complete. Expect breaking changes and bugs. Please report any issues.

Matricula Online is a website that hosts parish registers from various regions across Europe. This CLI tool allows you to fetch data from it and save the data to a file.


Our GitHub workflow automatically scrapes a list of all parishes once a week and pushes it to the cache/parishes branch. Download parishes.csv ⚡️
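If you only need to look up a parish, you can search the cached parishes.csv locally instead of scraping again. The following is a minimal sketch, assuming the file is UTF-8 encoded; because the column names are not documented here, it makes no assumption about them and simply returns every row in which any cell contains a search term. find_parishes is a hypothetical helper, not part of this tool.

```python
import csv
from typing import Iterable


def find_parishes(lines: Iterable[str], term: str) -> list[dict[str, str]]:
    """Return CSV rows where any cell contains `term` (case-insensitive).

    Hypothetical helper; it reads the header row to name the columns,
    whatever they are.
    """
    needle = term.casefold()
    reader = csv.DictReader(lines)
    # Only string cells are searched: DictReader uses None for missing
    # fields and a list for surplus fields.
    return [
        row
        for row in reader
        if any(isinstance(v, str) and needle in v.casefold() for v in row.values())
    ]
```

Usage: with open("parishes.csv", newline="", encoding="utf-8") as f: matches = find_parishes(f, "Münster"). Any URL found this way can then be passed to the fetch parish command.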

Note that this tool does not format or clean the data in any way; the data is saved to a file as-is. I mention this because the original data is poorly formatted and contains many inconsistencies. It is up to the user to process the data further.
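As an illustration of such post-processing, here is a minimal Python sketch, assuming a JSON Lines (.jsonl) output file in which each line is a JSON object. It only trims surrounding whitespace from top-level string values and turns strings that are empty after trimming into None; nested values are left untouched. clean_record and clean_jsonl_lines are hypothetical helpers, not part of this tool, and real cleanup will depend on the inconsistencies you actually encounter.

```python
import json
from typing import Any, Iterable, Iterator


def clean_record(record: dict[str, Any]) -> dict[str, Any]:
    """Trim top-level string values; strings that end up empty become None."""
    cleaned: dict[str, Any] = {}
    for key, value in record.items():
        if isinstance(value, str):
            # "".strip() is falsy, so blank strings are replaced by None.
            value = value.strip() or None
        cleaned[key] = value
    return cleaned


def clean_jsonl_lines(lines: Iterable[str]) -> Iterator[dict[str, Any]]:
    """Parse JSON Lines input and yield cleaned records, skipping blank lines."""
    for line in lines:
        if line.strip():
            yield clean_record(json.loads(line))
```

Usage: with open("output.jsonl", encoding="utf-8") as f: records = list(clean_jsonl_lines(f)).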

🔧 Installation

Make sure to have a recent version of Python installed. You can then install this script via pip:

$ pip install --user matricula-online-scraper

Alternatively, you can clone this repository and run the script with Poetry.

💡 How To Use

$ matricula-online-scraper --help

This prints the available commands and options, including documentation. The same goes for each subcommand, e.g. matricula-online-scraper fetch --help.

The fetch command is the primary command for fetching any resource from Matricula Online. Its subcommands scrape different resources; run matricula-online-scraper fetch --help to see the available subcommands.

Example 1:

Fetch all available locations and save them to a .jsonl file:

$ matricula-online-scraper fetch locations ./output.jsonl

:warning: This will fetch all parishes from Matricula Online, which may take a few minutes. Since this data changes only rarely, frequent scraping would put unnecessary load on the server. Therefore, our GitHub workflow caches this data once a week and pushes it to the cache/parishes branch. ⚡️ Download CSV ⚡️
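Because the output schema is not described here, a quick way to inspect a .jsonl result is to count its records and see which fields occur. This is a minimal Python sketch, assuming each non-blank line is a JSON object; summarize_jsonl is a hypothetical helper, not part of this tool.

```python
import json
from collections import Counter
from typing import Iterable


def summarize_jsonl(lines: Iterable[str]) -> tuple[int, Counter]:
    """Count records and how often each top-level field appears."""
    n_records = 0
    field_counts: Counter = Counter()
    for line in lines:
        if not line.strip():
            continue  # tolerate blank lines, e.g. a trailing newline
        record = json.loads(line)  # assumes every line is a JSON object
        n_records += 1
        field_counts.update(record.keys())
    return n_records, field_counts
```

Usage: with open("output.jsonl", encoding="utf-8") as f: total, fields = summarize_jsonl(f), then print fields.most_common() to see the fields, most frequent first.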

Example 2:

Fetch all available registers of one parish in Münster, Germany, and save them to a .jsonl file:

$ matricula-online-scraper fetch parish ./output.jsonl --urls https://data.matricula-online.eu/en/deutschland/muenster/muenster-st-martini/
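To fetch several parishes from a Python script, you can call the CLI exactly as in the example above, one URL per run, which also keeps requests sequential and the load on the server low. This is a minimal sketch: it only uses the documented form of fetch parish, and the numbered output file names are a hypothetical choice. The --urls option may accept several URLs at once; check matricula-online-scraper fetch parish --help before relying on either form.

```python
import subprocess
from pathlib import Path


def build_fetch_parish_cmd(output: Path, url: str) -> list[str]:
    """Build the argv for the documented `fetch parish` call with a single URL."""
    return ["matricula-online-scraper", "fetch", "parish", str(output), "--urls", url]


def fetch_parishes(urls: list[str], out_dir: Path) -> list[Path]:
    """Run the CLI once per URL, writing hypothetical numbered .jsonl files."""
    out_dir.mkdir(parents=True, exist_ok=True)
    outputs: list[Path] = []
    for i, url in enumerate(urls):
        output = out_dir / f"parish-{i}.jsonl"
        # check=True raises CalledProcessError if the CLI exits with a non-zero status.
        subprocess.run(build_fetch_parish_cmd(output, url), check=True)
        outputs.append(output)
    return outputs
```

Usage (requires the package to be installed): fetch_parishes(["https://data.matricula-online.eu/en/deutschland/muenster/muenster-st-martini/"], Path("out")).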

License & Contributing

This project is licensed under the MIT License - see the LICENSE file for details.

Contributions are welcome! Feel free to open an issue or submit a pull request if you have suggestions, especially bug fixes. Please make sure to follow the Contributing Guidelines.

Download files

Source Distribution

matricula_online_scraper-0.5.1.tar.gz (51.6 kB)

Built Distribution

matricula_online_scraper-0.5.1-py3-none-any.whl (14.7 kB, Python 3)

File hashes for matricula_online_scraper-0.5.1.tar.gz

Algorithm     Hash digest
SHA256        dc649ce0d99273669deb1127692b9452ca91567b91923d4042110dfbb255d672
MD5           39d2212be14e87945adc1b4f6dc0019c
BLAKE2b-256   6952b1c9620872a51ec880dd6d57dae81d7589ee0312691ef4e0821c30f9bb2f

File hashes for matricula_online_scraper-0.5.1-py3-none-any.whl

Algorithm     Hash digest
SHA256        c90161e4d52ef5d96cb8e4f32e4d52c4b25cb41b520d2ee32ab1293510ef42b2
MD5           d8281312f2c358a4cf3ea7549e25d368
BLAKE2b-256   a90533e2922fcb2a25411b5a212130e331ddbc2d951802f96c08b54a61b688be
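To check a downloaded file against these digests, you can compute its SHA256 hash locally. A minimal sketch using Python's standard hashlib module; the constant below is the wheel's SHA256 digest listed above, and the helper names are hypothetical.

```python
import hashlib
from typing import Iterable

# SHA256 digest listed above for matricula_online_scraper-0.5.1-py3-none-any.whl.
EXPECTED_WHEEL_SHA256 = "c90161e4d52ef5d96cb8e4f32e4d52c4b25cb41b520d2ee32ab1293510ef42b2"


def sha256_of_chunks(chunks: Iterable[bytes]) -> str:
    """Feed byte chunks into SHA-256 and return the hex digest."""
    h = hashlib.sha256()
    for chunk in chunks:
        h.update(chunk)
    return h.hexdigest()


def sha256_of_file(path: str, chunk_size: int = 65536) -> str:
    """Hash a file in fixed-size chunks so it never has to fit in memory at once."""
    with open(path, "rb") as f:
        # iter() with a sentinel stops when read() returns b"" at end of file.
        return sha256_of_chunks(iter(lambda: f.read(chunk_size), b""))
```

Usage: sha256_of_file("matricula_online_scraper-0.5.1-py3-none-any.whl") == EXPECTED_WHEEL_SHA256 should be True for an intact download.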
