
Command Line Interface tool for scraping Matricula Online https://data.matricula-online.eu.


Matricula Online Scraper


⚠️ This tool is still under development and is NOT yet feature-complete. Expect breaking changes and bugs. Please report any issues.

Matricula Online is a website that hosts parish registers from various regions across Europe. This CLI tool allows you to fetch data from it and save the data to a file.


Our GitHub Workflow automatically scrapes a list of all parishes once a week and pushes it to cache/parishes. Download parishes.csv ⚡️
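If you use the cached file instead of scraping, a first look at it could be sketched as follows. This assumes you have already downloaded parishes.csv to a local path; the helper name `read_parishes` is hypothetical, and no column names are assumed because they are read from the file's own header row.

```python
import csv
from pathlib import Path


def read_parishes(csv_path):
    """Return (column names, rows) from a locally downloaded parishes.csv."""
    with Path(csv_path).open(newline="", encoding="utf-8") as handle:
        reader = csv.DictReader(handle)
        rows = list(reader)
        # fieldnames comes from the header row; nothing is assumed about it.
        return list(reader.fieldnames or []), rows
```

Calling `read_parishes("parishes.csv")` returns the header as a list and every row as a dict keyed by those column names, which is usually enough to decide how to filter the data.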



Note that this tool will not format or clean the data in any way. Instead, the data is saved as-is to a file. I mention this because the original data is especially poorly formatted and contains a lot of inconsistencies. It is up to the user to process the data further.
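Since cleaning is left to you, here is a minimal sketch of one possible first pass over a .jsonl output file. It only collapses stray whitespace in string values and makes no assumption about field names; `clean_value` and `clean_jsonl` are hypothetical helpers, not part of this tool.

```python
import json


def clean_value(value):
    """Collapse runs of whitespace in strings; recurse into lists and dicts."""
    if isinstance(value, str):
        return " ".join(value.split())
    if isinstance(value, list):
        return [clean_value(item) for item in value]
    if isinstance(value, dict):
        return {key: clean_value(item) for key, item in value.items()}
    return value


def clean_jsonl(src_path, dst_path):
    """Write a whitespace-normalized copy of a JSON Lines file; return the record count."""
    count = 0
    with open(src_path, encoding="utf-8") as src, \
            open(dst_path, "w", encoding="utf-8") as dst:
        for line in src:
            if not line.strip():
                continue  # skip blank lines
            record = clean_value(json.loads(line))
            dst.write(json.dumps(record, ensure_ascii=False) + "\n")
            count += 1
    return count
```

The original file is left untouched, so you can compare the raw and cleaned records before settling on further rules.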

🔧 Installation

Make sure to have a recent version of Python installed. You can then install this script via pip:

$ pip install --user matricula-online-scraper

Alternatively, you can clone this repository and run the script with Poetry.
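To confirm that the pip installation succeeded, one option is to ask Python for the installed distribution's version. This is a small sketch using the standard library's `importlib.metadata`; the helper name `installed_version` is hypothetical.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(distribution="matricula-online-scraper"):
    """Return the installed version string, or None if the package is missing."""
    try:
        return version(distribution)
    except PackageNotFoundError:
        return None
```

A return value of `None` means the package is not installed in the interpreter you are running, which commonly happens when pip and python point to different environments.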

💡 How To Use

$ matricula-online-scraper --help

This prints the available commands and options, including documentation. The same works for each subcommand, e.g. matricula-online-scraper fetch --help.

The fetch command is the primary command to fetch any resources from Matricula Online. Its subcommands allow you to scrape different resources. Run matricula-online-scraper fetch --help to see the available subcommands.

Example 1:

Fetch all available locations and save them to a .jsonl file:

$ matricula-online-scraper fetch locations ./output.jsonl

⚠️ This will fetch all parishes from Matricula Online, which may take a few minutes. This data changes only rarely, and frequent scraping puts unnecessary load on the server. Therefore, our GitHub Workflow caches this data once a week and pushes it to cache/parishes. ⚡️ Download CSV ⚡️
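Once output.jsonl exists, a quick way to see what it contains is to count how often each top-level key appears. The sketch below assumes only that each line is one JSON value; the helper name `field_counts` is hypothetical and no particular field names are presumed.

```python
import json
from collections import Counter


def field_counts(jsonl_path):
    """Count how often each top-level key appears across the records."""
    counts = Counter()
    with open(jsonl_path, encoding="utf-8") as handle:
        for line in handle:
            if not line.strip():
                continue  # skip blank lines
            record = json.loads(line)
            if isinstance(record, dict):
                counts.update(record.keys())
    return counts
```

Keys that appear in fewer records than others point to the inconsistencies you will need to handle when processing the data.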

Example 2:

Fetch all available registers from one parish in Münster, Germany and save them to a .jsonl file:

$ matricula-online-scraper fetch parish ./output.jsonl --urls https://data.matricula-online.eu/en/deutschland/muenster/muenster-st-martini/
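If you need several parishes, one cautious approach is to repeat the documented command once per URL from a script. The sketch below does exactly that and nothing more. The helpers `output_name`, `build_command`, and `run_all` are hypothetical, as is the scheme of naming each output file after the last URL segment.

```python
import subprocess
from urllib.parse import urlparse


def output_name(parish_url):
    """Derive a file name from the last path segment of a parish URL (hypothetical scheme)."""
    slug = urlparse(parish_url).path.rstrip("/").rsplit("/", 1)[-1]
    return f"{slug or 'parish'}.jsonl"


def build_command(parish_url):
    """Mirror the documented form: fetch parish <output> --urls <url>."""
    return [
        "matricula-online-scraper", "fetch", "parish",
        output_name(parish_url), "--urls", parish_url,
    ]


def run_all(parish_urls):
    """Run one scrape per URL, stopping at the first failure."""
    for url in parish_urls:
        subprocess.run(build_command(url), check=True)
```

Running the scrapes sequentially rather than in parallel keeps the load on the Matricula Online server low.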

License & Contributing

This project is licensed under the MIT License - see the LICENSE file for details.

Contributions are welcome! Feel free to open an issue or submit a pull request if you have suggestions, especially bug fixes. Please make sure to follow the Contributing Guidelines.


Source Distribution

matricula_online_scraper-0.5.0.tar.gz (11.2 kB view details)

Uploaded Source

Built Distribution

matricula_online_scraper-0.5.0-py3-none-any.whl (14.7 kB view details)

Uploaded Python 3

File details

Details for the file matricula_online_scraper-0.5.0.tar.gz.

File metadata

  • Download URL: matricula_online_scraper-0.5.0.tar.gz
  • Upload date:
  • Size: 11.2 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/1.8.3 CPython/3.12.1 Linux/6.5.0-1021-azure

File hashes

Hashes for matricula_online_scraper-0.5.0.tar.gz
Algorithm Hash digest
SHA256 e31ab30e158c10954a3a98925a1df743ceb6126f66ae01d4536367c5a7a7a184
MD5 4d665c76c1f1bc786a49314841659bb8
BLAKE2b-256 1c8bcb2658fbf14950935935709f6065afb4f395960d786edfd69ac81d27eec4


File details

Details for the file matricula_online_scraper-0.5.0-py3-none-any.whl.

File hashes

Hashes for matricula_online_scraper-0.5.0-py3-none-any.whl
Algorithm Hash digest
SHA256 2d376628569ad5a07201d4824b02fa66e3685199f26718009c90536c7feff7b5
MD5 bc9887866182b14c781df3cd03067fa2
BLAKE2b-256 14159dc86b2c0530e67aed091e7cf9253b41f6dbada70c42b2554cb37a2e6b0d

