
A command-line interface (CLI) tool for scraping Matricula Online (https://data.matricula-online.eu).


Matricula Online Scraper


⚠️ This tool is still under development and is NOT yet feature-complete. Expect breaking changes and bugs. Please report any issues.

Matricula Online is a website that hosts parish registers from various regions across Europe. This CLI tool fetches data from it and saves the data to a file.

Note that this tool does not format or clean the data in any way; it saves the data to a file as-is. This matters because the source data is often poorly formatted and full of inconsistencies, so any further processing is up to the user.
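Since the output is left untouched, cleanup happens on your side. As a minimal sketch, assuming the output is a JSON Lines file with one JSON object per line (the field names are not documented here, so the code does not rely on any), the following Python reads the records and collapses stray whitespace in top-level string values. The helper names `iter_jsonl` and `clean_strings` and the script name `clean_output.py` are hypothetical and not part of this package.

```python
from __future__ import annotations

import json
import sys
from pathlib import Path
from typing import Any, Iterator


def iter_jsonl(path: str | Path) -> Iterator[dict[str, Any]]:
    """Yield one parsed record per non-blank line of a JSON Lines file."""
    with open(path, encoding="utf-8") as fh:
        for lineno, line in enumerate(fh, start=1):
            line = line.strip()
            if not line:
                continue  # tolerate blank lines
            try:
                yield json.loads(line)
            except json.JSONDecodeError as err:
                # report the offending line instead of a bare decode error
                raise ValueError(f"{path}:{lineno}: invalid JSON") from err


def clean_strings(record: dict[str, Any]) -> dict[str, Any]:
    """Collapse runs of whitespace in top-level string values; other values pass through."""
    return {
        key: " ".join(value.split()) if isinstance(value, str) else value
        for key, value in record.items()
    }


if __name__ == "__main__" and len(sys.argv) > 1:
    # usage (hypothetical script name): python clean_output.py ./output.jsonl
    for record in iter_jsonl(sys.argv[1]):
        print(json.dumps(clean_strings(record), ensure_ascii=False))
```

Whitespace collapsing is only one example of cleanup; what else the records need depends on the actual fields in your output file.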

🔧 Installation

Make sure you have a recent version of Python installed. You can then install this tool via pip:

$ pip install --user matricula-online-scraper

Alternatively, you can clone this repository and run the script with Poetry.

💡 How To Use

$ matricula-online-scraper --help

This prints the available commands and options, including their documentation. The same goes for each subcommand, e.g. matricula-online-scraper fetch --help.

The fetch command is the primary command for fetching any resource from Matricula Online. Each of its subcommands scrapes a different kind of resource; run matricula-online-scraper fetch --help to list them.

Example 1:

Fetch all available locations and save them to a .jsonl file:

$ matricula-online-scraper fetch locations ./output.jsonl

Example 2:

Fetch all available registers from a single parish in Münster, Germany, and save them to a .jsonl file:

$ matricula-online-scraper fetch parish ./output.jsonl --urls https://data.matricula-online.eu/en/deutschland/muenster/muenster-st-martini/
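To fetch several parishes in one go, one option is to drive the CLI from Python with the standard subprocess module. The sketch below only reuses the invocation shown in Example 2, with one parish URL per call and a separate output file per call, because this README does not document how --urls treats multiple values or whether an existing output file is overwritten. The helper names and the per-parish file-naming scheme are hypothetical; add further parish URLs to PARISH_URLS yourself.

```python
from __future__ import annotations

import subprocess
from pathlib import Path

# Only the first URL comes from this README; extend the list as needed.
PARISH_URLS = [
    "https://data.matricula-online.eu/en/deutschland/muenster/muenster-st-martini/",
]


def build_fetch_parish_command(url: str, output: Path) -> list[str]:
    """Build the argv for one `fetch parish` call, mirroring Example 2."""
    return ["matricula-online-scraper", "fetch", "parish", str(output), "--urls", url]


def output_path_for(url: str, out_dir: Path) -> Path:
    """Name the output file after the last URL path segment (hypothetical scheme)."""
    slug = url.rstrip("/").rsplit("/", 1)[-1]
    return out_dir / f"{slug}.jsonl"


def fetch_all(urls: list[str], out_dir: Path) -> None:
    """Run the CLI once per parish URL, each writing to its own file."""
    out_dir.mkdir(parents=True, exist_ok=True)
    for url in urls:
        cmd = build_fetch_parish_command(url, output_path_for(url, out_dir))
        # check=True raises CalledProcessError if the CLI exits with a non-zero status
        subprocess.run(cmd, check=True)
```

For example, fetch_all(PARISH_URLS, Path("output")) would run the command from Example 2 with output/muenster-st-martini.jsonl as the output file. Passing the arguments as a list rather than a shell string avoids quoting problems with URLs.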

License & Contributing

This project is licensed under the MIT License - see the LICENSE file for details.

Contributions are welcome! Feel free to open an issue or submit a pull request if you have suggestions, especially bug fixes.

Download files

Source distribution: matricula_online_scraper-0.4.1.tar.gz (9.5 kB)

Built distribution (Python 3): matricula_online_scraper-0.4.1-py3-none-any.whl (12.5 kB)
