Command Line Interface tool for scraping Matricula Online https://data.matricula-online.eu.

Matricula Online Scraper

Matricula Online is a website that hosts digitized parish registers from various regions across Europe. This CLI-based tool allows you to directly download data from it.

Installation

Make sure to meet the minimum required version of Python. You can install this tool via pip:

$ pip install -U matricula-online-scraper

Build from source

If you want the latest version or prefer to build from source, you can clone the repository and install it manually, preferably via uv:

$ git clone git@github.com:lsg551/matricula-online-scraper.git && cd matricula-online-scraper
$ uv venv && uv sync

Alternatively, you can always fall back to pip:

$ pip install -r requirements.txt
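After either installation route, the installed release and the Python versions it declares can be checked from Python itself. This is a minimal sketch using only the standard library; the helper name `describe_install` is hypothetical and not part of this tool.

```python
from importlib import metadata
import sys


def describe_install(dist_name="matricula-online-scraper"):
    """Return version info for an installed distribution, or None if absent."""
    try:
        dist = metadata.distribution(dist_name)
    except metadata.PackageNotFoundError:
        return None
    return {
        "version": dist.version,
        # Requires-Python is whatever the package declares; it may be missing.
        "requires_python": dist.metadata.get("Requires-Python"),
        "running_python": ".".join(map(str, sys.version_info[:3])),
    }


if __name__ == "__main__":
    info = describe_install()
    print(info if info else "matricula-online-scraper is not installed")
```

If the function returns `None`, the package is not visible to the interpreter that ran the script, which usually means a different virtual environment is active.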

Usage

You can use this tool to scrape the three primary entities from Matricula:

  1. Scanned parish registers (→ images of baptism, marriage, and death records)
  2. A list of all available parishes (→ location metadata)
  3. A list for each parish with metadata about its registers, including date ranges, register types, etc.

Most users likely want to scrape the scanned parish registers (1). The additional metadata (2,3) can be useful for other purposes such as automation, filtering or searching.

Note that this tool will not format or clean the data in any way. Instead, the data is saved as-is to a file. Some data might be poorly formatted or inconsistent.

Run the following command to see the available commands and options:

$ matricula-online-scraper --help

(1) Example: Download a scanned parish register (all images of a book)

Imagine you opened a certain parish register on Matricula and want to download the entire book or a single page. Let's say you want to download the death register of Bautzen, Germany, starting from 1661. Copy the URL of the register while you are in the image viewer. It might look like https://data.matricula-online.eu/en/deutschland/dresden/bautzen/11/?pg=1.

Then run the following command with the copied URL:

$ matricula-online-scraper parish fetch "https://data.matricula-online.eu/en/deutschland/dresden/bautzen/11/?pg=1"

Run matricula-online-scraper parish fetch --help to see all available options.
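When the download is driven from a script, the same command can be launched with Python's standard `subprocess` module. The sketch below assumes that `matricula-online-scraper` is on `PATH` and that register URLs carry the page in a `pg` query parameter, as in the example URL above. The helpers `page_number`, `fetch_command`, and `run_fetch` are hypothetical names, not part of the tool.

```python
import subprocess
from urllib.parse import parse_qs, urlsplit

EXAMPLE_URL = "https://data.matricula-online.eu/en/deutschland/dresden/bautzen/11/?pg=1"


def page_number(url):
    """Return the page from the 'pg' query parameter, or None if absent."""
    values = parse_qs(urlsplit(url).query).get("pg")
    return int(values[0]) if values else None


def fetch_command(url):
    """Build the argument list for 'parish fetch' as shown above."""
    # Passing a list (no shell) avoids quoting problems with '?' in the URL.
    return ["matricula-online-scraper", "parish", "fetch", url]


def run_fetch(url):
    """Start the download; raises CalledProcessError on a non-zero exit."""
    return subprocess.run(fetch_command(url), check=True)


if __name__ == "__main__":
    print("page:", page_number(EXAMPLE_URL))
    print("command:", fetch_command(EXAMPLE_URL))
```

Calling `run_fetch(EXAMPLE_URL)` would start the actual download, so it is left out of the demo at the bottom.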

(2) Example: Download a huge list of all available parishes on Matricula

$ matricula-online-scraper parish list

This command will fetch all parishes from Matricula Online, effectively scraping the entire "Fonds" page. The resulting data looks like:

country    , region                          , name                 , url                                                                          , longitude         , latitude
Deutschland, "Passau, rk. Bistum"            , Arbing-bei-Neuoetting, https://data.matricula-online.eu/en/deutschland/passau/arbing-bei-neuoetting/, 12.7081934381511  , 48.32953342002908
Österreich , Oberösterreich: Rk. Diözese Linz, Eberschwang          , https://data.matricula-online.eu/en/oesterreich/oberoesterreich/eberschwang/ , 13.5620985        , 48.15550995
Polen      , "Breslau/Wroclaw, Staatsarchiv" , Hermsdorf            , https://data.matricula-online.eu/en/polen/breslau/hermsdorf/                 , 15.642741683666767, 50.84699257482722

It may take a few minutes to complete and will yield a few thousand rows. Each url value leads to the main page of the parish and can be piped into the next command (3) to fetch metadata about the parish's registers.
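As an illustration of feeding the url column into command (3), the following sketch parses the sample rows with Python's `csv` module and builds one `parish show` command per parish. It assumes the column names shown in the sample and strips the padding around values, which may only be there for display. The helper names `read_parishes` and `show_commands` are hypothetical.

```python
import csv
import io

# Rows copied from the sample output above, including its column padding.
SAMPLE = '''\
country    , region                          , name                 , url                                                                          , longitude         , latitude
Deutschland, "Passau, rk. Bistum"            , Arbing-bei-Neuoetting, https://data.matricula-online.eu/en/deutschland/passau/arbing-bei-neuoetting/, 12.7081934381511  , 48.32953342002908
Österreich , Oberösterreich: Rk. Diözese Linz, Eberschwang          , https://data.matricula-online.eu/en/oesterreich/oberoesterreich/eberschwang/ , 13.5620985        , 48.15550995
Polen      , "Breslau/Wroclaw, Staatsarchiv" , Hermsdorf            , https://data.matricula-online.eu/en/polen/breslau/hermsdorf/                 , 15.642741683666767, 50.84699257482722
'''


def read_parishes(stream):
    """Parse the parish list into dicts with stripped text and float coordinates."""
    reader = csv.reader(stream, skipinitialspace=True)
    header = [column.strip() for column in next(reader)]
    parishes = []
    for record in reader:
        if not record:
            continue  # skip blank lines
        row = dict(zip(header, (value.strip() for value in record)))
        for key in ("longitude", "latitude"):
            # Assumption: a coordinate may be empty; keep it as None then.
            row[key] = float(row[key]) if row[key] else None
        parishes.append(row)
    return parishes


def show_commands(parishes):
    """Build one 'parish show' argument list per parish URL."""
    return [["matricula-online-scraper", "parish", "show", p["url"]] for p in parishes]


parishes = read_parishes(io.StringIO(SAMPLE))
for command in show_commands(parishes):
    print(" ".join(command))
```

For a real file, replace the `io.StringIO(SAMPLE)` argument with an opened file, for example `open("parishes.csv", newline="", encoding="utf-8")`; the UTF-8 encoding is an assumption.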

[!TIP] The data changes only rarely. A GitHub workflow automatically runs this command once a week and pushes the result to cache/parishes. This lets you download the data without running the command and waiting for it to finish, and it takes some load off the Matricula servers.

Click here to download the entire CSV: 👉 parishes.csv 👈

Or with cURL:

$ curl -L https://github.com/lsg551/matricula-online-scraper/raw/cache/parishes/parishes.csv.gz | gunzip > parishes.csv
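The compressed file can also be read in Python without unpacking it first. The sketch below assumes the file was saved locally as `parishes.csv.gz` and is UTF-8 encoded; both are assumptions, and the helper name `read_gzipped_csv` is hypothetical.

```python
import csv
import gzip
import os


def read_gzipped_csv(path):
    """Read a gzip-compressed CSV file into a list of dicts keyed by header."""
    with gzip.open(path, mode="rt", encoding="utf-8", newline="") as handle:
        return list(csv.DictReader(handle))


if __name__ == "__main__":
    # Only runs when the cached file has already been downloaded.
    if os.path.exists("parishes.csv.gz"):
        rows = read_gzipped_csv("parishes.csv.gz")
        print(len(rows), "parishes loaded")
```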

Run matricula-online-scraper parish list --help to see all available options.

(3) Example: Download a list about the registers of a single parish

This command downloads a list of all available registers for a single parish, including metadata such as the register type, the date range, and the URL of the register itself.

$ matricula-online-scraper parish show https://data.matricula-online.eu/en/deutschland/muenster/muenster-st-martini/

A sample from the output (here JSON Lines) looks like this:

{
    "name": "Taufen",
    "url": "https://data.matricula-online.eu/en/deutschland/muenster/muenster-st-martini/KB001/",
    "accession_number": "KB001",
    "date": "1715 - 1800",
    "register_type": "Taufen",
    "date_range_start": "Jan. 1, 1715",
    "date_range_end": "Dec. 31, 1800"
}

Run matricula-online-scraper parish show --help to see all available options.
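Because the output is saved as-is, date fields arrive as plain strings. The sketch below shows one way to find registers that cover a given year. It assumes one JSON object per line (JSON Lines) with the field names from the sample above, and it extracts only the four-digit year rather than parsing the full date. The helpers `year_of` and `registers_covering` are hypothetical.

```python
import json
import re

# The sample record from above, written as a single JSON Lines row.
SAMPLE_LINE = (
    '{"name": "Taufen", '
    '"url": "https://data.matricula-online.eu/en/deutschland/muenster/muenster-st-martini/KB001/", '
    '"accession_number": "KB001", "date": "1715 - 1800", "register_type": "Taufen", '
    '"date_range_start": "Jan. 1, 1715", "date_range_end": "Dec. 31, 1800"}'
)

YEAR = re.compile(r"\b(\d{4})\b")


def year_of(text):
    """Return the first four-digit year found in a date string, or None."""
    match = YEAR.search(text or "")
    return int(match.group(1)) if match else None


def registers_covering(lines, year, register_type=None):
    """Return registers whose date range includes the given year."""
    hits = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines
        register = json.loads(line)
        start = year_of(register.get("date_range_start"))
        end = year_of(register.get("date_range_end"))
        if start is None or end is None:
            continue  # skip records without a usable range
        if register_type and register.get("register_type") != register_type:
            continue
        if start <= year <= end:
            hits.append(register)
    return hits


for register in registers_covering([SAMPLE_LINE], 1750, register_type="Taufen"):
    print(register["accession_number"], register["url"])
```

Matching on the year alone sidesteps differences in how months are abbreviated, at the cost of day-level precision.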

License

This project is licensed under the MIT License - see the LICENSE file for details.

You can read more about Matricula Online's terms of use and data licenses on their page, or check their robots.txt file at data.matricula-online.eu/robots.txt for restrictions on the use of automated tools (as of March 2025, there are none).
