Detect and parse historic dates, e.g. to ISO 8601-2:2019.

Maintainers

kristbaum

Project description

unstruwwel-py

This is a Python port of the R package unstruwwel. It automatically converts language-specific verbal information, e.g. "circa 1st half of the 19th century", into its standardized numerical counterparts, e.g. "1801-01-01~/1850-12-31~". It follows the recommendations of the MIDAS (Marburger Informations-, Dokumentations- und Administrations-System); see https://doi.org/10.11588/artdok.00003770.

The name is inspired by Heinrich Hoffmann's rhymed story Struwwelpeter.

Installation

pip install unstruwwel-py

Or, for local development with uv:

uv venv
uv pip install -e ".[dev]"

Usage

The package exposes a single high-level function, unstruwwel(). Pass it a single string or an iterable of strings; an iterable returns a list with one result per input, in input order.

Schemes

"time-span" (default) — a (start, end) tuple of years. Open intervals use math.inf / -math.inf; an undetectable date yields (None, None).
"iso-format" — an ISO 8601-2:2019 string (or None).
"object" — a list of Periods objects, each exposing .time_span, .iso_format, .interval, .fuzzy, and .express.
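
Because the "time-span" scheme mixes integers, infinities, and None, downstream code usually has to branch on all three cases. The helper below is a minimal sketch of that branching; describe_span is a hypothetical name and not part of the package, and it assumes only the tuple shapes described in the list above.

```python
import math


def describe_span(span):
    """Render a (start, end) time-span tuple as a short label.

    Hypothetical helper, not part of unstruwwel. It relies only on the
    documented tuple shapes: integer years, +/-math.inf for open ends,
    and (None, None) for an undetectable date.
    """
    start, end = span
    if start is None and end is None:
        return "undated"
    if start == -math.inf:
        return f"up to {end}"
    if end == math.inf:
        return f"from {start}"
    if start == end:
        return str(start)
    return f"{start} to {end}"
```

Comparing against math.inf works here because Python integers compare cleanly with float infinities.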

Safe vs. aggressive mode

Many real-world entries list several distinct datings rather than one period, e.g. "1184, 1750-1752" or "1070-1129, 1672-1674, 1938-1940". Collapsing those into a single (1184, 1752) span is misleading, so the default mode="safe" declines to resolve a compound entry and returns the empty result instead:

unstruwwel("1184, 1750-1752", "de")                       # (None, None)
unstruwwel("1184, 1750-1752", "de", mode="aggressive")    # (1184, 1752)

A single period — including ranges like "1750-1752", "1443 bis 1640", or "16. Jhd. - 18. Jhd." — resolves under both modes. Use mode="aggressive" when you want a best-effort enclosing span for every entry.
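
One way to combine the two modes is to try safe mode first and fall back to aggressive mode only when it declines, keeping a flag so collapsed spans can be reviewed later. The sketch below assumes the call signature shown in the examples above (text, language, and a mode keyword); resolve_with_fallback is a hypothetical wrapper, and the resolver is passed in as an argument so the logic can be exercised without the package installed.

```python
def resolve_with_fallback(text, resolve, language="de"):
    """Try safe mode first, then aggressive mode, and flag the fallback.

    `resolve` is expected to behave like unstruwwel() with the default
    "time-span" scheme: resolve(text, language, mode=...) -> (start, end).
    Returns (span, used_fallback).
    """
    span = resolve(text, language)  # default mode is "safe"
    if span != (None, None):
        return span, False
    # Safe mode declined (compound or unparseable entry):
    # ask for a best-effort enclosing span instead.
    return resolve(text, language, mode="aggressive"), True
```

Called as resolve_with_fallback("1184, 1750-1752", unstruwwel), this should return ((1184, 1752), True) if the two modes behave as in the example above. An entry that neither mode can parse still comes back as (None, None), with the flag set.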

English-language examples

from unstruwwel import unstruwwel

dates = [
    "5th century b.c.", "unknown", "late 16th century", "mid-12th century",
    "June 1963", "August 11, 1958", "ca. 1920", "before 1856",
]

unstruwwel(dates, "en", scheme="iso-format")
# ['-0500-12-31/-0401-01-01', None, '1586-01-01/1600-12-31',
#  '1146-01-01/1155-12-31', '1963-06-01/1963-06-30',
#  '1958-08-11/1958-08-11', '1920-01-01~/1920-12-31~', '..1855-12-31']

unstruwwel(dates, "en")  # time-span
# [(-500, -401), (None, None), (1586, 1600), (1146, 1155),
#  (1963, 1963), (1958, 1958), (1920, 1920), (-inf, 1855)]

German-language examples

unstruwwel("letztes Drittel 15. und 1. Hälfte 16. Jahrhundert", "de")
# (1467, 1550)

unstruwwel("wohl nach 1923", "de", scheme="iso-format")
# '1924-01-01?..'

unstruwwel("spätestens 1750er Jahre", "de", scheme="iso-format")
# '..1749-12-31'
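
The "iso-format" strings above carry their qualifiers inline: in ISO 8601-2, "~" marks an approximate date, "?" an uncertain one, and ".." an open end. If the bounds and qualifiers are needed separately, they can be split apart with plain string handling. The function below is a rough sketch that covers only the string shapes shown in these examples; split_iso_interval is a hypothetical name and not part of the package.

```python
def split_iso_interval(iso):
    """Split an interval string into plain bounds plus qualifier flags.

    Hypothetical helper covering only the shapes shown in the examples:
    'start/end', '..end', and 'start..', with optional '~' or '?' suffixes.
    """
    if iso is None:
        return None
    if "/" in iso:
        start, end = iso.split("/", 1)
    elif iso.startswith(".."):
        start, end = None, iso[2:]
    elif iso.endswith(".."):
        start, end = iso[:-2], None
    else:
        start = end = iso

    def clean(bound):
        # An empty or '..' bound means the interval is open on that side.
        if not bound or bound == "..":
            return None
        return bound.rstrip("~?")

    return {
        "start": clean(start),
        "end": clean(end),
        "approximate": "~" in iso,
        "uncertain": "?" in iso,
    }
```

For '1924-01-01?..' this gives a start of '1924-01-01', no end, and the uncertain flag set.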

Processing a CSV column

A common use case is resolving a whole column of verbal datings, e.g. harvested from a museum or research database. Pass the column as an iterable and you get one result per row back, aligned with the input. The snippet below reads a verbaleDating column, resolves it under both schemes, and writes a new CSV that places the original text next to its start/end years and ISO string for easy comparison:

import csv
from unstruwwel import unstruwwel

with open("verbal_dating.csv", encoding="utf-8") as f:
    rows = [row["verbaleDating"] for row in csv.DictReader(f)]

spans = unstruwwel(rows, "de")                       # [(start, end), ...]
iso = unstruwwel(rows, "de", scheme="iso-format")    # ['1746-01-01/...', ...]

with open("verbal_dating_resolved.csv", "w", newline="", encoding="utf-8") as f:
    writer = csv.writer(f)
    writer.writerow(["verbaleDating", "start", "end", "iso"])
    for text, (start, end), iso_str in zip(rows, spans, iso):
        writer.writerow([text, start, end, iso_str])

For the real Deckenmalerei entries below, verbal_dating_resolved.csv then contains:

verbaleDating	start	end	iso
`um 1750`	`1750`	`1750`	`1750-01-01~/1750-12-31~`
`16. Jhd.`	`1501`	`1600`	`1501-01-01/1600-12-31`
`1718-1722`	`1718`	`1722`	`1718-01-01/1722-12-31`
`1685-90`	`1685`	`1690`	`1685-01-01/1690-12-31`
`Mitte 18. Jhd.`	`1746`	`1755`	`1746-01-01/1755-12-31`
`1. Hälfte 18. Jhd.`	`1701`	`1750`	`1701-01-01/1750-12-31`
`14. Jahrhundert - 17. Jahrhundert`	`1301`	`1700`	`1301-01-01/1700-12-31`
`1685/1690`	`1685`	`1690`	`1685-01-01/1690-12-31`
`vor 1756`	`-inf`	`1755`	`..1755-12-31`
`nach 1679`	`1680`	`inf`	`1680-01-01..`
`letztes Viertel des 17. Jahrhunderts`	`1676`	`1700`	`1676-01-01/1700-12-31`
`Ende 17. Jhd.`	`1686`	`1700`	`1686-01-01/1700-12-31`

Unparseable rows — and, under the default safe mode, compound entries that list several distinct datings — yield (None, None) (or None for iso-format) rather than raising, so a malformed entry never aborts a batch. Pass mode="aggressive" to also collapse compound entries into one enclosing span.
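
Since failures are reported as (None, None) rather than as exceptions, it can help to collect the affected rows for manual review after a batch run. The sketch below assumes the rows and spans lists from the snippet above; unresolved_rows is a hypothetical helper, not part of the package.

```python
def unresolved_rows(texts, spans):
    """Return (row_index, text) pairs whose span came back as (None, None).

    `texts` and `spans` are parallel sequences, e.g. the input column and
    the list returned for it under the default "time-span" scheme.
    """
    return [
        (index, text)
        for index, (text, span) in enumerate(zip(texts, spans))
        if span == (None, None)
    ]
```

The returned indices are zero-based positions in the input list, so they map directly back to the CSV data rows.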

Automatic language detection

If language is omitted (or None), the language is detected from the input.

unstruwwel(["19. Jahrhundert", "1. Hälfte 18. Jh."])  # detected: de

Working with period objects

from unstruwwel import Century

Century(15).take("last", type="third").time_span   # (1467, 1500)
Century(15).take(1, type="half").iso_format         # '1401-01-01/1450-12-31'
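
The year arithmetic behind such fractions can be illustrated in a few lines. The function below is a sketch that reproduces the spans shown in this document for halves, thirds, and quarters of AD centuries; it is not the package's implementation, and century_fraction is a hypothetical name.

```python
def century_fraction(century, part, parts):
    """Years covered by the `part`-th of `parts` equal slices of a century.

    Illustrative arithmetic only (AD centuries). The n-th century is taken
    to run from year (n - 1) * 100 + 1 through year n * 100.
    """
    first_year = (century - 1) * 100 + 1
    start = first_year + (part - 1) * 100 // parts
    end = first_year + part * 100 // parts - 1
    return start, end


print(century_fraction(15, 3, 3))  # last third of the 15th century
print(century_fraction(15, 1, 2))  # first half of the 15th century
```

Integer division makes the thirds come out as 33, 33, and 34 years, which matches the (1467, 1500) span above.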

Supported languages

English (en), German (de), French (fr), and Dutch (nl). Language data lives in src/unstruwwel/data/<code>.json; adding a language is a matter of adding another such file.

Development

uv run pytest

Release history

1.0.1 (this version): Jun 17, 2026
1.0.0: Jun 17, 2026
0.1.0: Jun 17, 2026

Download files

Source Distribution

unstruwwel-1.0.1.tar.gz (186.3 kB)

Uploaded Jun 17, 2026 Source

Built Distribution

unstruwwel-1.0.1-py3-none-any.whl (31.1 kB)

Uploaded Jun 17, 2026 Python 3

File details

Details for the file unstruwwel-1.0.1.tar.gz.

File metadata

Download URL: unstruwwel-1.0.1.tar.gz
Upload date: Jun 17, 2026
Size: 186.3 kB
Tags: Source
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for unstruwwel-1.0.1.tar.gz
Algorithm	Hash digest
SHA256	`584ef17dbd0f762a68801536d4d94985ce79fc6f157eb5f0a5a4659f78813b99`
MD5	`f9e44585097defaa8109cc2b18c5053c`
BLAKE2b-256	`b3b3c301c9d174421045f54dd4db2f48b4baeaf94a16df94ff4ff7a32ad20916`
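
A downloaded file can be checked against the SHA256 digest listed above using only the Python standard library. This is a minimal sketch; sha256_matches is a hypothetical helper name, and the path passed to it is wherever the archive was saved locally.

```python
import hashlib


def sha256_matches(path, expected_hex, chunk_size=65536):
    """Return True if the file at `path` has the given SHA256 hex digest."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read in chunks so large archives are not loaded into memory at once.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest() == expected_hex.lower()
```

For the source distribution, this would be called with the local path to unstruwwel-1.0.1.tar.gz and the SHA256 value from the table.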

Provenance

The following attestation bundles were made for unstruwwel-1.0.1.tar.gz:

Publisher: publish.yml on kristbaum/unstruwwel-py

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: unstruwwel-1.0.1.tar.gz
- Subject digest: 584ef17dbd0f762a68801536d4d94985ce79fc6f157eb5f0a5a4659f78813b99
- Sigstore transparency entry: 1848543410
- Sigstore integration time: Jun 17, 2026
Source repository:
- Permalink: kristbaum/unstruwwel-py@a520c2eafe3b065c075404b42793c75430593ae1
- Branch / Tag: refs/tags/v1.0.1
- Owner: https://github.com/kristbaum
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@a520c2eafe3b065c075404b42793c75430593ae1
- Trigger Event: release

File details

Details for the file unstruwwel-1.0.1-py3-none-any.whl.

File metadata

Download URL: unstruwwel-1.0.1-py3-none-any.whl
Upload date: Jun 17, 2026
Size: 31.1 kB
Tags: Python 3
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for unstruwwel-1.0.1-py3-none-any.whl
Algorithm	Hash digest
SHA256	`db1fb9b2ab674982c1c887b297d9cf3193fb047b9432c69811bcde2531b1277d`
MD5	`a16a30cd4baac31fd301e26202e959ed`
BLAKE2b-256	`c5bc85675e5d1eb8c1521f935bf74e977fe770eea4f7000702b048b200cabcad`

Provenance

The following attestation bundles were made for unstruwwel-1.0.1-py3-none-any.whl:

Publisher: publish.yml on kristbaum/unstruwwel-py

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: unstruwwel-1.0.1-py3-none-any.whl
- Subject digest: db1fb9b2ab674982c1c887b297d9cf3193fb047b9432c69811bcde2531b1277d
- Sigstore transparency entry: 1848543604
- Sigstore integration time: Jun 17, 2026
Source repository:
- Permalink: kristbaum/unstruwwel-py@a520c2eafe3b065c075404b42793c75430593ae1
- Branch / Tag: refs/tags/v1.0.1
- Owner: https://github.com/kristbaum
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@a520c2eafe3b065c075404b42793c75430593ae1
- Trigger Event: release
