unstruwwel-py
Detect and parse historic dates, e.g. to ISO 8601:2-2019.
This is a Python port of the R package unstruwwel. It automatically converts language-specific verbal information, e.g. "circa 1st half of the 19th century", into its standardized numerical counterparts, e.g. "1801-01-01~/1850-12-31~". It follows the recommendations of the MIDAS (Marburger Informations-, Dokumentations- und Administrations-System); see https://doi.org/10.11588/artdok.00003770.
The name is inspired by Heinrich Hoffmann's rhymed story Struwwelpeter.
Installation
pip install unstruwwel-py
Or, for local development with uv:
uv venv
uv pip install -e ".[dev]"
Usage
The package exposes a single high-level function, unstruwwel(). Pass a string
or an iterable of strings; for an iterable a list of results is returned, one
per input.
Schemes
- "time-span" (default): a (start, end) tuple of years. Open intervals use math.inf / -math.inf; an undetectable date yields (None, None).
- "iso-format": an ISO 8601:2-2019 string (or None).
- "object": a list of Periods objects, each exposing .time_span, .iso_format, .interval, .fuzzy, and .express.
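To make the relationship between the two plain schemes concrete, here is a minimal sketch that formats a "time-span" tuple the way the "iso-format" examples in this README look at year precision. `span_to_iso` is a hypothetical helper, not part of the package; it assumes positive integer years and at most one open side, and it ignores the fuzzy markers (`~`, `?`) and any month or day precision the library itself can produce.

```python
import math


def span_to_iso(span):
    """Format a (start, end) year tuple as a year-precision interval string.

    Hypothetical helper for illustration only; assumes positive integer
    years and at most one open (infinite) side.
    """
    start, end = span
    if start is None or end is None:
        return None  # undetectable date
    if start == -math.inf:
        return f"..{end:04d}-12-31"  # open start
    if end == math.inf:
        return f"{start:04d}-01-01.."  # open end
    return f"{start:04d}-01-01/{end:04d}-12-31"


print(span_to_iso((1501, 1600)))
print(span_to_iso((-math.inf, 1755)))
```

In practice you would request scheme="iso-format" directly, since only the library knows whether a date was uncertain or more precise than a year.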
Safe vs. aggressive mode
Many real-world entries list several distinct datings rather than one period,
e.g. "1184, 1750-1752" or "1070-1129, 1672-1674, 1938-1940". Collapsing
those into a single (1184, 1752) span is misleading, so the default
mode="safe" declines to resolve a compound entry and returns the empty result
instead:
unstruwwel("1184, 1750-1752", "de") # (None, None)
unstruwwel("1184, 1750-1752", "de", mode="aggressive") # (1184, 1752)
A single period — including ranges like "1750-1752", "1443 bis 1640", or
"16. Jhd. - 18. Jhd." — resolves under both modes. Use mode="aggressive"
when you want a best-effort enclosing span for every entry.
English-language examples
from unstruwwel import unstruwwel
dates = [
"5th century b.c.", "unknown", "late 16th century", "mid-12th century",
"June 1963", "August 11, 1958", "ca. 1920", "before 1856",
]
unstruwwel(dates, "en", scheme="iso-format")
# ['-0500-12-31/-0401-01-01', None, '1586-01-01/1600-12-31',
# '1146-01-01/1155-12-31', '1963-06-01/1963-06-30',
# '1958-08-11/1958-08-11', '1920-01-01~/1920-12-31~', '..1855-12-31']
unstruwwel(dates, "en") # time-span
# [(-500, -401), (None, None), (1586, 1600), (1146, 1155),
# (1963, 1963), (1958, 1958), (1920, 1920), (-inf, 1855)]
German-language examples
unstruwwel("letztes Drittel 15. und 1. Hälfte 16. Jahrhundert", "de")
# (1467, 1550)
unstruwwel("wohl nach 1923", "de", scheme="iso-format")
# '1924-01-01?..'
unstruwwel("spätestens 1750er Jahre", "de", scheme="iso-format")
# '..1749-12-31'
Processing a CSV column
A common use case is resolving a whole column of verbal datings, e.g. harvested
from a museum or research database. Pass the column as an iterable and you get
one result per row back, aligned with the input. The snippet below reads a
verbaleDating column, resolves it under both schemes, and writes a new CSV
that places the original text next to its start/end years and ISO string
for easy comparison:
import csv
from unstruwwel import unstruwwel

# Read the verbal datings from the input column.
with open("verbal_dating.csv", encoding="utf-8") as f:
    rows = [row["verbaleDating"] for row in csv.DictReader(f)]

spans = unstruwwel(rows, "de")                     # [(start, end), ...]
iso = unstruwwel(rows, "de", scheme="iso-format")  # ['1746-01-01/...', ...]

# Write the original text next to its resolved values.
with open("verbal_dating_resolved.csv", "w", newline="", encoding="utf-8") as f:
    writer = csv.writer(f)
    writer.writerow(["verbaleDating", "start", "end", "iso"])
    for text, (start, end), iso_str in zip(rows, spans, iso):
        writer.writerow([text, start, end, iso_str])
For the real Deckenmalerei entries below, verbal_dating_resolved.csv then
contains:
| verbaleDating | start | end | iso |
|---|---|---|---|
| um 1750 | 1750 | 1750 | 1750-01-01~/1750-12-31~ |
| 16. Jhd. | 1501 | 1600 | 1501-01-01/1600-12-31 |
| 1718-1722 | 1718 | 1722 | 1718-01-01/1722-12-31 |
| 1685-90 | 1685 | 1690 | 1685-01-01/1690-12-31 |
| Mitte 18. Jhd. | 1746 | 1755 | 1746-01-01/1755-12-31 |
| 1. Hälfte 18. Jhd. | 1701 | 1750 | 1701-01-01/1750-12-31 |
| 14. Jahrhundert - 17. Jahrhundert | 1301 | 1700 | 1301-01-01/1700-12-31 |
| 1685/1690 | 1685 | 1690 | 1685-01-01/1690-12-31 |
| vor 1756 | -inf | 1755 | ..1755-12-31 |
| nach 1679 | 1680 | inf | 1680-01-01.. |
| letztes Viertel des 17. Jahrhunderts | 1676 | 1700 | 1676-01-01/1700-12-31 |
| Ende 17. Jhd. | 1686 | 1700 | 1686-01-01/1700-12-31 |
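When the resolved CSV is read back, the start and end cells are plain strings: csv.writer stores math.inf as inf, -math.inf as -inf, and None as an empty field. A small sketch of converting such a cell back into a number follows; `parse_year` is a hypothetical helper, not something the package provides.

```python
def parse_year(cell):
    """Convert one start/end cell of the resolved CSV back to a number.

    Returns None for an empty cell, a float for an open bound
    ("inf" / "-inf"), and an int for an ordinary year.
    """
    cell = cell.strip()
    if not cell:
        return None  # the date could not be resolved
    if cell in ("inf", "-inf"):
        return float(cell)  # open interval bound
    return int(cell)


print([parse_year(c) for c in ["1750", "-inf", "inf", ""]])
```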
Unparseable rows — and, under the default safe mode, compound entries that list
several distinct datings — yield (None, None) (or None for iso-format)
rather than raising, so a malformed entry never aborts a batch. Pass
mode="aggressive" to also collapse compound entries into one enclosing span.
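Because failures are reported as values rather than exceptions, a batch can be split into resolved and unresolved rows after the fact. The following is a minimal sketch for the "time-span" scheme; `split_resolved` is a hypothetical helper name, and it only assumes that results are aligned with the inputs and that (None, None) marks an unresolved row, as described above.

```python
def split_resolved(texts, spans):
    """Partition aligned inputs and time-span results.

    Returns (resolved, unresolved): a list of (text, span) pairs and a
    list of the texts whose span is (None, None).
    """
    resolved, unresolved = [], []
    for text, span in zip(texts, spans):
        if span == (None, None):
            unresolved.append(text)  # unparseable, or compound in safe mode
        else:
            resolved.append((text, span))
    return resolved, unresolved
```

The unresolved list is a natural worklist for manual review or for a second pass with mode="aggressive".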
Automatic language detection
If language is omitted (or None), the language is detected from the input.
unstruwwel(["19. Jahrhundert", "1. Hälfte 18. Jh."]) # detected: de
Working with period objects
from unstruwwel import Century
Century(15).take("last", type="third").time_span # (1467, 1500)
Century(15).take(1, type="half").iso_format # '1401-01-01/1450-12-31'
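The arithmetic behind these values can be sketched independently of the package. The function below is an illustration, not the library's implementation: `century_part` is a hypothetical name, it assumes positive (A.D.) centuries that run from year 01 to year 00, and it uses integer division to place the slice boundaries. Under those assumptions it reproduces the halves, thirds, and quarters quoted in this README; it does not cover qualifiers such as "mid" or "late".

```python
def century_part(century, index, parts):
    """Return (start, end) years of one of `parts` equal slices of a century.

    `index` is 1-based, or the string "last" for the final slice.
    Illustrative only; assumes a positive (A.D.) century number.
    """
    first = (century - 1) * 100 + 1  # e.g. 15th century starts in 1401
    if index == "last":
        index = parts
    start = first + (index - 1) * 100 // parts
    end = first + index * 100 // parts - 1
    return start, end


print(century_part(15, "last", 3))  # (1467, 1500)
print(century_part(15, 1, 2))       # (1401, 1450)
print(century_part(17, "last", 4))  # (1676, 1700)
```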
Supported languages
English (en), German (de), French (fr), and Dutch (nl). Language data
lives in src/unstruwwel/data/<code>.json; adding a language is a matter of
adding another such file.
Development
uv run pytest