
datemonkey

Batch date parsing with ambiguity detection, confidence scores, and format lock-in.

The problem: dateutil.parser.parse("01/02/03") silently picks one reading (by default, January 2, 2003), and that reading is often wrong. DD/MM vs MM/DD ambiguity corrupts joins, aggregations, and reports. datemonkey detects the ambiguity and reports it instead of guessing.
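For illustration only (plain standard library, not datemonkey), both readings of that string are valid dates, and they land on different days:

```python
from datetime import datetime

# The same string parses cleanly under two different day/month orders.
raw = "01/02/03"
us_reading = datetime.strptime(raw, "%m/%d/%y")  # MM/DD/YY
eu_reading = datetime.strptime(raw, "%d/%m/%y")  # DD/MM/YY

print(us_reading.date())         # 2003-01-02
print(eu_reading.date())         # 2003-02-01
print(us_reading == eu_reading)  # False
```

Neither call raises, which is exactly why a single-string parser cannot tell that something is wrong.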

Install

pip install datemonkey

Quick Start

Detect format from a column of values

from datemonkey import detect_format

result = detect_format(["15/03/2024", "20/04/2024", "25/12/2024"])
print(result.format.label)      # "European date (DD/MM/YYYY)"
print(result.confidence)         # Confidence.HIGH
print(result.is_ambiguous)       # False — day > 12 resolves it

Ambiguity detection

result = detect_format(["01/02/2024", "03/04/2024", "05/06/2024"])
print(result.is_ambiguous)       # True
print(result.ambiguities)        # [AmbiguityType.DAY_MONTH_SWAP]
print(result.warnings)
# ["Ambiguous: cannot distinguish US date (MM/DD/YYYY) from European date (DD/MM/YYYY) ..."]
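The rule behind both results can be sketched with the standard library: a field above 12 can only be a day, so one such value settles the order for the whole column, while a column whose fields never exceed 12 stays ambiguous. `day_month_order` is a hypothetical helper written for this README, not part of datemonkey's API, and it assumes every value is a slash-separated NN/NN/YYYY string:

```python
def day_month_order(values):
    """Hypothetical helper: infer the field order for 'NN/NN/YYYY' strings.

    Returns "DD/MM", "MM/DD", "ambiguous", or "conflict".
    """
    first_is_day = False   # some value has a first field > 12
    second_is_day = False  # some value has a second field > 12
    for value in values:
        first, second, _year = map(int, value.split("/"))
        if first > 12:
            first_is_day = True
        if second > 12:
            second_is_day = True
    if first_is_day and second_is_day:
        return "conflict"   # evidence for both orders: mixed formats in one column
    if first_is_day:
        return "DD/MM"
    if second_is_day:
        return "MM/DD"
    return "ambiguous"      # every field is <= 12, so both readings are valid


print(day_month_order(["15/03/2024", "20/04/2024", "25/12/2024"]))  # DD/MM
print(day_month_order(["01/02/2024", "03/04/2024", "05/06/2024"]))  # ambiguous
```

Working on the whole column is what makes this possible: a single value such as "01/02/2024" carries no evidence, but one "15/03/2024" elsewhere in the batch does.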

Resolve ambiguity with locale preference

result = detect_format(["01/02/2024", "03/04/2024"], locale_preference="eu")
print(result.format.label)       # "European date (DD/MM/YYYY)"
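A minimal sketch of how a preference can act purely as a tie-breaker, assuming the same NN/NN/YYYY input: evidence in the data always wins, and the preference is consulted only when every field is 12 or less. `resolve_order` is hypothetical and only mirrors the documented behavior of `locale_preference`; it is not datemonkey's implementation:

```python
def resolve_order(values, locale_preference=None):
    """Hypothetical sketch: data evidence first, locale preference only as a tie-breaker."""
    pairs = [tuple(map(int, v.split("/")[:2])) for v in values]
    first_is_day = any(first > 12 for first, _ in pairs)
    second_is_day = any(second > 12 for _, second in pairs)
    if first_is_day != second_is_day:
        # Exactly one order is supported by the data; the preference is not consulted.
        return "DD/MM" if first_is_day else "MM/DD"
    if first_is_day and second_is_day:
        return None  # contradictory data: no preference can fix that
    # No field ever exceeds 12: both orders fit, so use the stated preference (or None).
    return {"eu": "DD/MM", "us": "MM/DD"}.get(locale_preference)


print(resolve_order(["01/02/2024", "03/04/2024"], locale_preference="eu"))  # DD/MM
```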

Parse a batch of dates

from datemonkey import parse_dates

batch = parse_dates(["2024-03-15", "2024-04-20", "2024-12-25"])
print(batch.ok)                  # True
print(batch.dates)               # [datetime(2024,3,15), datetime(2024,4,20), datetime(2024,12,25)]
print(batch.iso_strings)         # ["2024-03-15T00:00:00", ...]
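The `iso_strings` values have the shape that the standard library's `datetime.isoformat()` returns for a date-only value parsed to midnight. A stdlib equivalent for this one known format (an illustration, not how datemonkey parses):

```python
from datetime import datetime

values = ["2024-03-15", "2024-04-20", "2024-12-25"]
# Parsing a date-only string into a datetime sets the time component to midnight.
dates = [datetime.strptime(v, "%Y-%m-%d") for v in values]
# isoformat() then includes that midnight time, e.g. "2024-03-15T00:00:00".
iso_strings = [d.isoformat() for d in dates]
print(iso_strings)
```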

Format lock-in

from datemonkey import parse_dates, ISO_8601

batch = parse_dates(["2024-03-15", "03/15/2024"], format=ISO_8601)
print(batch.results[0].ok)       # True  — matches ISO
print(batch.results[1].ok)       # False — doesn't match, flagged not re-guessed
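The lock-in idea reduces to "one format per batch, and mismatches are reported". A minimal sketch, assuming a strftime-style format string; `parse_locked` is a hypothetical helper, not datemonkey's API:

```python
from datetime import datetime


def parse_locked(values, fmt):
    """Hypothetical sketch of format lock-in: one format for the whole batch.

    Values that don't match `fmt` are flagged, never re-parsed with another format.
    Returns a list of (original, datetime or None, warning or None) tuples.
    """
    results = []
    for value in values:
        try:
            results.append((value, datetime.strptime(value, fmt), None))
        except ValueError:
            # No fallback guessing: the mismatch itself is the useful signal.
            results.append((value, None, f"does not match locked format {fmt!r}"))
    return results


for original, parsed, warning in parse_locked(["2024-03-15", "03/15/2024"], "%Y-%m-%d"):
    print(original, parsed, warning)
```

Re-guessing "03/15/2024" would make it parse, but it would also hide the fact that the column mixes two formats.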

Strict mode

batch = parse_dates(["01/02/2024", "03/04/2024"], strict=True)
print(batch.parsed_count)        # 0 — refuses to parse ambiguous data
print(batch.warnings)            # ["Strict mode: refusing to parse due to DD/MM vs MM/DD ambiguity..."]
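Strict mode can be pictured as a guard in front of parsing: if the batch does not establish a day/month order, nothing is parsed. A sketch assuming NN/NN/YYYY strings that are otherwise well formed (a value invalid under the chosen order would raise `ValueError` here); `parse_strict` is hypothetical, not datemonkey's code:

```python
from datetime import datetime


def parse_strict(values):
    """Hypothetical sketch of strict mode for 'NN/NN/YYYY' strings.

    Returns (dates, warnings). If no value fixes the day/month order,
    nothing is parsed and a warning explains why.
    """
    pairs = [tuple(map(int, v.split("/")[:2])) for v in values]
    first_is_day = any(a > 12 for a, _ in pairs)
    second_is_day = any(b > 12 for _, b in pairs)
    if first_is_day == second_is_day:
        # Neither order proven, or both contradicted: refuse instead of guessing.
        return [], ["refusing to parse: day/month order not established"]
    fmt = "%d/%m/%Y" if first_is_day else "%m/%d/%Y"
    return [datetime.strptime(v, fmt) for v in values], []


print(parse_strict(["01/02/2024", "03/04/2024"]))
# ([], ['refusing to parse: day/month order not established'])
```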

Excel serial dates

from datemonkey import parse_dates, excel_serial_to_datetime

# Single value
dt = excel_serial_to_datetime(45292)  # datetime(2024, 1, 1)

# Batch — auto-detected
batch = parse_dates(["45292", "45293", "45294"])
print(batch.detected_format.label)  # "Excel serial date number"
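The serial number 45292 maps to 2024-01-01 because Excel's default 1900 date system counts days from an effective epoch of 1899-12-30 for every date from 1900-03-01 (serial 61) onward; the extra day comes from Excel treating 1900 as a leap year. A stdlib sketch of that arithmetic, under the hypothetical name `excel_serial_sketch` (it ignores the 1904 date system and serials below 61, and is not datemonkey's code):

```python
from datetime import datetime, timedelta

# Effective epoch for Excel 1900-system serials >= 61 (1900-03-01 onward).
EXCEL_EPOCH = datetime(1899, 12, 30)


def excel_serial_sketch(serial):
    """Hypothetical helper: convert an Excel serial (>= 61) to a datetime.

    A fractional part is treated as the time of day.
    """
    if serial < 61:
        raise ValueError("sketch only handles serials >= 61 (dates from 1900-03-01)")
    return EXCEL_EPOCH + timedelta(days=serial)


print(excel_serial_sketch(45292))    # 2024-01-01 00:00:00
print(excel_serial_sketch(45292.5))  # 2024-01-01 12:00:00
```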

Per-value results

batch = parse_dates(["2024-03-15", "garbage", "2024-12-25"], format="%Y-%m-%d")
for r in batch.results:
    print(f"{r.original:20s} ok={r.ok!s:5}  parsed={r.iso!s:20} warnings={r.warnings}")
# 2024-03-15           ok=True   parsed=2024-03-15T00:00:00  warnings=[]
# garbage              ok=False  parsed=None                 warnings=[...]
# 2024-12-25           ok=True   parsed=2024-12-25T00:00:00  warnings=[]

CLI

# Detect format
datemonkey detect "15/03/2024" "20/04/2024" "25/12/2024"

# Detect with JSON output
datemonkey detect --json "01/02/2024" "03/04/2024"

# Parse dates
datemonkey parse "2024-03-15" "2024-04-20"

# Parse from CSV file (column 2, skip header)
datemonkey parse --file data.csv --column 2 --skip-header

# Parse with explicit format
datemonkey parse --format "%d-%m-%Y" "15-03-2024"

# Parse in strict mode
datemonkey parse --strict "01/02/2024" "03/04/2024"

# List known formats
datemonkey formats

API Reference

detect_format(values, *, locale_preference=None, formats=None) -> FormatDetectionResult

Analyze a batch and determine the most likely format, reporting ambiguity.

  • values: List of date-like values (strings, ints, floats, None)
  • locale_preference: "us" for MM/DD, "eu" for DD/MM (only used when data alone can't resolve)
  • formats: Custom list of DateFormat objects to test

parse_dates(values, *, format=None, locale_preference=None, strict=False) -> BatchResult

Parse a batch with format lock-in.

  • format: A DateFormat object or strftime string. If None, auto-detected.
  • strict: If True, refuse to parse when DD/MM vs MM/DD is ambiguous.

excel_serial_to_datetime(serial) -> datetime | None

Convert an Excel serial date number to a Python datetime.

Result Objects

Key properties of each result object:

  • FormatDetectionResult: .format, .confidence, .is_ambiguous, .ambiguities, .candidates, .warnings
  • BatchResult: .ok, .results, .detected_format, .dates, .iso_strings, .failed, .succeeded, .success_ratio
  • DateResult: .ok, .original, .parsed, .date, .iso, .confidence, .warnings, .row_index

Confidence Levels

  • HIGH: Unambiguous parse; the format is certain
  • MEDIUM: Likely correct, with minor ambiguity (e.g. a two-digit year)
  • LOW: DD/MM vs MM/DD ambiguity is unresolved, or the match ratio is poor
  • FAILED: Could not parse or detect
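One way to read these levels is as an ordered set of checks, from worst to best. The sketch below encodes the descriptions above; the `ConfidenceSketch` enum, the `grade` function, and the 0.9 match-ratio cutoff are all assumptions for illustration, not datemonkey's actual scoring:

```python
from enum import Enum


class ConfidenceSketch(Enum):
    """Hypothetical stand-in for datemonkey's Confidence levels."""
    HIGH = "high"
    MEDIUM = "medium"
    LOW = "low"
    FAILED = "failed"


def grade(match_ratio, day_month_ambiguous=False, two_digit_year=False, min_ratio=0.9):
    """Assign a level from batch statistics, checking the worst cases first.

    match_ratio: fraction of values that parsed with the detected format (0.0 to 1.0).
    min_ratio: assumed cutoff below which the match ratio counts as poor.
    """
    if match_ratio == 0:
        return ConfidenceSketch.FAILED  # nothing parsed at all
    if day_month_ambiguous or match_ratio < min_ratio:
        return ConfidenceSketch.LOW     # unresolved DD/MM vs MM/DD, or too many misses
    if two_digit_year:
        return ConfidenceSketch.MEDIUM  # likely right, but the century is inferred
    return ConfidenceSketch.HIGH


print(grade(1.0))                            # ConfidenceSketch.HIGH
print(grade(1.0, day_month_ambiguous=True))  # ConfidenceSketch.LOW
```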

Design

  • Batch-first: Designed for columns of data, not single strings
  • No silent guessing: Ambiguity is reported, not hidden
  • Format lock-in: Once detected, the format is enforced — violations are flagged
  • Structured results: Every parse returns confidence scores and warnings
  • Zero dependencies: Pure Python, stdlib only

Built for LLMs

datemonkey is designed to work well as a tool for large language models. Date parsing is a common source of silent errors in LLM-driven data pipelines: ambiguous formats lead to wrong guesses, retries that waste tokens, and broken downstream logic. A single datemonkey call returns a structured result with the detected format, a confidence level, and any ambiguities, so no multi-step prompting or validation loop is needed to interpret a date column. Fewer tokens in, reliable answers out.

Changelog

See CHANGELOG.md for release history.

Development & review

datemonkey is hardened with a competitive multi-model review methodology. The self-contained kit lives in review-kit/:

License

MIT

Download files

Source Distribution

datemonkey-0.2.0.tar.gz (28.7 kB)

Built Distribution

datemonkey-0.2.0-py3-none-any.whl (21.1 kB)

File details

Details for the file datemonkey-0.2.0.tar.gz.

File metadata

  • Download URL: datemonkey-0.2.0.tar.gz
  • Upload date:
  • Size: 28.7 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.12.3

File hashes

Hashes for datemonkey-0.2.0.tar.gz
Algorithm Hash digest
SHA256 bbc983e9d3290e80eeede4346fdc434f047c7c350c5e9380ef6c6a60bd1c9589
MD5 406cd073fe1876e48ac569fd89388a8d
BLAKE2b-256 242aa2ed8ddb27ea428a99132643bde19790e2020b72ccef0842dc7ebaedc802


File details

Details for the file datemonkey-0.2.0-py3-none-any.whl.

File metadata

  • Download URL: datemonkey-0.2.0-py3-none-any.whl
  • Upload date:
  • Size: 21.1 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.12.3

File hashes

Hashes for datemonkey-0.2.0-py3-none-any.whl
Algorithm Hash digest
SHA256 63f903da3d061a851688cf9abf7c6829de04c62f7ef7944b6203fb3ccc3a6195
MD5 746dc0f08a59889cf2b2e969abe089e1
BLAKE2b-256 2de44e3ec5c9c33b925347ba020426c9ecdbd374e71cbfdc1df65969659b9c97

