Batch date parsing with ambiguity detection, confidence scores, and format lock-in.

datemonkey

The problem: dateutil.parser.parse("01/02/03") silently picks one reading (month first by default, giving 2003-01-02), even though the same string could mean 1 February 2003 or 3 February 2001. DD/MM vs MM/DD ambiguity corrupts joins, aggregations, and reports. datemonkey detects ambiguity and reports it instead of guessing.
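A single string cannot settle the question on its own. The standard library will parse "01/02/03" under three different layouts without complaint, each giving a different date. This sketch uses only `datetime.strptime` to illustrate the problem; it is not part of datemonkey.

```python
from datetime import datetime

value = "01/02/03"

# Each layout accepts the same string and yields a different calendar date.
layouts = {
    "MM/DD/YY": "%m/%d/%y",
    "DD/MM/YY": "%d/%m/%y",
    "YY/MM/DD": "%y/%m/%d",
}

readings = {name: datetime.strptime(value, fmt).date() for name, fmt in layouts.items()}

for name, day in readings.items():
    print(f"{name}: {day.isoformat()}")
# MM/DD/YY: 2003-01-02
# DD/MM/YY: 2003-02-01
# YY/MM/DD: 2001-02-03
```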

Install

pip install datemonkey

Quick Start

Detect format from a column of values

from datemonkey import detect_format

result = detect_format(["15/03/2024", "20/04/2024", "25/12/2024"])
print(result.format.label)      # "European date (DD/MM/YYYY)"
print(result.confidence)         # Confidence.HIGH
print(result.is_ambiguous)       # False — day > 12 resolves it

Ambiguity detection

result = detect_format(["01/02/2024", "03/04/2024", "05/06/2024"])
print(result.is_ambiguous)       # True
print(result.ambiguities)        # [AmbiguityType.DAY_MONTH_SWAP]
print(result.warnings)
# ["Ambiguous: cannot distinguish US date (MM/DD/YYYY) from European date (DD/MM/YYYY) ..."]

Resolve ambiguity with locale preference

result = detect_format(["01/02/2024", "03/04/2024"], locale_preference="eu")
print(result.format.label)       # "European date (DD/MM/YYYY)"
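The batch-level rule behind the examples above can be approximated with the standard library. A layout counts as a candidate only if it parses every non-empty value, so one value such as 15/03/2024 is enough to eliminate MM/DD. A locale preference is consulted only when both orders survive. This is a hypothetical sketch: `candidate_orders`, `choose_order`, and the `ORDERS` table are illustrative names, not datemonkey's API, and only the two four-digit-year layouts are considered.

```python
from datetime import datetime
from typing import Optional

# Hypothetical candidate layouts; datemonkey tests a larger built-in set.
ORDERS = {
    "us": "%m/%d/%Y",  # MM/DD/YYYY
    "eu": "%d/%m/%Y",  # DD/MM/YYYY
}


def _parses(value: str, fmt: str) -> bool:
    try:
        datetime.strptime(value, fmt)
        return True
    except ValueError:
        return False


def candidate_orders(values: list[str]) -> set[str]:
    """Return the locale keys whose layout parses every non-empty value."""
    present = [v.strip() for v in values if v and v.strip()]
    return {
        key
        for key, fmt in ORDERS.items()
        if present and all(_parses(v, fmt) for v in present)
    }


def choose_order(
    values: list[str], locale_preference: Optional[str] = None
) -> tuple[Optional[str], bool]:
    """Return (chosen strftime layout or None, data_is_ambiguous).

    data_is_ambiguous reports whether the data alone fits both orders;
    the preference only breaks that tie, it never overrides the data.
    """
    fits = candidate_orders(values)
    if len(fits) == 1:
        return ORDERS[fits.pop()], False
    if len(fits) == 2:
        if locale_preference in ORDERS:
            return ORDERS[locale_preference], True
        return None, True
    return None, False  # neither order fits every value


print(choose_order(["15/03/2024", "20/04/2024", "25/12/2024"]))
print(choose_order(["01/02/2024", "03/04/2024"], locale_preference="eu"))
```

Requiring every value to fit keeps the sketch short. Because the README lists a poor match ratio as one cause of LOW confidence, the library presumably tolerates some non-matching rows, which this sketch does not.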

Parse a batch of dates

from datemonkey import parse_dates

batch = parse_dates(["2024-03-15", "2024-04-20", "2024-12-25"])
print(batch.ok)                  # True
print(batch.dates)               # [datetime(2024,3,15), datetime(2024,4,20), datetime(2024,12,25)]
print(batch.iso_strings)         # ["2024-03-15T00:00:00", ...]

Format lock-in

from datemonkey import parse_dates, ISO_8601

batch = parse_dates(["2024-03-15", "03/15/2024"], format=ISO_8601)
print(batch.results[0].ok)       # True  — matches ISO
print(batch.results[1].ok)       # False — doesn't match, flagged not re-guessed
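Lock-in means every value is checked against the one chosen layout and flagged on mismatch, with no fallback to other layouts. Below is a minimal stdlib sketch of that behaviour. It uses a plain strftime string where datemonkey accepts its own `ISO_8601` object, and `parse_locked` and `LockedResult` are hypothetical names.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import Optional


@dataclass
class LockedResult:
    original: str
    parsed: Optional[datetime] = None
    warnings: list[str] = field(default_factory=list)

    @property
    def ok(self) -> bool:
        return self.parsed is not None


def parse_locked(values: list[str], fmt: str) -> list[LockedResult]:
    """Parse every value with exactly one layout; never re-guess on failure."""
    results = []
    for value in values:
        try:
            results.append(LockedResult(value, datetime.strptime(value, fmt)))
        except ValueError:
            # Flag the mismatch instead of trying other layouts.
            results.append(
                LockedResult(value, warnings=[f"{value!r} does not match locked format {fmt!r}"])
            )
    return results


batch = parse_locked(["2024-03-15", "03/15/2024"], "%Y-%m-%d")
print([r.ok for r in batch])  # [True, False]
```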

Strict mode

batch = parse_dates(["01/02/2024", "03/04/2024"], strict=True)
print(batch.parsed_count)        # 0 — refuses to parse ambiguous data
print(batch.warnings)            # ["Strict mode: refusing to parse due to DD/MM vs MM/DD ambiguity..."]

Excel serial dates

from datemonkey import parse_dates, excel_serial_to_datetime

# Single value
dt = excel_serial_to_datetime(45292)  # datetime(2024, 1, 1)

# Batch — auto-detected
batch = parse_dates(["45292", "45293", "45294"])
print(batch.detected_format.label)  # "Excel serial date number"
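Excel's default (1900) date system numbers days so that serial 1 is 1 January 1900. It also counts a nonexistent 29 February 1900 (serial 60), a bug kept for Lotus 1-2-3 compatibility. The conversion below is a hypothetical stdlib sketch (`excel_serial_sketch` is not part of datemonkey) showing why 45292 maps to 2024-01-01. It assumes the 1900 system, not the 1904 system used by some older Mac workbooks.

```python
from datetime import datetime, timedelta
from typing import Optional

# With a 1899-12-30 epoch, serial 61 lands on 1900-03-01 and every later
# serial is correct, because the phantom leap day is absorbed into the offset.
_EPOCH = datetime(1899, 12, 30)


def excel_serial_sketch(serial: float) -> Optional[datetime]:
    """Convert an Excel 1900-system serial to a datetime (hypothetical helper)."""
    if serial < 1 or 60 <= serial < 61:
        # Before 1900-01-01, or the nonexistent 1900-02-29.
        return None
    if serial < 60:
        # Serials 1-59 predate the phantom day, so count from 1899-12-31.
        return _EPOCH + timedelta(days=serial + 1)
    # Any fractional part is the time of day (0.5 is noon).
    return _EPOCH + timedelta(days=serial)


print(excel_serial_sketch(45292))    # 2024-01-01 00:00:00
print(excel_serial_sketch(45292.5))  # 2024-01-01 12:00:00
```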

Per-value results

batch = parse_dates(["2024-03-15", "garbage", "2024-12-25"], format="%Y-%m-%d")
for r in batch.results:
    print(f"{r.original:20s} ok={r.ok!s:5}  parsed={r.iso!s:19}  warnings={r.warnings}")
# 2024-03-15           ok=True   parsed=2024-03-15T00:00:00  warnings=[]
# garbage              ok=False  parsed=None                 warnings=[...]
# 2024-12-25           ok=True   parsed=2024-12-25T00:00:00  warnings=[]

CLI

# Detect format
datemonkey detect "15/03/2024" "20/04/2024" "25/12/2024"

# Detect with JSON output
datemonkey detect --json "01/02/2024" "03/04/2024"

# Parse dates
datemonkey parse "2024-03-15" "2024-04-20"

# Parse from CSV file (column 2, skip header)
datemonkey parse --file data.csv --column 2 --skip-header

# Parse with explicit format
datemonkey parse --format "%d-%m-%Y" "15-03-2024"

# Parse in strict mode
datemonkey parse --strict "01/02/2024" "03/04/2024"

# List known formats
datemonkey formats

API Reference

detect_format(values, *, locale_preference=None, formats=None) -> FormatDetectionResult

Analyze a batch and determine the most likely format, reporting ambiguity.

  • values: List of date-like values (strings, ints, floats, None)
  • locale_preference: "us" for MM/DD, "eu" for DD/MM (only used when data alone can't resolve)
  • formats: Custom list of DateFormat objects to test

parse_dates(values, *, format=None, locale_preference=None, strict=False) -> BatchResult

Parse a batch with format lock-in.

  • format: A DateFormat object or strftime string. If None, the format is auto-detected.
  • strict: If True, refuse to parse when DD/MM vs MM/DD is ambiguous.

excel_serial_to_datetime(serial) -> datetime | None

Convert an Excel serial date number to a Python datetime.

Result Objects

Object                  Key Properties
FormatDetectionResult   .format, .confidence, .is_ambiguous, .ambiguities, .candidates, .warnings
BatchResult             .ok, .results, .detected_format, .dates, .iso_strings, .failed, .succeeded, .success_ratio
DateResult              .ok, .original, .parsed, .date, .iso, .confidence, .warnings, .row_index
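For a rough picture of how the per-value and batch properties relate, here is a hypothetical pair of dataclasses that mirror some of the documented names. The field types are guesses for illustration, as are the assumptions that `success_ratio` is the fraction of values that parsed and that a batch is `ok` only when every value parsed. They are not datemonkey's definitions.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import Any, Optional


@dataclass
class DateResultSketch:
    original: Any
    row_index: int
    parsed: Optional[datetime] = None
    warnings: list[str] = field(default_factory=list)

    @property
    def ok(self) -> bool:
        return self.parsed is not None

    @property
    def iso(self) -> Optional[str]:
        return self.parsed.isoformat() if self.parsed is not None else None


@dataclass
class BatchResultSketch:
    results: list[DateResultSketch]

    @property
    def succeeded(self) -> list[DateResultSketch]:
        return [r for r in self.results if r.ok]

    @property
    def failed(self) -> list[DateResultSketch]:
        return [r for r in self.results if not r.ok]

    @property
    def success_ratio(self) -> float:
        # Assumed definition: parsed values / all values (0.0 for an empty batch).
        return len(self.succeeded) / len(self.results) if self.results else 0.0

    @property
    def ok(self) -> bool:
        # Assumed: a batch is ok only when it is non-empty and every value parsed.
        return bool(self.results) and not self.failed


batch = BatchResultSketch([
    DateResultSketch("2024-03-15", 0, datetime(2024, 3, 15)),
    DateResultSketch("garbage", 1, warnings=["no format matched"]),
])
print(batch.success_ratio)   # 0.5
print(batch.results[0].iso)  # 2024-03-15T00:00:00
```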

Confidence Levels

Level    Meaning
HIGH     Unambiguous parse, format is certain
MEDIUM   Likely correct, minor ambiguity (e.g. two-digit year)
LOW      Ambiguous: DD/MM vs MM/DD unresolved, or poor match ratio
FAILED   Could not parse or detect
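One way to express the table in code is a small enum plus a decision function. The `min_ratio` threshold and the order of the checks are hypothetical, because the README does not say how datemonkey weighs match ratio against year width. Read this as an illustration of the table, not as the library's scoring.

```python
from enum import Enum


class ConfidenceSketch(Enum):
    HIGH = "high"
    MEDIUM = "medium"
    LOW = "low"
    FAILED = "failed"


def grade(
    match_ratio: float,
    day_month_ambiguous: bool,
    two_digit_year: bool,
    min_ratio: float = 0.9,  # assumed cut-off for a "poor match ratio"
) -> ConfidenceSketch:
    """Map detection facts to a confidence level (hypothetical rules)."""
    if match_ratio == 0:
        return ConfidenceSketch.FAILED
    if day_month_ambiguous or match_ratio < min_ratio:
        return ConfidenceSketch.LOW
    if two_digit_year:
        return ConfidenceSketch.MEDIUM
    return ConfidenceSketch.HIGH
```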

Design

  • Batch-first: Designed for columns of data, not single strings
  • No silent guessing: Ambiguity is reported, not hidden
  • Format lock-in: Once detected, the format is enforced — violations are flagged
  • Structured results: Every parse returns confidence scores and warnings
  • Zero dependencies: Pure Python, stdlib only

Built for LLMs

datemonkey is designed to work well as a tool for large language models. Date parsing is a common source of silent errors in LLM-driven data pipelines: ambiguous formats lead to wrong guesses, tokens wasted on retries, and broken downstream logic. datemonkey collapses that into a single call that returns a structured result with the detected format, confidence level, and any ambiguities, so no multi-step prompting or validation loop is needed. Fewer tokens in, reliable answers out.

License

MIT

Download files

Source Distribution

datemonkey-0.1.0.tar.gz (19.4 kB)

Uploaded Source

Built Distribution

datemonkey-0.1.0-py3-none-any.whl (17.9 kB)

Uploaded Python 3

File details

Details for the file datemonkey-0.1.0.tar.gz.

File metadata

  • Download URL: datemonkey-0.1.0.tar.gz
  • Size: 19.4 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.12.3

File hashes

Hashes for datemonkey-0.1.0.tar.gz
Algorithm     Hash digest
SHA256        888c097ed0666ac5142906b183e397985ab656d19ee6bb18287ce6bbf8c3e7b6
MD5           4d562f5c3d0d5973465346cfd2059354
BLAKE2b-256   aaaa2e01df00997adad85303b9f10ed51b54e74ea06af0bec2b512b7b5af58a1

File details

Details for the file datemonkey-0.1.0-py3-none-any.whl.

File metadata

  • Download URL: datemonkey-0.1.0-py3-none-any.whl
  • Size: 17.9 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.12.3

File hashes

Hashes for datemonkey-0.1.0-py3-none-any.whl
Algorithm     Hash digest
SHA256        80fc71fbc43460faabd7df635318182d05cd88329d35ccf5de0bcac4944bca54
MD5           7decda73b8654a33a4853c755b59403c
BLAKE2b-256   7d0d9e4abb1ae5eaf8a72d5d40cf465d73d4d9649259d1f72b92f4bd2e43dfa2
