Fast, consensus-based date format inference

fastdateinfer

Fast, consensus-based date format inference written in Rust with Python bindings.

Why?

The problem: Is 01/02/2025 January 2nd or February 1st?

Library	Approach	Problem
pandas	`dayfirst=True` hint	You must know the format
dateutil	Guess per-element	Inconsistent results
hidateinfer	Consensus voting	Correct, but slow

The solution: If your data contains 15/03/2025, we know the format is DD/MM/YYYY, because 15 can't be a month. When the dates share one format, that evidence applies to all of them and resolves ambiguous ones like 01/02/2025.

fastdateinfer implements this consensus algorithm in Rust and runs about 270x faster than hidateinfer (22.5 ms vs 6,075 ms on 29,351 dates; see Benchmarks).
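
The ambiguity described above can be reproduced with the standard library alone. The following is a small illustration, not part of fastdateinfer; `try_parse` is a helper introduced here for the example.

```python
from datetime import datetime

DAY_FIRST = "%d/%m/%Y"
MONTH_FIRST = "%m/%d/%Y"

def try_parse(text, fmt):
    """Return the parsed date, or None if `text` does not fit `fmt`."""
    try:
        return datetime.strptime(text, fmt).date()
    except ValueError:
        return None

# Ambiguous: both readings are valid dates, but different ones.
ambiguous_dayfirst = try_parse("01/02/2025", DAY_FIRST)      # 1 February 2025
ambiguous_monthfirst = try_parse("01/02/2025", MONTH_FIRST)  # 2 January 2025

# Unambiguous: only the day-first reading is a valid date.
clear_dayfirst = try_parse("15/03/2025", DAY_FIRST)      # 15 March 2025
clear_monthfirst = try_parse("15/03/2025", MONTH_FIRST)  # None, there is no month 15
```

A single value such as 15/03/2025 rules out the month-first reading, which is the evidence the consensus approach relies on.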

Installation

pip install fastdateinfer

Quick Start

import fastdateinfer

# Infer format from dates
result = fastdateinfer.infer(["15/03/2025", "01/02/2025", "28/12/2025"])
print(result.format)      # %d/%m/%Y
print(result.confidence)  # 1.0

# Just get the format string
fmt = fastdateinfer.infer_format(["2025-01-15", "2025-03-20"])
print(fmt)  # %Y-%m-%d

# Use with pandas
import pandas as pd
dates = ["15/03/2025", "01/02/2025", "28/12/2025"]
fmt = fastdateinfer.infer_format(dates)
parsed = pd.to_datetime(dates, format=fmt)  # a DatetimeIndex, not a DataFrame

Benchmarks

vs hidateinfer (Python)

Tested on 29,351 real-world dates across multiple formats:

Library	Time	Speedup
fastdateinfer	22.5 ms	—
hidateinfer	6,075 ms	270x slower

vs pandas

Comparison on synthetic data (DD/MM/YYYY format):

Dates	fastdateinfer	pandas (explicit)	pandas (mixed)	Ratio
100	0.05 ms	0.24 ms	0.25 ms	5x faster
1,000	0.48 ms	0.97 ms	1.02 ms	2x faster
10,000	0.74 ms	2.14 ms	2.20 ms	3x faster
100,000	3.39 ms	17.00 ms	17.50 ms	5x faster

Note: the two libraries are doing different work here. fastdateinfer infers the format, while pandas parses every value with a format that is already known. fastdateinfer still finishes sooner because it fully analyzes only a sample (consensus converges with ~1000 dates).
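
To run a comparison of this kind on your own machine, a harness along the following lines may help. It is a sketch under stated assumptions, not the script behind the numbers above: `synthetic_dates`, `best_time_ms`, and `parse_all` are hypothetical helpers, and the date range and repeat count are arbitrary choices. The baseline timed here is plain `datetime.strptime` with a known format.

```python
import random
import time
from datetime import date, datetime, timedelta

def synthetic_dates(n, seed=0):
    """Generate n pseudo-random dates formatted as DD/MM/YYYY."""
    rng = random.Random(seed)
    start = date(2000, 1, 1)
    return [
        (start + timedelta(days=rng.randrange(10_000))).strftime("%d/%m/%Y")
        for _ in range(n)
    ]

def best_time_ms(fn, arg, repeats=5):
    """Call fn(arg) `repeats` times and return the fastest run in milliseconds."""
    best = float("inf")
    for _ in range(repeats):
        started = time.perf_counter()
        fn(arg)
        best = min(best, time.perf_counter() - started)
    return best * 1000.0

def parse_all(dates, fmt="%d/%m/%Y"):
    """Baseline: parse every value with an already-known format."""
    return [datetime.strptime(d, fmt) for d in dates]

dates = synthetic_dates(1_000)
baseline_ms = best_time_ms(parse_all, dates)
print(f"strptime baseline, 1,000 dates: {baseline_ms:.2f} ms")
```

To time inference instead, pass `fastdateinfer.infer_format` as `fn`. Timings depend on hardware and library versions, so expect different absolute values.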

Scaling

Dates	Time	Per-date
1,000	0.48 ms	0.48 µs
10,000	0.74 ms	0.07 µs
100,000	3.39 ms	0.03 µs
1,000,000	~35 ms	0.03 µs

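Per-date cost drops as input grows because only ~1000 dates are fully analyzed regardless of input size. Total time still rises with input size: in the table above, going from 100,000 to 1,000,000 dates takes roughly 10x longer.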
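
The source does not say how the sample is chosen. As one possible illustration of capping the analyzed set at about 1000 values, the sketch below takes evenly spaced entries; `stride_sample` is a hypothetical helper, and stride sampling is an assumption rather than the library's documented method.

```python
def stride_sample(dates, max_items=1000):
    """Return at most `max_items` evenly spaced entries from `dates`."""
    n = len(dates)
    if n <= max_items:
        return list(dates)
    step = n / max_items
    # int(i * step) is always below n because i stops at max_items - 1.
    return [dates[int(i * step)] for i in range(max_items)]

sample = stride_sample(list(range(1_000_000)))
print(len(sample))  # 1000
```

Spreading the sample across the whole input, rather than taking the first 1000 rows, reduces the chance of missing unambiguous values that only appear later in a sorted file.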

Supported Formats

Format	Example	Output
European	`15/03/2025`	`%d/%m/%Y`
American	`03/15/2025`	`%m/%d/%Y`
ISO 8601	`2025-03-15`	`%Y-%m-%d`
ISO datetime	`2025-03-15T10:30:00`	`%Y-%m-%dT%H:%M:%S`
Month name	`15 Mar 2025`	`%d %b %Y`
Month name (full)	`15 March 2025`	`%d %B %Y`
Month first	`Mar 15, 2025`	`%b %d, %Y`
2-digit year	`15/03/25`	`%d/%m/%y`
With time	`15/03/25 10.30.00`	`%d/%m/%y %H.%M.%S`
Month-year only	`March, 2025`	`%B, %Y`
Day-month only	`15/Mar`	`%d/%b`
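
Each entry in the Output column is a standard strptime pattern, so it can be checked against its example with Python's `datetime.strptime`. The sketch below does this for the rows that carry a full date; the examples and format strings are copied from the table, while `EXAMPLES` and `parsed` are names introduced here. The month-year and day-month rows are left out because strptime would fill their missing fields with defaults.

```python
from datetime import datetime

# (example, format) pairs taken from the table above.
# %b and %B match English month names under Python's default locale.
EXAMPLES = [
    ("15/03/2025", "%d/%m/%Y"),
    ("03/15/2025", "%m/%d/%Y"),
    ("2025-03-15", "%Y-%m-%d"),
    ("2025-03-15T10:30:00", "%Y-%m-%dT%H:%M:%S"),
    ("15 Mar 2025", "%d %b %Y"),
    ("15 March 2025", "%d %B %Y"),
    ("Mar 15, 2025", "%b %d, %Y"),
    ("15/03/25", "%d/%m/%y"),
    ("15/03/25 10.30.00", "%d/%m/%y %H.%M.%S"),
]

parsed = [datetime.strptime(text, fmt) for text, fmt in EXAMPLES]

# Every example describes the same calendar day.
for (text, fmt), value in zip(EXAMPLES, parsed):
    print(f"{text!r} with {fmt!r} -> {value.date().isoformat()}")
```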

API Reference

`infer(dates, prefer_dayfirst=True, min_confidence=0.0, strict=False)`

Infer date format from a list of date strings.

Arguments:

dates: List of date strings
prefer_dayfirst: Use DD/MM for fully ambiguous dates (default: True)
min_confidence: Minimum confidence threshold (default: 0.0)
strict: Raise error if any date doesn't match (default: False)

Returns: InferResult with:

format: strptime format string
confidence: float between 0.0 and 1.0
token_types: list of resolved token types

result = fastdateinfer.infer(["01/02/2025", "03/04/2025"], prefer_dayfirst=False)
print(result.format)  # %m/%d/%Y (American format)

`infer_format(dates, prefer_dayfirst=True)`

Convenience function that returns only the format string.

fmt = fastdateinfer.infer_format(["2025-01-15", "2025-03-20"])
print(fmt)  # %Y-%m-%d

`infer_batch(columns, prefer_dayfirst=True)`

Infer formats for multiple columns at once.

results = fastdateinfer.infer_batch({
    "transaction_date": ["15/03/2025", "01/02/2025"],
    "created_at": ["2025-01-15T10:30:00", "2025-01-16T14:45:00"],
    "value_date": ["15-Mar-2025", "01-Feb-2025"]
})

for col, result in results.items():
    print(f"{col}: {result.format}")
# transaction_date: %d/%m/%Y
# created_at: %Y-%m-%dT%H:%M:%S
# value_date: %d-%b-%Y

How It Works

1. Tokenize: Split "15/03/2025" into [15, /, 03, /, 2025]
2. Constrain: 15 can only be Day (>12), 03 could be Day or Month, 2025 is Year
3. Vote: Across all dates, count evidence for each position
4. Resolve: Position 1 has strong Day evidence → Position 2 must be Month
5. Format: Output %d/%m/%Y

The key insight: consensus converges quickly. Even with 1 million dates, we only need to analyze ~1000 to determine the format with high confidence.
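
The steps above can be sketched in a few lines of Python. This is a toy version written for illustration, not the Rust implementation: it only handles two 1-2 digit numbers followed by a 4-digit year, and `PATTERN` and `infer_numeric_format` are names introduced here. The `prefer_dayfirst` fallback mirrors the documented parameter of the same name.

```python
import re
from collections import Counter

# Scope of this sketch: two 1-2 digit numbers and a 4-digit year,
# joined by the same single-character separator, e.g. 15/03/2025.
PATTERN = re.compile(r"(\d{1,2})(\D)(\d{1,2})\2(\d{4})")

def infer_numeric_format(dates, prefer_dayfirst=True):
    """Return a day-first or month-first strptime format, or None if nothing matched."""
    day_votes = Counter()  # position index -> count of values that must be a day
    separator = None
    for text in dates:
        match = PATTERN.fullmatch(text.strip())  # Tokenize
        if match is None:
            continue  # outside the scope of this sketch
        first, sep, second, _year = match.groups()
        separator = separator or sep
        # Constrain and vote: a value in 13..31 can only be a day.
        for position, token in enumerate((first, second)):
            if 12 < int(token) <= 31:
                day_votes[position] += 1
    if separator is None:
        return None
    # Resolve: the position with more day-only evidence is the day.
    # With no evidence either way, fall back to the caller's preference.
    if day_votes[0] != day_votes[1]:
        day_first = day_votes[0] > day_votes[1]
    else:
        day_first = prefer_dayfirst
    order = ("%d", "%m") if day_first else ("%m", "%d")
    return f"{order[0]}{separator}{order[1]}{separator}%Y"  # Format

print(infer_numeric_format(["15/03/2025", "01/02/2025", "28/12/2025"]))  # %d/%m/%Y
print(infer_numeric_format(["01/02/2025", "03/04/2025"], prefer_dayfirst=False))  # %m/%d/%Y
```

Only values above 12 cast a vote, which is why a handful of unambiguous dates is enough to settle the whole input.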

Use Cases

CSV/Data Processing

import pandas as pd
import fastdateinfer

# Read raw data
df = pd.read_csv("data.csv")

# Detect format automatically
fmt = fastdateinfer.infer_format(df["date"].dropna().astype(str).tolist())

# Parse with detected format
df["date"] = pd.to_datetime(df["date"], format=fmt)
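
Where pandas is not available, the same flow can be approximated with the standard library. The sketch below assumes the format string has already been obtained (for example from `infer_format`) and is passed in as `fmt`; `parse_date_column` is a hypothetical helper that also reports values that do not fit the format.

```python
import csv
import io
from datetime import datetime

def parse_date_column(csv_file, column, fmt):
    """Parse `column` with `fmt`; return (dates, rejected (line number, value) pairs)."""
    parsed, rejected = [], []
    # Line 1 of the file is the header, so data rows start at line 2.
    for line_number, row in enumerate(csv.DictReader(csv_file), start=2):
        value = (row.get(column) or "").strip()
        if not value:
            continue  # skip blanks, similar to dropna()
        try:
            parsed.append(datetime.strptime(value, fmt).date())
        except ValueError:
            rejected.append((line_number, value))
    return parsed, rejected

# In-memory stand-in for data.csv; the last row deliberately uses another format.
sample = io.StringIO("id,date\n1,15/03/2025\n2,01/02/2025\n3,\n4,2025-12-28\n")
dates, bad_rows = parse_date_column(sample, "date", "%d/%m/%Y")
print(dates)     # two parsed dates
print(bad_rows)  # [(5, '2025-12-28')]
```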

Multi-format Data Pipeline

# Different columns may have different formats
results = fastdateinfer.infer_batch({
    col: df[col].dropna().astype(str).tolist()
    for col in ["date", "value_date", "created_at"]
})

for col, result in results.items():
    df[col] = pd.to_datetime(df[col], format=result.format)

Validation

# Ensure high confidence
result = fastdateinfer.infer(dates, min_confidence=0.9)
if result.confidence < 0.9:
    raise ValueError(f"Low confidence: {result.confidence}")
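
The check above can be wrapped in a small helper so every pipeline stage applies the same threshold. This is a sketch: `require_confident_format` is a hypothetical function, and it relies only on the `format` and `confidence` attributes that `InferResult` is documented to have. A `SimpleNamespace` stands in for the real result so the example runs without the library installed.

```python
from types import SimpleNamespace

def require_confident_format(result, threshold=0.9):
    """Return result.format, or raise ValueError if confidence is below `threshold`."""
    if not 0.0 <= threshold <= 1.0:
        raise ValueError(f"threshold must be within [0.0, 1.0], got {threshold}")
    if result.confidence < threshold:
        raise ValueError(
            f"Low confidence {result.confidence:.2f} for format {result.format!r}"
        )
    return result.format

# Stand-in for illustration; in real use pass the object returned by fastdateinfer.infer.
stub = SimpleNamespace(format="%d/%m/%Y", confidence=1.0)
fmt = require_confident_format(stub)
print(fmt)  # %d/%m/%Y
```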

Comparison

Feature	fastdateinfer	hidateinfer	pandas	dateutil
Consensus-based	✅	✅	❌	❌
Speed (10k dates)	0.74 ms	200 ms	2 ms*	N/A
Returns strptime format	✅	✅	❌	❌
Batch inference	✅	❌	❌	❌
Type hints	✅	❌	✅	✅
Pure Rust core	✅	❌	❌	❌

*pandas time is for parsing only (you must already know the format)

Building from Source

# Prerequisites
curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh
pip install maturin

# Clone and build
git clone https://github.com/coledrain/fastdateinfer
cd fastdateinfer
maturin develop --release

# Run tests
cargo test

License

MIT License. See LICENSE for details.

Contributing

Contributions welcome! Please open an issue or PR on GitHub.

Acknowledgments

Inspired by hidateinfer
Built with PyO3 for Python bindings
Built for high-volume data processing pipelines

Release history

0.1.6

Feb 6, 2026

0.1.5

Feb 6, 2026

0.1.4

Feb 6, 2026

0.1.3 yanked

Feb 5, 2026

Reason this release was yanked:

Superseded by 0.1.4

0.1.2 yanked

Feb 5, 2026

Reason this release was yanked:

Superseded by 0.1.4

This version

0.1.0 yanked

Feb 5, 2026

Reason this release was yanked:

Superseded by 0.1.4

Download files

Source Distribution

fastdateinfer-0.1.0.tar.gz (24.0 kB)

Uploaded Feb 5, 2026 Source

Built Distributions

fastdateinfer-0.1.0-cp312-cp312-macosx_11_0_arm64.whl (280.8 kB)

Uploaded Feb 5, 2026 CPython 3.12, macOS 11.0+ ARM64

fastdateinfer-0.1.0-cp39-cp39-macosx_11_0_arm64.whl (245.0 kB)

Uploaded Feb 5, 2026 CPython 3.9, macOS 11.0+ ARM64

File details

Details for the file fastdateinfer-0.1.0.tar.gz.

File metadata

Download URL: fastdateinfer-0.1.0.tar.gz
Upload date: Feb 5, 2026
Size: 24.0 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: maturin/1.11.5

File hashes

Hashes for fastdateinfer-0.1.0.tar.gz
Algorithm	Hash digest
SHA256	`d0905b04fde9a5455e39358f664e6d5c6e5fb82faabf9110ecdb91e5acc2a169`
MD5	`45a56354d7aace6c9affd4fd6101d5f1`
BLAKE2b-256	`5600bd43bb4520c0171ab8fe8f4b5df5530d6cf3856b811289c9183c443d9a3e`

File details

Details for the file fastdateinfer-0.1.0-cp312-cp312-macosx_11_0_arm64.whl.

File metadata

Download URL: fastdateinfer-0.1.0-cp312-cp312-macosx_11_0_arm64.whl
Upload date: Feb 5, 2026
Size: 280.8 kB
Tags: CPython 3.12, macOS 11.0+ ARM64
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.12.12

File hashes

Hashes for fastdateinfer-0.1.0-cp312-cp312-macosx_11_0_arm64.whl
Algorithm	Hash digest
SHA256	`cb1de0321d8c1171915bca085591cbea3b53fbb0097d4f767925f5fbeac52110`
MD5	`72860f7f46dcfa6474f9e2113090f014`
BLAKE2b-256	`b998d5b1dcaed51ed51ffb842d0ac93c8a7b4b704abf34c1fc474fde0712c143`

File details

Details for the file fastdateinfer-0.1.0-cp39-cp39-macosx_11_0_arm64.whl.

File metadata

Download URL: fastdateinfer-0.1.0-cp39-cp39-macosx_11_0_arm64.whl
Upload date: Feb 5, 2026
Size: 245.0 kB
Tags: CPython 3.9, macOS 11.0+ ARM64
Uploaded using Trusted Publishing? No
Uploaded via: maturin/1.11.5

File hashes

Hashes for fastdateinfer-0.1.0-cp39-cp39-macosx_11_0_arm64.whl
Algorithm	Hash digest
SHA256	`2db7524d3a6ebe5ce0cbc0362f05937ce71cf67a436a47951e7f91206eefcf2d`
MD5	`683527f7a432ec5c4338d9cd4f215c85`
BLAKE2b-256	`8c4f24e6b3f173ecbbc9cc685605f7e1a9eedd3a2ecca3370c61756ce49c7f83`
