Infer timestamp formats from string columns using CSP constraint elimination

infer-ts

Infer timestamp formats from string columns using CSP constraint elimination, built in Rust via PyO3 for use with Python and Polars.

Why this library?

Polars can parse string columns to Datetime via str.to_datetime(format=...). When no format is given (format=None), Polars tries to infer one, but with two significant limitations:

Only ISO 8601 variants are reliably inferred. Common real-world formats fail entirely with format=None:

import polars as pl

pl.Series(["01/15/2024"]).str.to_datetime()
# ComputeError: could not find an appropriate format to parse dates,
# please define a format

pl.Series(["1705312200"]).str.to_datetime()   # Unix timestamps
pl.Series(["Jan 15, 2024"]).str.to_datetime() # Month-name dates
pl.Series(["20240115T103000"]).str.to_datetime() # Compact dates
# All raise the same error

Format is inferred from the first non-null value only. For slash dates, Polars always assumes day-first (%d/%m/%Y). If your data is US-formatted (%m/%d/%Y), you either get silently wrong dates or a parse error on the first value where day > 12:

# Polars locks on %d/%m/%Y from the first row, then fails on month=15
pl.Series(["03/04/2024", "01/15/2024"]).str.to_datetime()
# ComputeError: … failed for 1 out of 2 values: ["01/15/2024"]

The alternative, wrapping str.to_datetime in a try/except loop over candidate formats, is verbose and slow. It also stops at the first candidate that succeeds, so it cannot tell you when several formats fit the data equally well.
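
To make that limitation concrete, here is a minimal pure-Python sketch of such a loop. It uses the standard library's datetime.strptime as a stand-in for str.to_datetime, and the name first_matching_format is hypothetical. Because it returns whichever candidate succeeds first, the order of the candidate list, not the data, decides between formats that fit equally well:

```python
from datetime import datetime
from typing import Iterable, Optional, Sequence


def first_matching_format(
    values: Iterable[str], candidates: Sequence[str]
) -> Optional[str]:
    """Return the first candidate format that parses every value, or None."""
    materialised = list(values)
    for fmt in candidates:
        try:
            for value in materialised:
                datetime.strptime(value, fmt)
        except ValueError:
            continue  # this format failed on some value; try the next one
        return fmt  # stops here and never checks the remaining candidates
    return None


# Both formats fit this data, but only the first candidate is reported.
print(first_matching_format(["01/02/2024", "03/04/2024"], ["%d/%m/%Y", "%m/%d/%Y"]))  # %d/%m/%Y
```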

infer-ts solves this with a single pass over the column using a CSP algorithm that tracks all compatible formats simultaneously. It returns every format consistent with the values it examined (the whole column when exhaustive=True), reports ambiguity explicitly (e.g. returning both %d/%m/%Y and %m/%d/%Y when the data doesn't distinguish them), and supports dozens of formats that Polars cannot auto-infer.

How it works

The inference engine treats format detection as a Constraint Satisfaction Problem (CSP) whose domain is the set of candidate timestamp formats. Each cell value in the column acts as a constraint that narrows the candidate set:

  1. Initialise – start with all format combinations as candidates.
  2. Propagate – for each non-null cell, eliminate every format that cannot parse that value.
  3. Early exit (default) – as soon as a single format remains, return it immediately. Set exhaustive=True to disable this and check all values.
  4. Return – return all formats that survived constraint propagation.

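This approach is both efficient (most real data resolves to a single format within the first few rows) and flexible (use exhaustive=True to validate the entire column). The trade-off of the default early exit is that values after that point are never checked, so a malformed value later in the column is only caught with exhaustive=True.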
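
The four steps can be illustrated with a small pure-Python sketch. This is not the library's Rust implementation: the candidate list is a hypothetical four-format subset, datetime.strptime stands in for the real per-format validators, and infer_formats is an illustrative name rather than part of the infer_ts API.

```python
from datetime import datetime
from typing import Iterable, List, Optional, Sequence

# Hypothetical candidate set; the real library builds many more combinations.
CANDIDATES = (
    "%Y-%m-%dT%H:%M:%S",
    "%Y-%m-%d",
    "%d/%m/%Y",
    "%m/%d/%Y",
)


def parses(value: str, fmt: str) -> bool:
    """Return True if `value` can be parsed with the strptime format `fmt`."""
    try:
        datetime.strptime(value, fmt)
    except ValueError:
        return False
    return True


def infer_formats(
    values: Iterable[Optional[str]],
    candidates: Sequence[str] = CANDIDATES,
    exhaustive: bool = False,
) -> List[str]:
    remaining = list(candidates)  # 1. Initialise: every format is a candidate
    for value in values:
        if value is None:
            continue  # nulls impose no constraint
        # 2. Propagate: drop every format that cannot parse this value
        remaining = [fmt for fmt in remaining if parses(value, fmt)]
        if not remaining:
            break  # an empty candidate set can never grow back
        # 3. Early exit: stop as soon as a single format remains
        if not exhaustive and len(remaining) == 1:
            break
    return remaining  # 4. Return every surviving format


print(infer_formats(["01/02/2024", "03/04/2024"]))  # ['%d/%m/%Y', '%m/%d/%Y']
print(infer_formats(["01/02/2024", "15/03/2024"]))  # ['%d/%m/%Y']
print(infer_formats(["not a timestamp"]))           # []
```

With the default early exit, infer_formats(["2024-01-15", "garbage"]) stops after the first value and returns ['%Y-%m-%d'], while exhaustive=True goes on to eliminate it and returns [].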

Supported formats

See FORMATS.md for the full reference tables (date patterns, time patterns, timezone suffixes, and Unix epoch formats with their Polars format strings). That file is auto-generated from the Rust source — to regenerate after changing any format definitions:

cargo test -- --ignored dump_formats

Architecture: Compositional format design

Internally, formats are represented using a compositional structure rather than a flat enum:

Format
├── Date { date: DateFmt }
├── DateTime { date: DateFmt, sep: Separator, time: TimeFmt, tz: Option<Timezone>, spaced_tz: bool }
└── Unix { precision: UnixPrecision }

All combinations of these components are generated automatically. Structurally invalid ones (e.g. a DateTime format applied to a value whose character at the separator position is neither T nor a space) are eliminated by the parser on the first value, so their performance cost is negligible.

Adding a new format variant (e.g. named timezones) requires a single new enum variant and a validator — no changes to the combinatorial logic.
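
The sketch below mirrors the tree above with Python dataclasses to show why the combinatorial step needs no per-format code. The component variants (three date layouts, two separators, two time layouts, one numeric-offset timezone) are hypothetical stand-ins for the Rust enums, the Unix markers are the ones infer_format is documented to return, and pattern() and all_formats() are illustrative names, not the library's API.

```python
from dataclasses import dataclass
from enum import Enum
from itertools import product
from typing import List, Optional, Union


# Hypothetical component enums; each value is the fragment it renders to.
class DateFmt(Enum):
    ISO = "%Y-%m-%d"
    EU_SLASH = "%d/%m/%Y"
    US_SLASH = "%m/%d/%Y"


class Separator(Enum):
    T = "T"
    SPACE = " "


class TimeFmt(Enum):
    HMS = "%H:%M:%S"
    HM = "%H:%M"


class Timezone(Enum):
    OFFSET = "%z"


class UnixPrecision(Enum):
    SECONDS = "@unix_seconds"
    MILLIS = "@unix_ms"
    MICROS = "@unix_us"
    NANOS = "@unix_ns"


@dataclass(frozen=True)
class Date:
    date: DateFmt

    def pattern(self) -> str:
        return self.date.value


@dataclass(frozen=True)
class DateTime:
    date: DateFmt
    sep: Separator
    time: TimeFmt
    tz: Optional[Timezone]
    spaced_tz: bool

    def pattern(self) -> str:
        base = self.date.value + self.sep.value + self.time.value
        if self.tz is None:
            return base
        return base + (" " if self.spaced_tz else "") + self.tz.value


@dataclass(frozen=True)
class Unix:
    precision: UnixPrecision

    def pattern(self) -> str:
        return self.precision.value


Format = Union[Date, DateTime, Unix]


def all_formats() -> List[Format]:
    """Enumerate every combination; adding an enum member grows this automatically."""
    formats: List[Format] = [Date(d) for d in DateFmt]
    for d, sep, t, tz in product(DateFmt, Separator, TimeFmt, [None, *Timezone]):
        # spaced_tz only matters when a timezone suffix is present
        for spaced in ([False] if tz is None else [False, True]):
            formats.append(DateTime(d, sep, t, tz, spaced))
    formats.extend(Unix(p) for p in UnixPrecision)
    return formats


print(len(all_formats()))  # 43
```

Because all_formats() only iterates over the enums, adding a member to one of them (for instance a hypothetical named-timezone variant of Timezone) adds all of its combinations without touching the loop.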

Installation

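Prebuilt wheels for CPython 3.9+ on Windows, Linux and macOS are listed under Download files below, so a normal install needs no Rust toolchain:

pip install infer-ts

Building from source requires Rust and maturin: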

pip install maturin
maturin develop --release

For development (uses uv for reproducible installs):

uv sync --all-extras
maturin develop --release
bash build-and-test.sh

Usage

Quick start

import polars as pl
import infer_ts

df = pl.DataFrame({"ts": ["2024-01-15T10:30:00", "2024-06-20T08:00:00"]})

# One-liner: infer format and cast to Datetime in a single call
series = infer_ts.to_datetime(df["ts"])

# Or as a Polars expression (works inside lazy frames too)
df = df.with_columns(pl.col("ts").infer_ts.to_datetime())

# Control the output time unit (default: "us")
df = df.with_columns(pl.col("ts").infer_ts.to_datetime(time_unit="ns"))

# Infer the format strings only (returns a list of all matching formats)
fmts = infer_ts.infer_format(df["ts"])

Basic inference

import infer_ts

# Returns a list of compatible formats
fmts = infer_ts.infer_format([
    "2024-01-15T10:30:00",
    "2024-06-20T08:00:00",
    None,                       # nulls are skipped
])
print(fmts)       # ["%Y-%m-%dT%H:%M:%S"]
print(fmts[0])    # "%Y-%m-%dT%H:%M:%S"

# Use exhaustive=True to process all values
fmts = infer_ts.infer_format(["01/02/2024", "03/04/2024"], exhaustive=True)
print(fmts)  # ["%d/%m/%Y", "%m/%d/%Y"] - both US and EU formats match

Polars – string-based formats

import polars as pl
import infer_ts

df = pl.DataFrame({"ts": ["2024-01-15T10:30:00", "2024-06-20T08:00:00"]})

fmts = infer_ts.infer_format(df["ts"])
if not fmts:
    raise ValueError("no timestamp format matches column 'ts'")
# Several formats survive when the data is ambiguous; this takes the first
fmt = fmts[0]
df = df.with_columns(pl.col("ts").str.to_datetime(format=fmt))

Polars – Unix epoch formats

infer_format returns a @-prefixed marker for epoch columns (e.g. @unix_seconds, @unix_ms, @unix_us, @unix_ns). The to_datetime plugin and to_datetime() wrapper handle these automatically with correct integer scaling — no manual casting needed:

import polars as pl
import infer_ts

df = pl.DataFrame({"ts": ["1705312200", "1705398600"]})

# Plugin expression (works in lazy frames too):
df = df.with_columns(pl.col("ts").infer_ts.to_datetime())

# Or via the series API:
result = infer_ts.to_datetime(df["ts"])
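
Conceptually, each marker fixes how many integer ticks make up one second. The following pure-Python sketch shows that scaling for a single value; it is not the plugin's code, and TICKS_PER_SECOND and epoch_string_to_datetime are hypothetical names. In Polars itself, the comparable building block is pl.from_epoch, applied after casting the strings to integers.

```python
from datetime import datetime, timedelta, timezone

# Ticks per second implied by each epoch marker returned by infer_format.
TICKS_PER_SECOND = {
    "@unix_seconds": 1,
    "@unix_ms": 1_000,
    "@unix_us": 1_000_000,
    "@unix_ns": 1_000_000_000,
}

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def epoch_string_to_datetime(value: str, marker: str) -> datetime:
    """Convert one epoch string to a UTC datetime using the marker's scale."""
    ticks = int(value)
    per_second = TICKS_PER_SECOND[marker]
    # Integer math to whole microseconds (datetime's resolution); floor
    # division drops any sub-microsecond digits from nanosecond inputs.
    micros = ticks * 1_000_000 // per_second
    return EPOCH + timedelta(microseconds=micros)


print(epoch_string_to_datetime("1705312200", "@unix_seconds"))  # 2024-01-15 09:50:00+00:00
print(epoch_string_to_datetime("1705312200000", "@unix_ms"))    # 2024-01-15 09:50:00+00:00
```

Choosing the wrong scale shifts results by factors of 1000, which is why the marker, rather than the raw digits alone, drives the conversion.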

Handling ambiguity

US and EU slash dates are inherently ambiguous when every day value is ≤ 12. Instead of raising an error, the library returns all compatible formats:

import infer_ts

# Ambiguous – both mm/dd and dd/mm interpretations are valid for every row
fmts = infer_ts.infer_format(["01/02/2024", "03/04/2024"])
print(fmts)       # ["%d/%m/%Y", "%m/%d/%Y"]
print(len(fmts))  # 2 - caller can choose or prompt user

# Resolved – day 15 > 12 eliminates the US interpretation
fmts = infer_ts.infer_format(["01/02/2024", "15/03/2024"])
print(fmts)  # ["%d/%m/%Y"] - uniquely EU

# No match returns empty list
fmts = infer_ts.infer_format(["not a timestamp"])
print(fmts)  # []
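
Since an ambiguous column yields several formats and an unparseable one yields none, callers usually need a small policy on top of infer_format. One possible policy, sketched below with the hypothetical helper resolve_format, is to fail loudly on no match, accept a unique match, and otherwise use a caller-supplied preference only if it is among the survivors. It operates on the returned list, so it runs without the library installed.

```python
from typing import List, Sequence


def resolve_format(fmts: List[str], prefer: Sequence[str] = ()) -> str:
    """Pick one format from an inferred list, or raise if that is not possible."""
    if not fmts:
        raise ValueError("no timestamp format matches the data")
    if len(fmts) == 1:
        return fmts[0]
    # Ambiguous: honour the caller's preference order, if any entry survived
    for fmt in prefer:
        if fmt in fmts:
            return fmt
    raise ValueError(f"ambiguous timestamp format, candidates: {fmts}")


# e.g. the result of infer_ts.infer_format(["01/02/2024", "03/04/2024"])
fmts = ["%d/%m/%Y", "%m/%d/%Y"]
print(resolve_format(fmts, prefer=["%m/%d/%Y"]))  # %m/%d/%Y
```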

Contributing

This project was developed with heavy use of LLM agents (primarily Claude) for both the initial implementation and subsequent refinement. If you spot a bug, an edge case the parser mishandles, or a timestamp format that should be supported, issues and pull requests are very welcome.

Download files

Source Distribution

  • infer_ts-0.1.1.tar.gz (85.5 kB)

Built Distributions

  • infer_ts-0.1.1-cp39-abi3-win_amd64.whl (5.3 MB): CPython 3.9+, Windows x86-64
  • infer_ts-0.1.1-cp39-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (5.0 MB): CPython 3.9+, manylinux (glibc 2.17+) x86-64
  • infer_ts-0.1.1-cp39-abi3-manylinux_2_17_aarch64.manylinux2014_aarch64.whl (4.6 MB): CPython 3.9+, manylinux (glibc 2.17+) ARM64
  • infer_ts-0.1.1-cp39-abi3-macosx_11_0_arm64.whl (4.4 MB): CPython 3.9+, macOS 11.0+ ARM64
  • infer_ts-0.1.1-cp39-abi3-macosx_10_12_x86_64.whl (4.8 MB): CPython 3.9+, macOS 10.12+ x86-64

File details

Details for the file infer_ts-0.1.1.tar.gz.

File metadata

  • Download URL: infer_ts-0.1.1.tar.gz
  • Size: 85.5 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for infer_ts-0.1.1.tar.gz
Algorithm Hash digest
SHA256 b0bc342999f8cca058633b53d7edfaa42509f42390819bb9633f60a2ddaabe1a
MD5 f97fb42f5fe79dc311c34496a8f403bc
BLAKE2b-256 e9fe180a6b20a3a90b1972e95a06b44166c62e698e029719189a7852bd5ed94b

Provenance

The following attestation bundles were made for infer_ts-0.1.1.tar.gz:

Publisher: release.yml on andreasoprani/infer-ts

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file infer_ts-0.1.1-cp39-abi3-win_amd64.whl.

File metadata

  • Download URL: infer_ts-0.1.1-cp39-abi3-win_amd64.whl
  • Size: 5.3 MB
  • Tags: CPython 3.9+, Windows x86-64
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for infer_ts-0.1.1-cp39-abi3-win_amd64.whl
Algorithm Hash digest
SHA256 182c623e2b56c48f66de32f1503650827ac931d6844fede03a0cb9207690a3bc
MD5 f5ed2446a02d5a25eea732760bd4d7c9
BLAKE2b-256 483d1ccdab63c3b495fed44638e7bf53a1ae39f13c14475ca2b54f0eead69714

Provenance

The following attestation bundles were made for infer_ts-0.1.1-cp39-abi3-win_amd64.whl:

Publisher: release.yml on andreasoprani/infer-ts

File details

Details for the file infer_ts-0.1.1-cp39-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.

File hashes

Hashes for infer_ts-0.1.1-cp39-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
Algorithm Hash digest
SHA256 6af55a97d2ecdf253c9c9a9cb71c7733bd092d97a49b9f830d7033aa6084cdda
MD5 5c447d8547d1d60ac951ba8ecd2f1a01
BLAKE2b-256 74e7d66e7b5633af4e028e3d79f0482ed33d94a860214b9a81348fac7978db91

Provenance

The following attestation bundles were made for infer_ts-0.1.1-cp39-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl:

Publisher: release.yml on andreasoprani/infer-ts

File details

Details for the file infer_ts-0.1.1-cp39-abi3-manylinux_2_17_aarch64.manylinux2014_aarch64.whl.

File hashes

Hashes for infer_ts-0.1.1-cp39-abi3-manylinux_2_17_aarch64.manylinux2014_aarch64.whl
Algorithm Hash digest
SHA256 1d360a3d169926f63b3d4d5c6346027a7fc502ee15ccd248c9f4d0f9622d1dd6
MD5 3e78897373ed27774e2358d279ad329a
BLAKE2b-256 44b18527d378de60bf624d70ef498731138d51e9c886c1edc8c4f5797dc80a34

Provenance

The following attestation bundles were made for infer_ts-0.1.1-cp39-abi3-manylinux_2_17_aarch64.manylinux2014_aarch64.whl:

Publisher: release.yml on andreasoprani/infer-ts

File details

Details for the file infer_ts-0.1.1-cp39-abi3-macosx_11_0_arm64.whl.

File hashes

Hashes for infer_ts-0.1.1-cp39-abi3-macosx_11_0_arm64.whl
Algorithm Hash digest
SHA256 68b3f3839807f4b302b0a6c4341691dd71db0ad510f9d9b02ed3e1ef4ad04f5b
MD5 cc247f4ed7a98524f324cc92a186eccd
BLAKE2b-256 d8f604c7fb5456a9ec6158912e2568efeba28be88a29da41c091389558384da5

Provenance

The following attestation bundles were made for infer_ts-0.1.1-cp39-abi3-macosx_11_0_arm64.whl:

Publisher: release.yml on andreasoprani/infer-ts

File details

Details for the file infer_ts-0.1.1-cp39-abi3-macosx_10_12_x86_64.whl.

File hashes

Hashes for infer_ts-0.1.1-cp39-abi3-macosx_10_12_x86_64.whl
Algorithm Hash digest
SHA256 ec8ad7e3ccd974438662c11dc11a15e6e0bd50c77b0a2878263f47ba91b60924
MD5 889070c7a1d27d6dee6ee8ee519740a2
BLAKE2b-256 8cbb5df817e56d9feda6e7d59ddaedecbf63268554884938e08c9bf749d9e4a5

Provenance

The following attestation bundles were made for infer_ts-0.1.1-cp39-abi3-macosx_10_12_x86_64.whl:

Publisher: release.yml on andreasoprani/infer-ts
