Infer timestamp formats from string columns using CSP constraint elimination

infer-ts

Infer timestamp formats from string columns using CSP constraint elimination, built in Rust via PyO3 for use with Python and Polars.

Why this library?

Polars can parse string columns to Datetime via str.to_datetime(format=...), where you supply the format string. When no format is given (format=None), Polars tries to infer one, but with two significant limitations:

Only ISO 8601 variants are reliably inferred. Common real-world formats fail entirely with format=None:

import polars as pl

pl.Series(["01/15/2024"]).str.to_datetime()
# ComputeError: could not find an appropriate format to parse dates,
# please define a format

pl.Series(["1705312200"]).str.to_datetime()   # Unix timestamps
pl.Series(["Jan 15, 2024"]).str.to_datetime() # Month-name dates
pl.Series(["20240115T103000"]).str.to_datetime() # Compact dates
# All raise the same error

Format is inferred from the first non-null value only. For slash dates, Polars always assumes day-first (%d/%m/%Y). If your data is US-formatted (%m/%d/%Y), you either get silently wrong dates or a parse error on the first value where day > 12:

import polars as pl

# Polars locks onto %d/%m/%Y from the first row, then fails on month=15
pl.Series(["03/04/2024", "01/15/2024"]).str.to_datetime()
# ComputeError: … failed for 1 out of 2 values: ["01/15/2024"]

The alternative, wrapping str.to_datetime in a try/except loop over candidate formats, is verbose and slow, and it only tells you the first candidate that happens to succeed, not every format consistent with the data.

infer-ts solves this by scanning the column in a single pass with a CSP algorithm that tracks all compatible formats simultaneously. It returns every format consistent with the values it has checked (the full column when exhaustive=True), handles ambiguity explicitly (e.g. reporting both %d/%m/%Y and %m/%d/%Y when the data doesn't distinguish them), and supports dozens of formats that Polars cannot auto-infer.

How it works

The inference engine frames format detection as a Constraint Satisfaction Problem (CSP): the unknown format is the variable, and every candidate timestamp format starts out in its domain. Each non-null cell value acts as a constraint that narrows the candidate set:

  1. Initialise – start with all format combinations as candidates.
  2. Propagate – for each non-null cell, eliminate every format that cannot parse that value.
  3. Early exit (default) – as soon as a single format remains, return it immediately. Set exhaustive=True to disable this and check all values.
  4. Return – return all formats that survived constraint propagation.

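This approach is both efficient (most real data resolves within the first few rows) and flexible. Note that with the default early exit, rows after the point where a single format remains are not checked; use exhaustive=True to validate the entire column.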
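The four steps above can be sketched in pure Python. This is a minimal illustration under stated assumptions, not the library's Rust implementation: the four-entry CANDIDATES list and the helpers parses and infer_formats are hypothetical, and Python's datetime.strptime stands in for the library's own parsers (its directives do not always match Polars format strings).

```python
from datetime import datetime
from typing import Iterable, Optional

# Hypothetical candidate set; the real library generates many more combinations in Rust.
CANDIDATES = ["%Y-%m-%dT%H:%M:%S", "%d/%m/%Y", "%m/%d/%Y", "%Y%m%d"]


def parses(value: str, fmt: str) -> bool:
    """Return True if value parses under fmt (strptime semantics)."""
    try:
        datetime.strptime(value, fmt)
        return True
    except ValueError:
        return False


def infer_formats(values: Iterable[Optional[str]], exhaustive: bool = False) -> list[str]:
    candidates = list(CANDIDATES)  # 1. Initialise: every format is still possible
    for value in values:
        if value is None:
            continue  # nulls carry no constraint
        # 2. Propagate: drop every format that cannot parse this value
        candidates = [fmt for fmt in candidates if parses(value, fmt)]
        # 3. Early exit: stop once at most one candidate is left (unless exhaustive)
        if not exhaustive and len(candidates) <= 1:
            break
    return candidates  # 4. Return the survivors
```

The sketch also stops when no candidate is left, since elimination can never bring one back. It shows the trade-off behind exhaustive: infer_formats(["15/03/2024", "garbage"]) stops after the first row and returns ["%d/%m/%Y"], while the same call with exhaustive=True reaches the second row and returns [].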

Supported formats

See FORMATS.md for the full reference tables (date patterns, time patterns, timezone suffixes, and Unix epoch formats with their Polars format strings). That file is auto-generated from the Rust source — to regenerate after changing any format definitions:

cargo test -- --ignored dump_formats

Architecture: Compositional format design

Internally, formats are represented using a compositional structure rather than a flat enum:

Format
├── Date { date: DateFmt }
├── DateTime { date: DateFmt, sep: Separator, time: TimeFmt, tz: Option<Timezone>, spaced_tz: bool }
└── Unix { precision: UnixPrecision }

All combinations of the above components are tried automatically. Structurally invalid ones (for example, every DateTime candidate when a value has neither T nor a space at the separator position) are eliminated by the parser on the first value, so the performance cost is negligible.

Adding a new format variant (e.g. named timezones) requires a single new enum variant and a validator — no changes to the combinatorial logic.
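To make the compositional idea concrete, here is a Python mirror of the tree above. It is a hypothetical sketch, not the crate's Rust types: the component lists are small made-up samples, and pattern() renders each combination as a Polars-style format string, using the @unix_* markers described under the Unix epoch section.

```python
from dataclasses import dataclass
from itertools import product
from typing import Iterator, Optional, Union

# Hypothetical component tables; the real ones are defined in the Rust source (see FORMATS.md).
DATE_FMTS = ["%Y-%m-%d", "%d/%m/%Y", "%m/%d/%Y"]
SEPARATORS = ["T", " "]
TIME_FMTS = ["%H:%M:%S", "%H:%M"]
TIMEZONES: list[Optional[str]] = [None, "%z"]
UNIX_PRECISIONS = ["seconds", "ms", "us", "ns"]


@dataclass(frozen=True)
class Date:
    date: str

    def pattern(self) -> str:
        return self.date


@dataclass(frozen=True)
class DateTime:
    date: str
    sep: str
    time: str
    tz: Optional[str]
    spaced_tz: bool

    def pattern(self) -> str:
        # Timezone suffix, optionally preceded by a space
        suffix = "" if self.tz is None else (" " if self.spaced_tz else "") + self.tz
        return f"{self.date}{self.sep}{self.time}{suffix}"


@dataclass(frozen=True)
class Unix:
    precision: str

    def pattern(self) -> str:
        return f"@unix_{self.precision}"


Format = Union[Date, DateTime, Unix]


def all_formats() -> Iterator[Format]:
    """Enumerate every combination of components."""
    yield from (Date(d) for d in DATE_FMTS)
    for d, sep, t, tz in product(DATE_FMTS, SEPARATORS, TIME_FMTS, TIMEZONES):
        # spaced_tz only changes the output when a timezone is present
        for spaced in ([False] if tz is None else [False, True]):
            yield DateTime(d, sep, t, tz, spaced)
    yield from (Unix(p) for p in UNIX_PRECISIONS)
```

With these sample tables the generator yields 43 distinct formats. Supporting another timezone style would only mean appending to TIMEZONES; the product() call picks it up without touching the enumeration logic, which is the property the paragraph above describes.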

Installation

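Prebuilt wheels for CPython 3.10+ on Windows, Linux and macOS are published on PyPI (see Download files below), so a normal install needs no Rust toolchain:

pip install infer-ts

Building from source requires Rust and maturin. From a clone of the repository, with a virtual environment active:

pip install maturin
maturin develop --release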

For development (uses uv for reproducible installs):

uv sync --all-extras
maturin develop --release
bash build-and-test.sh

Usage

Quick start

import polars as pl
import infer_ts

df = pl.DataFrame({"ts": ["2024-01-15T10:30:00", "2024-06-20T08:00:00"]})

# Series → Series
series = infer_ts.to_datetime(df["ts"])

# Each call below returns a new frame; df keeps its original string column
# Column name → Expr  (works inside lazy frames too)
out = df.with_columns(infer_ts.to_datetime("ts"))

# Expr → Expr
out = df.with_columns(infer_ts.to_datetime(pl.col("ts")))

# Namespace style (equivalent to the Expr form above)
out = df.with_columns(pl.col("ts").infer_ts.to_datetime())

# Control the output time unit (default: "us")
out = df.with_columns(infer_ts.to_datetime("ts", time_unit="ns"))

# Infer the format strings only (returns a list of all matching formats)
fmts = infer_ts.infer_format(df["ts"])

Basic inference

import infer_ts

# Returns a list of compatible formats
fmts = infer_ts.infer_format([
    "2024-01-15T10:30:00",
    "2024-06-20T08:00:00",
    None,                       # nulls are skipped
])
print(fmts)       # ["%Y-%m-%dT%H:%M:%S"]
print(fmts[0])    # "%Y-%m-%dT%H:%M:%S"

# Use exhaustive=True to check every value instead of stopping once one format remains
fmts = infer_ts.infer_format(["01/02/2024", "03/04/2024"], exhaustive=True)
print(fmts)  # ["%d/%m/%Y", "%m/%d/%Y"] - both US and EU formats match

Polars – using the inferred format directly

If you need the format string itself (e.g. to pass to other tools), use infer_format and call str.to_datetime manually:

import polars as pl
import infer_ts

df = pl.DataFrame({"ts": ["2024-01-15T10:30:00", "2024-06-20T08:00:00"]})

fmts = infer_ts.infer_format(df["ts"])
# fmts[0] raises IndexError if nothing matched; if several formats match, choose one deliberately
fmt = fmts[0]
df = df.with_columns(pl.col("ts").str.to_datetime(format=fmt))
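Because infer_format returns plain strings, the result can also be handed to tools outside Polars. A minimal sketch, assuming you want Python's datetime.strptime: parse_values is a hypothetical helper, and it only works for formats whose directives strptime also understands (chrono-specific directives such as %.f do not exist in Python, and the @unix_* markers are not patterns at all).

```python
from datetime import datetime
from typing import Iterable, Optional


def parse_values(values: Iterable[Optional[str]], fmt: str) -> list[Optional[datetime]]:
    """Parse strings with an inferred format, passing nulls through unchanged."""
    if fmt.startswith("@"):
        # Epoch markers such as @unix_seconds are not strptime patterns
        raise ValueError(f"{fmt!r} is an epoch marker, not a strptime format")
    return [None if v is None else datetime.strptime(v, fmt) for v in values]


fmts = ["%Y-%m-%dT%H:%M:%S"]  # e.g. the list returned by infer_ts.infer_format(...)
parsed = parse_values(["2024-01-15T10:30:00", None], fmts[0])
```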

Polars – Unix epoch formats

infer_format returns a @-prefixed marker for epoch columns (e.g. @unix_seconds, @unix_ms, @unix_us, @unix_ns). to_datetime handles these automatically with correct integer scaling — no manual casting needed:

import polars as pl
import infer_ts

df = pl.DataFrame({"ts": ["1705312200", "1705398600"]})

out = df.with_columns(infer_ts.to_datetime("ts"))      # Expr form
result = infer_ts.to_datetime(df["ts"])                # Series form
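Conceptually, scaling an epoch value means splitting the integer into whole seconds and leftover ticks. The sketch below shows that arithmetic with the standard library only; the marker-to-unit table is an assumption based on the marker names, epoch_to_datetime is hypothetical, and it returns timezone-aware UTC datetimes, which may differ from what infer_ts produces.

```python
from datetime import datetime, timedelta, timezone

# Assumed ticks per second for each marker, inferred from the marker names.
TICKS_PER_SECOND = {
    "@unix_seconds": 1,
    "@unix_ms": 10**3,
    "@unix_us": 10**6,
    "@unix_ns": 10**9,
}
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def epoch_to_datetime(value: str, marker: str) -> datetime:
    """Convert an epoch string to a UTC datetime using integer arithmetic only."""
    ticks = int(value)
    per_second = TICKS_PER_SECOND[marker]
    seconds, remainder = divmod(ticks, per_second)  # whole seconds + leftover ticks
    # Python datetimes stop at microseconds, so sub-microsecond ticks are truncated
    micros = remainder * 10**6 // per_second
    return EPOCH + timedelta(seconds=seconds, microseconds=micros)


print(epoch_to_datetime("1705312200", "@unix_seconds"))  # 2024-01-15 09:50:00+00:00
```

Staying in integers avoids float rounding: a nanosecond epoch such as 1705312200123456789 has more significant digits than a 64-bit float can represent exactly.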

Handling ambiguity

US and EU slash dates are inherently ambiguous when every day value is ≤ 12. Instead of raising an error, the library returns all compatible formats:

import infer_ts

# Ambiguous – both mm/dd and dd/mm interpretations are valid for every row
fmts = infer_ts.infer_format(["01/02/2024", "03/04/2024"])
print(fmts)       # ["%d/%m/%Y", "%m/%d/%Y"]
print(len(fmts))  # 2 - caller can choose or prompt user

# Resolved – day 15 > 12 eliminates the US interpretation
fmts = infer_ts.infer_format(["01/02/2024", "15/03/2024"])
print(fmts)  # ["%d/%m/%Y"] - uniquely EU

# No match returns empty list
fmts = infer_ts.infer_format(["not a timestamp"])
print(fmts)  # []
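Since an ambiguous column comes back as a list with more than one entry, the caller needs a policy for picking one. A minimal sketch, assuming you want to fail loudly when nothing matches and to break ties with an explicit preference order; choose_format and its preference parameter are hypothetical, not part of infer_ts.

```python
from typing import Sequence


def choose_format(fmts: Sequence[str], preference: Sequence[str] = ()) -> str:
    """Pick a single format from an infer_format-style result list."""
    if not fmts:
        raise ValueError("no timestamp format matches every value")
    if len(fmts) == 1:
        return fmts[0]  # unambiguous
    for fmt in preference:
        if fmt in fmts:
            return fmt  # first preferred format that survived inference
    raise ValueError(f"ambiguous formats {list(fmts)}; pass a preference to break the tie")


# US-first policy for the ambiguous slash-date example above
fmt = choose_format(["%d/%m/%Y", "%m/%d/%Y"], preference=["%m/%d/%Y", "%d/%m/%Y"])
```

Making the preference explicit keeps the US/EU decision visible in the calling code instead of silently taking fmts[0].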

Contributing

This project was developed with heavy use of LLM agents (primarily Claude) for both the initial implementation and subsequent refinement. If you spot a bug, an edge case the parser mishandles, or a timestamp format that should be supported, issues and pull requests are very welcome.


Download files

Source Distribution

infer_ts-0.1.2.tar.gz (82.3 kB)

Built Distributions

infer_ts-0.1.2-cp310-abi3-win_amd64.whl (5.4 MB): CPython 3.10+, Windows x86-64

infer_ts-0.1.2-cp310-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (5.0 MB): CPython 3.10+, manylinux: glibc 2.17+ x86-64

infer_ts-0.1.2-cp310-abi3-manylinux_2_17_aarch64.manylinux2014_aarch64.whl (4.6 MB): CPython 3.10+, manylinux: glibc 2.17+ ARM64

infer_ts-0.1.2-cp310-abi3-macosx_11_0_arm64.whl (4.4 MB): CPython 3.10+, macOS 11.0+ ARM64

infer_ts-0.1.2-cp310-abi3-macosx_10_12_x86_64.whl (4.8 MB): CPython 3.10+, macOS 10.12+ x86-64

File details

Details for the file infer_ts-0.1.2.tar.gz.

File metadata

  • Download URL: infer_ts-0.1.2.tar.gz
  • Upload date:
  • Size: 82.3 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for infer_ts-0.1.2.tar.gz
Algorithm Hash digest
SHA256 467e55d0ac17dbb8f23feb6dfe7bb70e4cc5116ee92568b9a6f29804f523e8d1
MD5 d4acf5cefef2a1b2276bfde131036118
BLAKE2b-256 4d819e4226ed432e7ed372dd31a8bf9348063e1f6ba40a3eb5bcefc5e9ebb995


Provenance

The following attestation bundles were made for infer_ts-0.1.2.tar.gz:

Publisher: release.yml on andreasoprani/infer-ts

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file infer_ts-0.1.2-cp310-abi3-win_amd64.whl.

File metadata

  • Download URL: infer_ts-0.1.2-cp310-abi3-win_amd64.whl
  • Upload date:
  • Size: 5.4 MB
  • Tags: CPython 3.10+, Windows x86-64
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for infer_ts-0.1.2-cp310-abi3-win_amd64.whl
Algorithm Hash digest
SHA256 684dd9c5901da4c96620456caede1f74e23d8612d3d3f5d16a2c137d2f3856e7
MD5 c2f8537eb4c6a04d29ef2586ac944888
BLAKE2b-256 eb96dbf8cf5e4fbb97d7faaf24084021cff540561a9749ee24d54fb5734640ea


Provenance

The following attestation bundles were made for infer_ts-0.1.2-cp310-abi3-win_amd64.whl:

Publisher: release.yml on andreasoprani/infer-ts


File details

Details for the file infer_ts-0.1.2-cp310-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.

File metadata

File hashes

Hashes for infer_ts-0.1.2-cp310-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
Algorithm Hash digest
SHA256 9db824d1c8c5aace22b436f05af4e84af08e6730a21339b11d544ed1d33c11ed
MD5 00c495798724570834b5eb52d5b52f72
BLAKE2b-256 63c3e946e5f250f74e5f1dab535ea1108f2a9501d081844279c553df0971e360


Provenance

The following attestation bundles were made for infer_ts-0.1.2-cp310-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl:

Publisher: release.yml on andreasoprani/infer-ts


File details

Details for the file infer_ts-0.1.2-cp310-abi3-manylinux_2_17_aarch64.manylinux2014_aarch64.whl.

File metadata

File hashes

Hashes for infer_ts-0.1.2-cp310-abi3-manylinux_2_17_aarch64.manylinux2014_aarch64.whl
Algorithm Hash digest
SHA256 9af05026b434fcdf83db33b0ccb9d0537cb0e920db3ace37a6774b3f89588cc2
MD5 c353ae16df3fda92df2abd41d2c4171c
BLAKE2b-256 555ea473d662813697279d2679e12549d6a2d1edd8f755745208a1f02dc120d2


Provenance

The following attestation bundles were made for infer_ts-0.1.2-cp310-abi3-manylinux_2_17_aarch64.manylinux2014_aarch64.whl:

Publisher: release.yml on andreasoprani/infer-ts


File details

Details for the file infer_ts-0.1.2-cp310-abi3-macosx_11_0_arm64.whl.

File metadata

File hashes

Hashes for infer_ts-0.1.2-cp310-abi3-macosx_11_0_arm64.whl
Algorithm Hash digest
SHA256 c374dae264a54e1a23de53e892541747562ba97f186cff85838ee32d77a6c070
MD5 15df08acef9d6da35508ed20768bffda
BLAKE2b-256 8b1dd763573dac3d52c377db9ce1d554459e48a00017609fc6fce630ef5e596e


Provenance

The following attestation bundles were made for infer_ts-0.1.2-cp310-abi3-macosx_11_0_arm64.whl:

Publisher: release.yml on andreasoprani/infer-ts


File details

Details for the file infer_ts-0.1.2-cp310-abi3-macosx_10_12_x86_64.whl.

File metadata

File hashes

Hashes for infer_ts-0.1.2-cp310-abi3-macosx_10_12_x86_64.whl
Algorithm Hash digest
SHA256 5707efd239bec53c42e322e20073de84dbe555271488032bdbe9fbce87e08347
MD5 402c4d90366e19f858d88331427c6e61
BLAKE2b-256 c83d3efa5ef6972d850d862cf98c77fbe7296f6052a5e029d8aac3a3d61c920f


Provenance

The following attestation bundles were made for infer_ts-0.1.2-cp310-abi3-macosx_10_12_x86_64.whl:

Publisher: release.yml on andreasoprani/infer-ts

