Infer timestamp formats from string columns using CSP constraint elimination
infer-ts
Infer timestamp formats from string columns using CSP constraint elimination, built in Rust via PyO3 for use with Python and Polars.
Why this library?
Polars can parse string columns to Datetime via str.to_datetime(format=...). When no format is given (format=None), Polars tries to infer one, but with two significant limitations:
Only ISO 8601 variants are reliably inferred. Common real-world formats fail entirely with format=None:
import polars as pl
pl.Series(["01/15/2024"]).str.to_datetime()
# ComputeError: could not find an appropriate format to parse dates,
# please define a format
pl.Series(["1705312200"]).str.to_datetime() # Unix timestamps
pl.Series(["Jan 15, 2024"]).str.to_datetime() # Month-name dates
pl.Series(["20240115T103000"]).str.to_datetime() # Compact dates
# All raise the same error
Format is inferred from the first non-null value only. For slash dates, Polars always assumes day-first (%d/%m/%Y). If your data is US-formatted (%m/%d/%Y), you either get silently wrong dates or a parse error on the first value where day > 12:
# Polars locks on %d/%m/%Y from the first row, then fails on month=15
pl.Series(["03/04/2024", "01/15/2024"]).str.to_datetime()
# ComputeError: … failed for 1 out of 2 values: ["01/15/2024"]
The alternative, wrapping str.to_datetime in a try/except loop over candidate formats, is verbose and slow, and it still only tells you the first format that happens to succeed, with no indication of whether another format would have matched too.
infer-ts addresses this with a single pass over the column, using a CSP algorithm that tracks all compatible formats simultaneously. It returns every format consistent with the values it examined (the whole column when exhaustive=True), reports ambiguity explicitly (e.g. both %d/%m/%Y and %m/%d/%Y when the data doesn't distinguish them), and supports dozens of formats that Polars cannot auto-infer.
How it works
The inference engine frames format detection as a Constraint Satisfaction Problem (CSP): the candidate timestamp formats form the domain of possible answers, and each cell value in the column acts as a constraint that narrows that candidate set:
- Initialise – start with all format combinations as candidates.
- Propagate – for each non-null cell, eliminate every format that cannot parse that value.
- Early exit (default) – as soon as a single format remains, return it immediately. Set exhaustive=True to disable this and check all values.
- Return – return all formats that survived constraint propagation.
This approach is both efficient (resolves in the first few rows for most real data) and flexible (use exhaustive=True to validate the entire column).
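The elimination loop can be sketched in a few lines of pure Python. This is a minimal illustration, assuming a tiny hypothetical candidate list and using `datetime.strptime` as the per-format validator; infer-ts does this in Rust with its own parsers and a much larger candidate set, so `CANDIDATES`, `parses` and `infer_format_sketch` are stand-ins, not the library's code.

```python
from datetime import datetime
from typing import Iterable, Optional

# Hypothetical, deliberately tiny candidate set (illustration only).
CANDIDATES = [
    "%Y-%m-%dT%H:%M:%S",
    "%Y-%m-%d %H:%M:%S",
    "%d/%m/%Y",
    "%m/%d/%Y",
]


def parses(value: str, fmt: str) -> bool:
    """Return True if `value` can be parsed with `fmt` (strptime semantics)."""
    try:
        datetime.strptime(value, fmt)
        return True
    except ValueError:
        return False


def infer_format_sketch(
    values: Iterable[Optional[str]], exhaustive: bool = False
) -> list[str]:
    # Initialise: every candidate format is still possible.
    remaining = list(CANDIDATES)
    for value in values:
        if value is None:
            continue  # nulls carry no constraint
        # Propagate: drop every format that cannot parse this value.
        remaining = [fmt for fmt in remaining if parses(value, fmt)]
        # Early exit: with at most one format left, stop unless exhaustive.
        # (An empty set can never grow again, so stopping there is safe too.)
        if not exhaustive and len(remaining) <= 1:
            break
    # Return: every format that survived propagation.
    return remaining
```

With the default early exit, `infer_format_sketch(["2024-01-15T10:30:00", "garbage"])` stops after the first value and returns the ISO format; with `exhaustive=True` the second value eliminates it and the result is empty.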
Supported formats
See FORMATS.md for the full reference tables (date patterns, time patterns, timezone suffixes, and Unix epoch formats with their Polars format strings). That file is auto-generated from the Rust source — to regenerate after changing any format definitions:
cargo test -- --ignored dump_formats
Architecture: Compositional format design
Internally, formats are represented using a compositional structure rather than a flat enum:
Format
├── Date { date: DateFmt }
├── DateTime { date: DateFmt, sep: Separator, time: TimeFmt, tz: Option<Timezone>, spaced_tz: bool }
└── Unix { precision: UnixPrecision }
All combinations of the above components are tried automatically. Combinations that cannot match the data are eliminated by the parser on the first value (e.g. if the character at the separator position is neither T nor a space, every DateTime combination is ruled out at once), so the performance cost is negligible.
Adding a new format variant (e.g. named timezones) requires a single new enum variant and a validator — no changes to the combinatorial logic.
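The compositional structure can be mirrored in Python to show why a new variant needs no change to the combinatorial logic. This is a hypothetical sketch: the component values, the class names and the `all_formats` helper are illustrative stand-ins for the Rust enums, the meaning of `spaced_tz` (a space before the timezone suffix) is an assumption, and the real component lists (see FORMATS.md) are far larger.

```python
from dataclasses import dataclass
from itertools import product
from typing import Optional, Union

# Hypothetical component values, for illustration only.
DATE_FMTS = ["%Y-%m-%d", "%d/%m/%Y", "%m/%d/%Y"]
SEPARATORS = ["T", " "]
TIME_FMTS = ["%H:%M:%S", "%H:%M"]
TIMEZONES: list[Optional[str]] = [None, "%z"]
UNIX_PRECISIONS = ["seconds", "ms", "us", "ns"]


@dataclass(frozen=True)
class Date:
    date: str

    def pattern(self) -> str:
        return self.date


@dataclass(frozen=True)
class DateTime:
    date: str
    sep: str
    time: str
    tz: Optional[str]
    spaced_tz: bool  # assumed meaning: a space precedes the timezone suffix

    def pattern(self) -> str:
        suffix = ""
        if self.tz is not None:
            suffix = (" " if self.spaced_tz else "") + self.tz
        return f"{self.date}{self.sep}{self.time}{suffix}"


@dataclass(frozen=True)
class Unix:
    precision: str

    def pattern(self) -> str:
        # Mirrors the @-prefixed markers that infer_format returns for epochs.
        return f"@unix_{self.precision}"


Format = Union[Date, DateTime, Unix]


def all_formats() -> list[Format]:
    """Enumerate every combination of components (a cartesian product)."""
    formats: list[Format] = [Date(d) for d in DATE_FMTS]
    for d, sep, t, tz in product(DATE_FMTS, SEPARATORS, TIME_FMTS, TIMEZONES):
        # spaced_tz only matters when a timezone is present.
        for spaced in ([False] if tz is None else [False, True]):
            formats.append(DateTime(d, sep, t, tz, spaced))
    formats.extend(Unix(p) for p in UNIX_PRECISIONS)
    return formats
```

Here the product yields 3 Date, 36 DateTime and 4 Unix formats (43 in total). Appending another timezone pattern to `TIMEZONES` grows the candidate set without touching `all_formats`, which is the property the paragraph above describes.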
Installation
To build from a source checkout (requires a Rust toolchain):
pip install maturin
maturin develop --release
For development (uses uv for reproducible installs):
uv sync --all-extras
maturin develop --release
bash build-and-test.sh
Usage
Quick start
import polars as pl
import infer_ts
df = pl.DataFrame({"ts": ["2024-01-15T10:30:00", "2024-06-20T08:00:00"]})
# Series → Series
series = infer_ts.to_datetime(df["ts"])
# Column name → Expr (works inside lazy frames too)
df = df.with_columns(infer_ts.to_datetime("ts"))
# Expr → Expr
df = df.with_columns(infer_ts.to_datetime(pl.col("ts")))
# Namespace style (equivalent to the Expr form above)
df = df.with_columns(pl.col("ts").infer_ts.to_datetime())
# Control the output time unit (default: "us")
df = df.with_columns(infer_ts.to_datetime("ts", time_unit="ns"))
# Infer the format strings only (returns a list of all matching formats)
fmts = infer_ts.infer_format(df["ts"])
Basic inference
import infer_ts
# Returns a list of compatible formats
fmts = infer_ts.infer_format([
"2024-01-15T10:30:00",
"2024-06-20T08:00:00",
None, # nulls are skipped
])
print(fmts) # ["%Y-%m-%dT%H:%M:%S"]
print(fmts[0]) # "%Y-%m-%dT%H:%M:%S"
# Use exhaustive=True to process all values
fmts = infer_ts.infer_format(["01/02/2024", "03/04/2024"], exhaustive=True)
print(fmts) # ["%d/%m/%Y", "%m/%d/%Y"] - both US and EU formats match
Polars – using the inferred format directly
If you need the format string itself (e.g. to pass to other tools), use infer_format and call str.to_datetime manually:
import polars as pl
import infer_ts
df = pl.DataFrame({"ts": ["2024-01-15T10:30:00", "2024-06-20T08:00:00"]})
fmts = infer_ts.infer_format(df["ts"])
# Use first format (or handle multiple if ambiguous)
fmt = fmts[0]
df = df.with_columns(pl.col("ts").str.to_datetime(format=fmt))
Polars – Unix epoch formats
infer_format returns a @-prefixed marker for epoch columns (e.g. @unix_seconds, @unix_ms, @unix_us, @unix_ns). to_datetime handles these automatically with correct integer scaling — no manual casting needed:
import polars as pl
import infer_ts
df = pl.DataFrame({"ts": ["1705312200", "1705398600"]})
df = df.with_columns(infer_ts.to_datetime("ts")) # Expr form
result = infer_ts.to_datetime(df["ts"]) # Series form
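To make "correct integer scaling" concrete, here is a standard-library sketch of what converting an epoch string involves once its marker is known. The `TICKS_PER_SECOND` table and the `epoch_to_datetime` helper are hypothetical and are not how infer-ts implements the conversion; they only show that each precision differs by the number of integer ticks per second.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical mapping from the epoch markers named above to ticks per second.
TICKS_PER_SECOND = {
    "@unix_seconds": 1,
    "@unix_ms": 1_000,
    "@unix_us": 1_000_000,
    "@unix_ns": 1_000_000_000,
}

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def epoch_to_datetime(value: str, marker: str) -> datetime:
    """Convert an epoch string to an aware UTC datetime using integer math."""
    ticks = int(value)
    per_second = TICKS_PER_SECOND[marker]
    # Scale to microseconds (datetime's resolution) with exact integer
    # arithmetic; sub-microsecond nanoseconds are floored away.
    micros = ticks * 1_000_000 // per_second
    return EPOCH + timedelta(microseconds=micros)
```

For example, "1705312200" with `@unix_seconds` and "1705312200000" with `@unix_ms` both map to 2024-01-15 09:50:00 UTC.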
Handling ambiguity
US and EU slash dates are inherently ambiguous when every day value is ≤ 12. Instead of raising an error, the library returns all compatible formats:
import infer_ts
# Ambiguous – both mm/dd and dd/mm interpretations are valid for every row
fmts = infer_ts.infer_format(["01/02/2024", "03/04/2024"])
print(fmts) # ["%d/%m/%Y", "%m/%d/%Y"]
print(len(fmts)) # 2 - caller can choose or prompt user
# Resolved – day 15 > 12 eliminates the US interpretation
fmts = infer_ts.infer_format(["01/02/2024", "15/03/2024"])
print(fmts) # ["%d/%m/%Y"] - uniquely EU
# No match returns empty list
fmts = infer_ts.infer_format(["not a timestamp"])
print(fmts) # []
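Because ambiguity comes back as a plain list, the resolution policy stays with the caller. A hypothetical helper (not part of infer-ts) that applies a day-first or month-first preference to slash dates, and refuses to guess otherwise, might look like this:

```python
from typing import Optional

DAY_FIRST = "%d/%m/%Y"
MONTH_FIRST = "%m/%d/%Y"


def choose_format(fmts: list[str], prefer_day_first: bool = True) -> Optional[str]:
    """Reduce an infer_format-style result list to a single format.

    Returns None when nothing matched, the only entry when the result is
    unambiguous, and the preferred slash-date order when exactly the US/EU
    pair is ambiguous. Any other ambiguity is raised as a ValueError.
    """
    if not fmts:
        return None
    if len(fmts) == 1:
        return fmts[0]
    if set(fmts) == {DAY_FIRST, MONTH_FIRST}:
        return DAY_FIRST if prefer_day_first else MONTH_FIRST
    raise ValueError(f"ambiguous formats, choose one of: {fmts}")
```

The chosen string can then be passed to pl.col("ts").str.to_datetime(format=...) as in the Polars example above.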
Contributing
This project was developed with heavy use of LLM agents (primarily Claude) for both the initial implementation and subsequent refinement. If you spot a bug, an edge case the parser mishandles, or a timestamp format that should be supported, issues and pull requests are very welcome.
Download files
File details
Details for the file infer_ts-0.1.2.tar.gz.
File metadata
- Download URL: infer_ts-0.1.2.tar.gz
- Upload date:
- Size: 82.3 kB
- Tags: Source
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.7
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 467e55d0ac17dbb8f23feb6dfe7bb70e4cc5116ee92568b9a6f29804f523e8d1 |
| MD5 | d4acf5cefef2a1b2276bfde131036118 |
| BLAKE2b-256 | 4d819e4226ed432e7ed372dd31a8bf9348063e1f6ba40a3eb5bcefc5e9ebb995 |
Provenance
The following attestation bundles were made for infer_ts-0.1.2.tar.gz:
Publisher: release.yml on andreasoprani/infer-ts
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: infer_ts-0.1.2.tar.gz
- Subject digest: 467e55d0ac17dbb8f23feb6dfe7bb70e4cc5116ee92568b9a6f29804f523e8d1
- Sigstore transparency entry: 986281998
- Sigstore integration time:
- Permalink: andreasoprani/infer-ts@baa09bc3c90ccc817275d7cf1c1e67b91d84e519
- Branch / Tag: refs/tags/v0.1.2
- Owner: https://github.com/andreasoprani
- Access: private
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: release.yml@baa09bc3c90ccc817275d7cf1c1e67b91d84e519
- Trigger Event: push
File details
Details for the file infer_ts-0.1.2-cp310-abi3-win_amd64.whl.
File metadata
- Download URL: infer_ts-0.1.2-cp310-abi3-win_amd64.whl
- Upload date:
- Size: 5.4 MB
- Tags: CPython 3.10+, Windows x86-64
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.7
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 684dd9c5901da4c96620456caede1f74e23d8612d3d3f5d16a2c137d2f3856e7 |
| MD5 | c2f8537eb4c6a04d29ef2586ac944888 |
| BLAKE2b-256 | eb96dbf8cf5e4fbb97d7faaf24084021cff540561a9749ee24d54fb5734640ea |
Provenance
The following attestation bundles were made for infer_ts-0.1.2-cp310-abi3-win_amd64.whl:
Publisher: release.yml on andreasoprani/infer-ts
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: infer_ts-0.1.2-cp310-abi3-win_amd64.whl
- Subject digest: 684dd9c5901da4c96620456caede1f74e23d8612d3d3f5d16a2c137d2f3856e7
- Sigstore transparency entry: 986282238
- Sigstore integration time:
- Permalink: andreasoprani/infer-ts@baa09bc3c90ccc817275d7cf1c1e67b91d84e519
- Branch / Tag: refs/tags/v0.1.2
- Owner: https://github.com/andreasoprani
- Access: private
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: release.yml@baa09bc3c90ccc817275d7cf1c1e67b91d84e519
- Trigger Event: push
File details
Details for the file infer_ts-0.1.2-cp310-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.
File metadata
- Download URL: infer_ts-0.1.2-cp310-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
- Upload date:
- Size: 5.0 MB
- Tags: CPython 3.10+, manylinux: glibc 2.17+ x86-64
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.7
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 9db824d1c8c5aace22b436f05af4e84af08e6730a21339b11d544ed1d33c11ed |
| MD5 | 00c495798724570834b5eb52d5b52f72 |
| BLAKE2b-256 | 63c3e946e5f250f74e5f1dab535ea1108f2a9501d081844279c553df0971e360 |
Provenance
The following attestation bundles were made for infer_ts-0.1.2-cp310-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl:
Publisher: release.yml on andreasoprani/infer-ts
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: infer_ts-0.1.2-cp310-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
- Subject digest: 9db824d1c8c5aace22b436f05af4e84af08e6730a21339b11d544ed1d33c11ed
- Sigstore transparency entry: 986282298
- Sigstore integration time:
- Permalink: andreasoprani/infer-ts@baa09bc3c90ccc817275d7cf1c1e67b91d84e519
- Branch / Tag: refs/tags/v0.1.2
- Owner: https://github.com/andreasoprani
- Access: private
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: release.yml@baa09bc3c90ccc817275d7cf1c1e67b91d84e519
- Trigger Event: push
File details
Details for the file infer_ts-0.1.2-cp310-abi3-manylinux_2_17_aarch64.manylinux2014_aarch64.whl.
File metadata
- Download URL: infer_ts-0.1.2-cp310-abi3-manylinux_2_17_aarch64.manylinux2014_aarch64.whl
- Upload date:
- Size: 4.6 MB
- Tags: CPython 3.10+, manylinux: glibc 2.17+ ARM64
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.7
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 9af05026b434fcdf83db33b0ccb9d0537cb0e920db3ace37a6774b3f89588cc2 |
| MD5 | c353ae16df3fda92df2abd41d2c4171c |
| BLAKE2b-256 | 555ea473d662813697279d2679e12549d6a2d1edd8f755745208a1f02dc120d2 |
Provenance
The following attestation bundles were made for infer_ts-0.1.2-cp310-abi3-manylinux_2_17_aarch64.manylinux2014_aarch64.whl:
Publisher: release.yml on andreasoprani/infer-ts
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: infer_ts-0.1.2-cp310-abi3-manylinux_2_17_aarch64.manylinux2014_aarch64.whl
- Subject digest: 9af05026b434fcdf83db33b0ccb9d0537cb0e920db3ace37a6774b3f89588cc2
- Sigstore transparency entry: 986282061
- Sigstore integration time:
- Permalink: andreasoprani/infer-ts@baa09bc3c90ccc817275d7cf1c1e67b91d84e519
- Branch / Tag: refs/tags/v0.1.2
- Owner: https://github.com/andreasoprani
- Access: private
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: release.yml@baa09bc3c90ccc817275d7cf1c1e67b91d84e519
- Trigger Event: push
File details
Details for the file infer_ts-0.1.2-cp310-abi3-macosx_11_0_arm64.whl.
File metadata
- Download URL: infer_ts-0.1.2-cp310-abi3-macosx_11_0_arm64.whl
- Upload date:
- Size: 4.4 MB
- Tags: CPython 3.10+, macOS 11.0+ ARM64
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.7
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | c374dae264a54e1a23de53e892541747562ba97f186cff85838ee32d77a6c070 |
| MD5 | 15df08acef9d6da35508ed20768bffda |
| BLAKE2b-256 | 8b1dd763573dac3d52c377db9ce1d554459e48a00017609fc6fce630ef5e596e |
Provenance
The following attestation bundles were made for infer_ts-0.1.2-cp310-abi3-macosx_11_0_arm64.whl:
Publisher: release.yml on andreasoprani/infer-ts
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: infer_ts-0.1.2-cp310-abi3-macosx_11_0_arm64.whl
- Subject digest: c374dae264a54e1a23de53e892541747562ba97f186cff85838ee32d77a6c070
- Sigstore transparency entry: 986282118
- Sigstore integration time:
- Permalink: andreasoprani/infer-ts@baa09bc3c90ccc817275d7cf1c1e67b91d84e519
- Branch / Tag: refs/tags/v0.1.2
- Owner: https://github.com/andreasoprani
- Access: private
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: release.yml@baa09bc3c90ccc817275d7cf1c1e67b91d84e519
- Trigger Event: push
File details
Details for the file infer_ts-0.1.2-cp310-abi3-macosx_10_12_x86_64.whl.
File metadata
- Download URL: infer_ts-0.1.2-cp310-abi3-macosx_10_12_x86_64.whl
- Upload date:
- Size: 4.8 MB
- Tags: CPython 3.10+, macOS 10.12+ x86-64
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.7
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 5707efd239bec53c42e322e20073de84dbe555271488032bdbe9fbce87e08347 |
| MD5 | 402c4d90366e19f858d88331427c6e61 |
| BLAKE2b-256 | c83d3efa5ef6972d850d862cf98c77fbe7296f6052a5e029d8aac3a3d61c920f |
Provenance
The following attestation bundles were made for infer_ts-0.1.2-cp310-abi3-macosx_10_12_x86_64.whl:
Publisher: release.yml on andreasoprani/infer-ts
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: infer_ts-0.1.2-cp310-abi3-macosx_10_12_x86_64.whl
- Subject digest: 5707efd239bec53c42e322e20073de84dbe555271488032bdbe9fbce87e08347
- Sigstore transparency entry: 986282179
- Sigstore integration time:
- Permalink: andreasoprani/infer-ts@baa09bc3c90ccc817275d7cf1c1e67b91d84e519
- Branch / Tag: refs/tags/v0.1.2
- Owner: https://github.com/andreasoprani
- Access: private
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: release.yml@baa09bc3c90ccc817275d7cf1c1e67b91d84e519
- Trigger Event: push