Auto-detect and standardize datetime formats from raw timestamps
🕒 formatify
🧠 Auto-detect and standardize messy timestamp formats. Perfect for log parsers, data pipelines, or anyone tired of wrestling with inconsistent datetime strings.
⚠️ Problem
Ever pulled in a CSV or log file and found timestamps like this?
2023-03-01T12:30:45Z, 01/03/2023 12:30, Mar 1 2023 12:30 PM
How do you reliably infer and standardize them — especially when:
- formats are mixed?
- you have no schema?
- fractional seconds and timezones are involved?
✅ Solution
formatify infers the datetime format(s) from a list of timestamp strings and gives you:
- a valid strftime format string per group,
- component roles (e.g. year, month, day),
- clean, standardized timestamps,
- structural grouping when needed.
No dependencies. Works out of the box.
📄 What This Library Does
Behind the scenes, formatify uses:
- Regex patterns to split and identify timestamp tokens
- Heuristics to assign roles like year, month, hour, etc.
- Frequency analysis to distinguish stable vs. changing components
- ISO 8601 detection for timezones, 'T' separators, and fractional seconds
- Smart fallbacks for missing delimiters or ambiguous parts
- Epoch detection (10 or 13 digit UNIX timestamps)
It produces:
- one or more %Y-%m-%dT%H:%M:%SZ-style format strings
- lists of cleaned, standardized YYYY-MM-DD HH:MM:SS values
- per-group accuracy and metadata
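The epoch detection step listed above (10 or 13 digit UNIX timestamps) can be sketched with the standard library alone. This is an illustration, not formatify's actual implementation: the helper name `parse_epoch` is hypothetical, and it assumes 10 digits mean seconds and 13 digits mean milliseconds, both interpreted as UTC.

```python
import re
from datetime import datetime, timezone
from typing import Optional

# Hypothetical helper for illustration; not part of formatify's API.
_EPOCH_RE = re.compile(r"[0-9]{10}|[0-9]{13}")


def parse_epoch(value: str) -> Optional[datetime]:
    """Return a UTC datetime for a 10-digit (seconds) or 13-digit
    (milliseconds) UNIX timestamp, or None if the string is not one."""
    text = value.strip()
    if not _EPOCH_RE.fullmatch(text):
        return None
    number = int(text)
    if len(text) == 13:
        # Split off the milliseconds with integer arithmetic so no
        # precision is lost to float rounding.
        seconds, millis = divmod(number, 1000)
        base = datetime.fromtimestamp(seconds, tz=timezone.utc)
        return base.replace(microsecond=millis * 1000)
    return datetime.fromtimestamp(number, tz=timezone.utc)


print(parse_epoch("1689433385000"))  # epoch in ms
print(parse_epoch("2023-07-15"))     # not an epoch, so None
```

Length-based detection is deliberately strict here: any other digit count is rejected rather than guessed at.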
🚀 Quick Example
from formatify.main import analyze_heterogeneous_timestamp_formats

samples = [
    "2023-07-15T14:23:05Z",
    "15/07/2023 14:23",
    "Jul 15, 2023 02:23 PM",
    "1689433385000",  # epoch in ms
]

results = analyze_heterogeneous_timestamp_formats(samples)
for gid, group in results.items():
    print("Group", gid)
    print("→ Format:", group["format_string"])
    print("→ Standardized:", group["standardized_timestamps"][:2])
🔍 Features
✅ Auto-detect strftime format
✅ Handles ISO 8601, text months, UNIX epoch
✅ Infers year/month/day/hour/minute roles
✅ Groups mixed formats automatically
✅ Timezone-aware
✅ No dependencies
✅ Fast and customizable
🧪 API
Main Entry Point
analyze_heterogeneous_timestamp_formats(samples: List[str]) -> Dict[int, Dict[str, Any]]
Returns a dictionary mapping group IDs to result dictionaries. Each result includes:
- format_string: inferred strftime string
- standardized_timestamps: parsed & normalized strings
- component_roles: index → role
- change_frequencies: component variability
- iso_features: flags for ISO 8601 traits
- detected_timezone: parsed offset (if any)
- coverage: fraction of total samples in this group
- accuracy: percent of valid parses in group
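Because format_string is described as a strftime string, it should be usable directly with the standard library's `datetime.strptime`. The sketch below shows that round trip under stated assumptions: the format `"%d/%m/%Y %H:%M"` is hand-written for the example rather than captured from formatify, and the helper `standardize` is hypothetical.

```python
from datetime import datetime

# Target layout matching the YYYY-MM-DD HH:MM:SS style described above.
TARGET_FORMAT = "%Y-%m-%d %H:%M:%S"


def standardize(samples, format_string):
    """Parse each sample with format_string and re-emit it in TARGET_FORMAT.

    Returns (cleaned, valid_fraction), where valid_fraction is the share
    of samples that parsed successfully.
    """
    cleaned = []
    for sample in samples:
        try:
            parsed = datetime.strptime(sample, format_string)
        except ValueError:
            continue  # skip samples that do not match the format
        cleaned.append(parsed.strftime(TARGET_FORMAT))
    valid_fraction = len(cleaned) / len(samples) if samples else 0.0
    return cleaned, valid_fraction


cleaned, valid_fraction = standardize(
    ["15/07/2023 14:23", "16/07/2023 09:05", "not a date"],
    "%d/%m/%Y %H:%M",  # assumed example format, not real formatify output
)
print(cleaned)
print(valid_fraction)
```

Re-parsing with the inferred format is a cheap way to spot-check a group before trusting its standardized values.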
Lower-Level Functions
If you know all your samples have the same format:
infer_datetime_format_from_samples(samples: List[str]) -> Dict[str, Any]
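Knowing that every sample shares one format helps because value ranges across samples can settle ambiguities that a single timestamp cannot, such as day-first versus month-first dates. The sketch below illustrates that idea only; it is not formatify's heuristic, and `guess_day_month_order` is a hypothetical helper that handles just `A/B/YYYY`-style dates.

```python
import re

# Two 1-2 digit fields followed by a 4-digit year, separated by / . or -
_DATE_RE = re.compile(r"\s*(\d{1,2})[/.-](\d{1,2})[/.-]\d{4}")


def guess_day_month_order(samples):
    """Guess whether 'A/B/YYYY' samples are day-first or month-first.

    Returns "day_first", "month_first", or "ambiguous".
    """
    firsts, seconds = [], []
    for sample in samples:
        match = _DATE_RE.match(sample)
        if match:
            firsts.append(int(match.group(1)))
            seconds.append(int(match.group(2)))
    if not firsts:
        return "ambiguous"
    # A field that ever exceeds 12 cannot be a month.
    first_over_12 = max(firsts) > 12
    second_over_12 = max(seconds) > 12
    if first_over_12 and not second_over_12:
        return "day_first"
    if second_over_12 and not first_over_12:
        return "month_first"
    return "ambiguous"


print(guess_day_month_order(["01/03/2023 12:30", "15/03/2023 08:00"]))
print(guess_day_month_order(["01/03/2023 12:30"]))
```

A single sample such as `01/03/2023` stays ambiguous, which is why a larger batch of same-format samples tends to give a more reliable result.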
🔊 Mixed Format Handling
formatify is designed to handle real-world timestamp mess. When your input includes a mix of styles — ISO, slashed, text-months, or epoch — it:
- Groups samples by structural similarity
- Infers format per group
- Standardizes timestamps across each group
This lets you feed in 3 formats or 30, and still get clean, grouped results.
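One simple way to picture "grouping by structural similarity" is to reduce each timestamp to a signature that keeps its punctuation but collapses digit runs and letter runs into placeholders. The sketch below is an assumption about what such grouping could look like, not formatify's actual algorithm; `structural_signature` and `group_by_structure` are hypothetical names.

```python
import re
from collections import defaultdict

# Match a run of digits or a run of ASCII letters in a single pass.
_TOKEN_RE = re.compile(r"[0-9]+|[A-Za-z]+")


def structural_signature(sample):
    """Replace digit runs with 'D' and letter runs with 'A'; keep the rest."""
    def classify(match):
        return "D" if match.group()[0].isdigit() else "A"
    return _TOKEN_RE.sub(classify, sample.strip())


def group_by_structure(samples):
    """Bucket samples whose signatures are identical."""
    groups = defaultdict(list)
    for sample in samples:
        groups[structural_signature(sample)].append(sample)
    return dict(groups)


samples = [
    "2023-07-15T14:23:05Z",
    "2023-07-16T09:00:00Z",
    "15/07/2023 14:23",
    "Jul 15, 2023 02:23 PM",
    "1689433385000",
]
for signature, members in group_by_structure(samples).items():
    print(repr(signature), members)
```

Doing the replacement in one pass matters: substituting digits first and letters second would also collapse the placeholder characters themselves.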
👁️ Design Notes
Want to know how the internals work? Check out:
🔍 Dev Guide
# Clone the repo
git clone https://github.com/PieceWiseProjects/formatify.git
cd formatify
# Set up environment
uv pip install -e ".[dev,test]"
# Lint and format
uv run ruff check src/formatify_py
uv run ruff format src/formatify_py
# Run tests
uv run pytest --cov=src/formatify_py
# Build for release
uv run python -m build
🚰 Contributing
We're just getting started — contributions, issues, and ideas welcome!
- Fork and branch: git checkout -b feature/my-feature
- Code and test
- Lint and push
- Open a pull request 💡
Follow our Contributor Guidelines.
📜 License
MIT — see LICENSE for details.
🙌 Credits
Built and maintained by Aalekh Roy. Part of the PieceWiseProjects initiative.