Auto-detect and standardize datetime formats from raw timestamps

🕒 formatify


🧠 Auto-detect and standardize messy timestamp formats. Perfect for log parsers, data pipelines, or anyone tired of wrestling with inconsistent datetime strings.


Demo of formatify in action


⚠️ Problem

Ever pulled in a CSV or log file and found timestamps like this?

2023-03-01T12:30:45Z, 01/03/2023 12:30, Mar 1 2023 12:30 PM

How do you reliably infer and standardize them — especially when:

  • formats are mixed?
  • you have no schema?
  • fractional seconds and timezones are involved?

✅ Solution

formatify infers the datetime format(s) from a list of timestamp strings and gives you:

  • a valid strftime format string per group,
  • component roles (e.g. year, month, day),
  • clean, standardized timestamps,
  • structural grouping when needed.

No dependencies. Works out of the box.


📄 What This Library Does

Behind the scenes, formatify uses:

  • Regex patterns to split and identify timestamp tokens
  • Heuristics to assign roles like year, month, hour, etc.
  • Frequency analysis to distinguish stable vs. changing components
  • ISO 8601 detection for timezones, 'T' separators, and fractional seconds
  • Smart fallbacks for missing delimiters or ambiguous parts
  • Epoch detection (10 or 13 digit UNIX timestamps)

It produces:

  • one or more %Y-%m-%dT%H:%M:%SZ-style format strings
  • lists of cleaned, standardized YYYY-MM-DD HH:MM:SS values
  • per-group accuracy and metadata
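The epoch rule in the list above (10 or 13 digits) can be illustrated with the standard library alone. This is a minimal sketch, not formatify's actual implementation: the helper name `parse_epoch` is hypothetical, and it assumes 10 digits means seconds and 13 digits means milliseconds.

```python
from datetime import datetime, timezone
from typing import Optional


def parse_epoch(value: str) -> Optional[datetime]:
    """Return a UTC datetime if `value` looks like a UNIX epoch, else None.

    Assumption: 10 digits are seconds, 13 digits are milliseconds.
    """
    text = value.strip()
    # Only plain ASCII digits qualify; anything else is not an epoch.
    if not (text.isascii() and text.isdigit()):
        return None
    if len(text) == 10:
        seconds = float(int(text))
    elif len(text) == 13:
        seconds = int(text) / 1000
    else:
        return None
    return datetime.fromtimestamp(seconds, tz=timezone.utc)


print(parse_epoch("1689433385000"))  # 13 digits, read as milliseconds
print(parse_epoch("1689433385"))     # 10 digits, read as seconds
print(parse_epoch("2023-07-15"))     # not an epoch, so None
```

A length check like this is cheap but blunt: a 10-digit ID or phone number would also pass, so a real detector would likely add a plausibility range on the resulting date.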

🚀 Quick Example

from formatify.main import analyze_heterogeneous_timestamp_formats

samples = [
    "2023-07-15T14:23:05Z",
    "15/07/2023 14:23",
    "Jul 15, 2023 02:23 PM",
    "1689433385000"  # epoch in ms
]

results = analyze_heterogeneous_timestamp_formats(samples)

for gid, group in results.items():
    print("Group", gid)
    print("→ Format:", group["format_string"])
    print("→ Standardized:", group["standardized_timestamps"][:2])
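An inferred format string can be sanity-checked with the standard library before you rely on it. The sketch below does not call formatify; the format strings are hand-written guesses for three of the samples above, and `normalize` is a hypothetical helper. Note that `%b` and `%p` depend on the active locale, so this assumes an English or C locale.

```python
from datetime import datetime


def normalize(sample: str, fmt: str) -> str:
    """Parse `sample` with `fmt` and re-emit it as YYYY-MM-DD HH:MM:SS."""
    parsed = datetime.strptime(sample, fmt)  # raises ValueError on mismatch
    return parsed.strftime("%Y-%m-%d %H:%M:%S")


# Hand-written (sample, format) pairs, not output from formatify.
checks = [
    ("2023-07-15T14:23:05Z", "%Y-%m-%dT%H:%M:%SZ"),
    ("15/07/2023 14:23", "%d/%m/%Y %H:%M"),
    ("Jul 15, 2023 02:23 PM", "%b %d, %Y %I:%M %p"),
]

for sample, fmt in checks:
    print(f"{sample!r} -> {normalize(sample, fmt)}")
```

If `strptime` raises `ValueError` for a sample, the format string does not describe that sample, which is a quick way to spot a wrong inference.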

🔍 Features

  • Auto-detect strftime format
  • Handles ISO 8601, text months, UNIX epoch
  • Infers year/month/day/hour/minute roles
  • Groups mixed formats automatically
  • Timezone-aware
  • No dependencies
  • Fast and customizable


🧪 API

Main Entry Point

analyze_heterogeneous_timestamp_formats(samples: List[str]) -> Dict[int, Dict[str, Any]]

Returns a dictionary mapping group IDs to result dictionaries. Each result includes:

  • format_string: inferred strftime string
  • standardized_timestamps: parsed & normalized strings
  • component_roles: index → role
  • change_frequencies: component variability
  • iso_features: flags for ISO 8601 traits
  • detected_timezone: parsed offset (if any)
  • coverage: fraction of total samples in this group
  • accuracy: percent of valid parses in group
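To show how the fields listed above might be consumed, here is a small sketch that ranks groups by coverage. The `example_results` dictionary is hand-written to mimic the documented shape (only a few keys, with made-up values); it is not real output from the library, and `summarize` is a hypothetical helper.

```python
from typing import Any, Dict, List


def summarize(results: Dict[int, Dict[str, Any]]) -> List[str]:
    """Return one summary line per group, largest coverage first."""
    ordered = sorted(
        results.items(),
        key=lambda item: item[1]["coverage"],
        reverse=True,
    )
    return [
        f"group {gid}: {group['format_string']} "
        f"(coverage {group['coverage']:.0%})"
        for gid, group in ordered
    ]


# Illustrative data only: keys follow the list above, values are invented.
example_results = {
    0: {"format_string": "%Y-%m-%dT%H:%M:%SZ", "coverage": 0.25},
    1: {"format_string": "%d/%m/%Y %H:%M", "coverage": 0.75},
}

for line in summarize(example_results):
    print(line)
```

Sorting by coverage puts the dominant format first, which is usually the one worth validating before handling the smaller groups.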

Lower-Level Functions

If you know all your samples have the same format:

infer_datetime_format_from_samples(samples: List[str]) -> Dict[str, Any]

🔊 Mixed Format Handling

formatify is designed to handle real-world timestamp mess. When your input includes a mix of styles — ISO, slashed, text-months, or epoch — it:

  1. Groups samples by structural similarity
  2. Infers format per group
  3. Standardizes timestamps within each group

This lets you feed in 3 formats or 30, and still get clean, grouped results.
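One way to picture step 1 is a "shape signature" that collapses digit runs and letter runs while keeping punctuation. This is only an illustration of the idea of structural grouping; how formatify actually groups samples is not documented here, and `signature` and `group_by_shape` are hypothetical names.

```python
import re
from collections import defaultdict
from typing import Dict, List


def signature(sample: str) -> str:
    """Replace each digit run with 'D' and each letter run with 'A'."""

    def collapse(match: "re.Match[str]") -> str:
        return "D" if match.group(0)[0].isdigit() else "A"

    # A single pass, so an inserted 'D' is never re-read as a letter.
    return re.sub(r"[0-9]+|[A-Za-z]+", collapse, sample.strip())


def group_by_shape(samples: List[str]) -> Dict[str, List[str]]:
    """Bucket samples that share the same signature."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for sample in samples:
        groups[signature(sample)].append(sample)
    return dict(groups)


samples = [
    "2023-07-15T14:23:05Z",
    "15/07/2023 14:23",
    "16/07/2023 09:05",
    "Jul 15, 2023 02:23 PM",
    "1689433385000",
]

for shape, members in group_by_shape(samples).items():
    print(f"{shape!r}: {members}")
```

Because digit runs of any length collapse to the same symbol, this signature cannot tell `2023` from `23`; a real grouper would likely keep run lengths or token counts as well.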


👁️ Design Notes

Want to know how the internals work? Check out:


🔍 Dev Guide

# Clone the repo
git clone https://github.com/PieceWiseProjects/formatify.git
cd formatify

# Set up environment
uv pip install -e ".[dev,test]"

# Lint and format
uv run ruff check src/formatify_py
uv run ruff format src/formatify_py

# Run tests
uv run pytest --cov=src/formatify_py

# Build for release
uv run python -m build

🚰 Contributing

We're just getting started — contributions, issues, and ideas welcome!

  1. Fork and branch: git checkout -b feature/my-feature
  2. Code and test
  3. Lint and push
  4. Open a pull request 💡

Follow our Contributor Guidelines.


📜 License

MIT — see LICENSE for details.


🙌 Credits

Built and maintained by Aalekh Roy. Part of the PieceWiseProjects initiative.
