
Infer plausible time zones for a time series dataset based on Daylight Saving Time switches


tz-canary - Time Zone Canary

In a perfect world, all time series data is time-zone-aware and stored in UTC. Sadly, we do not live in a perfect world. Time series data often lacks a time zone identifier, or worse, does not actually adhere to the time zone it claims to be in.

tz-canary inspects the Daylight Saving Time (DST) switches in a time series to work out which time zones the data could plausibly be in. You can either infer the full set of plausible time zones, or validate whether one given time zone is plausible for the data.
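To make the idea concrete, here is a minimal sketch of one DST-based plausibility check, using only the standard library. It is not tz-canary's actual implementation; the helper names `exists_in_zone` and `is_plausible` are hypothetical. The assumption is simple: if a series contains a wall-clock time that was skipped by a spring-forward switch in some zone, the data is unlikely to be in that zone.

```python
from datetime import datetime, timedelta, timezone
from zoneinfo import ZoneInfo


def exists_in_zone(naive: datetime, zone: ZoneInfo) -> bool:
    """Return True if the naive wall-clock time actually occurs in the zone."""
    aware = naive.replace(tzinfo=zone)
    # A skipped (nonexistent) local time does not survive a round trip via UTC.
    round_trip = aware.astimezone(timezone.utc).astimezone(zone)
    return round_trip.replace(tzinfo=None) == naive


def is_plausible(timestamps: list[datetime], zone_key: str) -> bool:
    """Hypothetical check: every timestamp must exist in the zone."""
    zone = ZoneInfo(zone_key)
    return all(exists_in_zone(ts, zone) for ts in timestamps)


# Hourly naive timestamps across the night of 26 March 2023, when clocks in
# Europe/Amsterdam jumped from 02:00 straight to 03:00.
timestamps = [datetime(2023, 3, 26, 0) + timedelta(hours=h) for h in range(5)]

amsterdam_ok = is_plausible(timestamps, "Europe/Amsterdam")  # 02:00 was skipped
utc_ok = is_plausible(timestamps, "UTC")  # UTC has no DST switches
print(amsterdam_ok, utc_ok)
```

A real check would also need to consider repeated hours in autumn and gaps in the data itself; this sketch covers only the skipped-hour case.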

Installation

tz-canary is available on PyPI, so you can install it just like any other Python package:

pip install tz-canary

Usage

Time zone validation

The simplest way to use tz-canary is to validate a given time zone for a time series:

import pandas as pd
from tz_canary import validate_time_zone

df = pd.read_csv("docs/data/example_data.csv", index_col="datetime", parse_dates=True)

validate_time_zone(df.index, "Europe/Amsterdam")  # will pass
validate_time_zone(df.index, "America/New_York")  # will raise ImplausibleTimeZoneError
validate_time_zone(df.index, "UTC")  # will raise ImplausibleTimeZoneError

Time zone inference

You can also get the set of all plausible time zones for a time series:

from pprint import pprint

import pandas as pd
from tz_canary import infer_time_zone

df = pd.read_csv("docs/data/example_data.csv", index_col="datetime", parse_dates=True)

plausible_time_zones = infer_time_zone(df.index)
pprint(plausible_time_zones)

# Output:
# {zoneinfo.ZoneInfo(key='Africa/Ceuta'),
#  zoneinfo.ZoneInfo(key='Arctic/Longyearbyen'),
#  zoneinfo.ZoneInfo(key='Europe/Amsterdam'),
#  ...
#  zoneinfo.ZoneInfo(key='Europe/Zurich')}
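Since the output above is a set of zoneinfo.ZoneInfo objects, it can be handled with ordinary set operations. The sketch below uses a small hand-written stand-in set rather than a real call to infer_time_zone, so the values are illustrative only.

```python
from zoneinfo import ZoneInfo

# Stand-in for the result of infer_time_zone(df.index); illustrative values.
plausible_time_zones = {
    ZoneInfo("Africa/Ceuta"),
    ZoneInfo("Europe/Amsterdam"),
    ZoneInfo("Europe/Zurich"),
}

# Sort by key for stable, readable output.
keys = sorted(tz.key for tz in plausible_time_zones)

# Narrow the candidates down using prior knowledge, e.g. the data is European.
european = [key for key in keys if key.startswith("Europe/")]

# ZoneInfo instances are cached per key, so membership tests work directly.
is_candidate = ZoneInfo("Europe/Amsterdam") in plausible_time_zones

print(european, is_candidate)
```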

Advanced usage: inference with cached TransitionsData

When processing many time series, it can be useful to cache the transitions data that tz-canary uses to infer time zones. To do so, create a TransitionsData object once and pass it to infer_time_zone; the same works for validate_time_zone:

import pandas as pd

from tz_canary import TransitionsData, infer_time_zone

# We create a TransitionsData object once to avoid having to recompute the
#   transitions for every call to infer_time_zone.
transitions_data = TransitionsData(2010, 2023)

for i in range(10):
    df = pd.read_csv(
        "docs/data/example_data.csv",  # In reality, these would be different files
        index_col="datetime",
        parse_dates=True,
    )
    plausible_time_zones = infer_time_zone(df.index, transitions_data=transitions_data)
    print(i, plausible_time_zones)
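To illustrate why reusing precomputed transitions pays off, the following sketch finds the UTC offset changes of one zone in one year by scanning hour by hour, and memoizes the result with functools.lru_cache. This is an assumption-laden illustration, not how TransitionsData is implemented; `find_transitions` is a hypothetical helper, and transitions are only reported at hourly resolution.

```python
from datetime import datetime, timedelta, timezone
from functools import lru_cache
from zoneinfo import ZoneInfo


@lru_cache(maxsize=None)
def find_transitions(zone_key: str, year: int) -> tuple[datetime, ...]:
    """Return the UTC hours in `year` at which the zone's UTC offset changes."""
    zone = ZoneInfo(zone_key)
    current = datetime(year, 1, 1, tzinfo=timezone.utc)
    end = datetime(year + 1, 1, 1, tzinfo=timezone.utc)
    previous_offset = current.astimezone(zone).utcoffset()
    transitions = []
    while current < end:
        current += timedelta(hours=1)
        offset = current.astimezone(zone).utcoffset()
        if offset != previous_offset:
            transitions.append(current)
            previous_offset = offset
    return tuple(transitions)


# The first call scans the whole year; the second is served from the cache.
amsterdam_2023 = find_transitions("Europe/Amsterdam", 2023)
amsterdam_again = find_transitions("Europe/Amsterdam", 2023)
for transition in amsterdam_2023:
    print(transition.isoformat())
```

Scanning every zone over many years is comparatively expensive, which is the motivation for computing the transitions once and sharing them across calls.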

Development

  1. Make sure you have Git, Git LFS, and Poetry installed.
  2. Clone this repository:
    git clone https://github.com/leonoverweel/tz-canary
    cd tz-canary
    
  3. Install the development requirements:
    poetry install --with dev
    
  4. Install the pre-commit hooks (used for linting):
    pre-commit install
    
  5. Run the tests:
    poetry run pytest
    

Making a release

  1. Bump the version number in pyproject.toml and commit the change.
  2. Make a new release on GitHub.
  3. Build the package:
    poetry build
    
  4. Publish the package to PyPI:
    poetry publish
    

Contributing

Please don't hesitate to open issues and PRs!

GitHub repository: https://github.com/leonoverweel/tz-canary


