Infer plausible time zones for a time series dataset based on Daylight Saving Time switches
tz-canary - Time Zone Canary
In a perfect world, all time series data is time-zone-aware and stored in UTC. Sadly, we do not live in a perfect world. Time series data often lacks a time zone identifier, or worse, does not actually adhere to the time zone it claims to be in.
tz-canary inspects the Daylight Saving Time (DST) switches in a time series to infer a set of plausible time zones the data could be in.
It can either infer the full set of plausible time zones for the data or validate whether a given time zone is plausible for it.
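As a minimal sketch of what such a switch looks like in the data (an illustration, not tz-canary's actual implementation): if a series is recorded at a fixed interval in local wall-clock time, the spring-forward switch appears as a gap one hour longer than the sampling step, and the fall-back switch as a step backwards where an hour of wall-clock time repeats. The helper `find_dst_like_switches` and the assumption that DST shifts clocks by exactly one hour are hypothetical choices for this illustration.

```python
from datetime import datetime, timedelta


def find_dst_like_switches(
    timestamps: list[datetime], step: timedelta
) -> list[tuple[datetime, str]]:
    """Find spots in a naive, regularly sampled series that look like DST switches.

    Hypothetical helper for illustration only. Assumes `timestamps` are naive local
    times recorded every `step` and that DST moves clocks by exactly one hour.
    """
    one_hour = timedelta(hours=1)
    switches = []
    for prev, curr in zip(timestamps, timestamps[1:]):
        delta = curr - prev
        if delta == step + one_hour:
            # Clocks jumped forward: one hour of local time never happened.
            switches.append((curr, "spring_forward"))
        elif delta == step - one_hour:
            # Clocks fell back: the wall clock repeats an hour.
            switches.append((curr, "fall_back"))
    return switches


# Naive hourly readings around the 2023 spring-forward in Europe/Amsterdam,
# where 02:00 local time does not exist on 26 March 2023.
series = [datetime(2023, 3, 26, h) for h in (0, 1, 3, 4)]
print(find_dst_like_switches(series, timedelta(hours=1)))
# [(datetime.datetime(2023, 3, 26, 3, 0), 'spring_forward')]
```

A genuinely missing sample in hourly data looks exactly like a spring-forward gap, which is one reason to compare such candidates against the actual transition dates of each time zone rather than trusting them on their own.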
Installation
tz-canary is available on PyPI, so you can install it just like any other Python package:
pip install tz-canary
Usage
Time zone validation
The simplest way to use tz-canary is to validate a given time zone for a time series:
import pandas as pd
from tz_canary import validate_time_zone
df = pd.read_csv("docs/data/example_data.csv", index_col="datetime", parse_dates=True)
validate_time_zone(df.index, "Europe/Amsterdam") # will pass
validate_time_zone(df.index, "America/New_York") # will raise ImplausibleTimeZoneError
validate_time_zone(df.index, "UTC") # will raise ImplausibleTimeZoneError
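validate_time_zone does the real check for you. To illustrate one ingredient such a check could use (a sketch with only the standard-library zoneinfo module, not tz-canary's implementation): if a naive series contains a wall-clock time that falls in a candidate zone's spring-forward gap, that zone cannot be right. The helper name `nonexistent_local_times` is hypothetical.

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo


def nonexistent_local_times(timestamps: list[datetime], tz_name: str) -> list[datetime]:
    """Return the naive timestamps that never occur as wall-clock times in `tz_name`.

    Hypothetical helper for illustration. A wall time inside a spring-forward gap
    does not survive a round trip through UTC: it comes back shifted by the gap.
    """
    tz = ZoneInfo(tz_name)
    missing = []
    for ts in timestamps:
        round_trip = ts.replace(tzinfo=tz).astimezone(timezone.utc).astimezone(tz)
        if round_trip.replace(tzinfo=None) != ts:
            missing.append(ts)
    return missing


# 02:30 on 26 March 2023 falls in Amsterdam's spring-forward gap; New York had
# already switched to summer time on 12 March, so both readings exist there.
readings = [datetime(2023, 3, 26, 1, 30), datetime(2023, 3, 26, 2, 30)]
print(nonexistent_local_times(readings, "Europe/Amsterdam"))  # [datetime.datetime(2023, 3, 26, 2, 30)]
print(nonexistent_local_times(readings, "America/New_York"))  # []
```

This is only a necessary condition: UTC has no gaps, so it passes this test for any data, even though validate_time_zone rejects UTC for the example data above. A full check also has to confirm that the data actually shows each switch the zone makes in the period it covers.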
Time zone inference
You can also get a list of all plausible time zones for a time series:
from pprint import pprint
import pandas as pd
from tz_canary import infer_time_zone
df = pd.read_csv("docs/data/example_data.csv", index_col="datetime", parse_dates=True)
plausible_time_zones = infer_time_zone(df.index)
pprint(plausible_time_zones)
# Output:
# {zoneinfo.ZoneInfo(key='Africa/Ceuta'),
# zoneinfo.ZoneInfo(key='Arctic/Longyearbyen'),
# zoneinfo.ZoneInfo(key='Europe/Amsterdam'),
# ...
# zoneinfo.ZoneInfo(key='Europe/Zurich')}
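infer_time_zone returns every zone that fits the data. One way such an inference could work, sketched here with the standard-library zoneinfo module (an illustration under stated assumptions, not tz-canary's algorithm): treat the index as naive wall-clock times sampled every `step`, and keep each IANA zone in which every timestamp exists and the real elapsed time between consecutive samples equals `step`. The helpers `is_consistent_with_zone` and `infer_plausible_zones` are hypothetical, and the sketch assumes the data has no missing or irregular samples.

```python
from datetime import datetime, timedelta, timezone
from zoneinfo import ZoneInfo, available_timezones


def is_consistent_with_zone(timestamps: list[datetime], tz_name: str, step: timedelta) -> bool:
    """Check whether naive, regularly sampled `timestamps` could be wall times in `tz_name`.

    Hypothetical helper: every timestamp must exist in the zone, and the real (UTC)
    time between consecutive samples must equal the sampling `step`.
    """
    tz = ZoneInfo(tz_name)
    fold = 0  # 1 while on the second pass through a repeated (fall-back) hour
    prev_utc = None
    for i, ts in enumerate(timestamps):
        if i > 0 and ts <= timestamps[i - 1]:
            fold = 1  # wall clock went back: assume the repeated hour's second pass
        elif ts.replace(tzinfo=tz, fold=0).utcoffset() == ts.replace(tzinfo=tz, fold=1).utcoffset():
            fold = 0  # unambiguous wall time, so the fold no longer matters
        utc = ts.replace(tzinfo=tz, fold=fold).astimezone(timezone.utc)
        if utc.astimezone(tz).replace(tzinfo=None) != ts:
            return False  # ts falls in a spring-forward gap of this zone
        if prev_utc is not None and utc - prev_utc != step:
            return False  # elapsed real time does not match the sampling step
        prev_utc = utc
    return True


def infer_plausible_zones(timestamps: list[datetime], step: timedelta) -> set[str]:
    """Return the zone keys from the local tz database that the data is consistent with."""
    return {key for key in available_timezones() if is_consistent_with_zone(timestamps, key, step)}


# Naive hourly readings across the 2023 spring-forward in Central European Time.
readings = [datetime(2023, 3, 26, h) for h in (0, 1, 3, 4, 5)]
zones = infer_plausible_zones(readings, timedelta(hours=1))
print("Europe/Amsterdam" in zones, "Europe/London" in zones, "UTC" in zones)  # True False False
```

Unlike the example above, this sketch returns zone keys as strings rather than ZoneInfo objects, and the exact set depends on the tz database installed on your system.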
Advanced usage: inference with cached TransitionsData
When processing many time series, it can be useful to cache the transitions data used by tz-canary to infer time zones.
You can do this by creating a TransitionsData object and passing it to infer_time_zone (this also works for validate_time_zone):
import pandas as pd
from tz_canary import TransitionsData, infer_time_zone
# We create a single TransitionsData object up front so the transitions are not
# recomputed for every call to infer_time_zone.
transitions_data = TransitionsData(2010, 2023)
for i in range(10):
    df = pd.read_csv(
        "docs/data/example_data.csv",  # In reality, these would be different files
        index_col="datetime",
        parse_dates=True,
    )
    plausible_time_zones = infer_time_zone(df.index, transitions_data=transitions_data)
    print(i, plausible_time_zones)
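The benefit of caching comes from computing each zone's transitions once and reusing them for every series. The sketch below shows the general idea with functools.lru_cache and the standard-library zoneinfo module; it is not how TransitionsData is implemented, and the helper `offset_transitions` is hypothetical. It scans each UTC day for a change in UTC offset and bisects down to the exact second, assuming at most one offset change per day.

```python
from datetime import datetime, timedelta, timezone
from functools import lru_cache
from zoneinfo import ZoneInfo


@lru_cache(maxsize=None)  # cache keyed on (tz_name, start_year, end_year)
def offset_transitions(tz_name: str, start_year: int, end_year: int) -> tuple[datetime, ...]:
    """Return the UTC instants at which `tz_name` changes its UTC offset.

    Hypothetical helper for illustration. Scans start_year..end_year one UTC day
    at a time and bisects each day whose offset changed down to the exact second,
    assuming at most one offset change per day.
    """
    tz = ZoneInfo(tz_name)

    def offset_at(seconds: int) -> timedelta:
        return datetime.fromtimestamp(seconds, tz).utcoffset()

    start = int(datetime(start_year, 1, 1, tzinfo=timezone.utc).timestamp())
    end = int(datetime(end_year + 1, 1, 1, tzinfo=timezone.utc).timestamp())
    one_day = 24 * 60 * 60
    found = []
    for day_start in range(start, end, one_day):
        lo, hi = day_start, min(day_start + one_day, end)
        if offset_at(lo) == offset_at(hi):
            continue
        # Invariant: lo still has the old offset, hi already has the new one.
        while hi - lo > 1:
            mid = (lo + hi) // 2
            if offset_at(mid) == offset_at(lo):
                lo = mid
            else:
                hi = mid
        found.append(datetime.fromtimestamp(hi, timezone.utc))
    return tuple(found)


# Processing many series: only the first call does the scan, the rest hit the cache.
for _ in range(3):
    transitions = offset_transitions("Europe/Amsterdam", 2010, 2023)
print(offset_transitions.cache_info().hits)  # 2
```

Keying the cache on the year range mirrors the two arguments passed to TransitionsData in the example above, but that correspondence is an assumption of this sketch.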
Development
- Make sure you have git, git LFS, and Poetry installed.
- Clone this repository:
git clone https://github.com/leonoverweel/tz-canary
cd tz-canary
- Install the development requirements:
poetry install --with dev
- Install the pre-commit hooks (used for linting):
pre-commit install
- Run the tests:
poetry run pytest
Making a release
- Bump the version number in pyproject.toml and commit the change.
- Make a new release on GitHub.
- Build the package:
poetry build
- Publish the package to PyPI:
poetry publish
Contributing
Please don't hesitate to open issues and PRs!
GitHub repository: https://github.com/leonoverweel/tz-canary.
Built Distribution
Hashes for tz_canary-0.2.0-py3-none-any.whl
Algorithm | Hash digest
---|---
SHA256 | b2269e39bd231591daab56e6cb54ce306858cc691a2fcab143e7dbac025412c8
MD5 | 85386cc55cf785c2a5061986b068f56c
BLAKE2b-256 | 131dfe3c05296a8f64dcb3727207b3c1811484f3ff4245f529e7caa0ff0e3d04
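To check that a downloaded wheel matches the published SHA256 digest for tz_canary-0.2.0-py3-none-any.whl, you can hash the file yourself. A minimal sketch with Python's hashlib; the helper names are hypothetical, and the path is wherever you saved the download.

```python
import hashlib
from pathlib import Path

# Published SHA256 digest for tz_canary-0.2.0-py3-none-any.whl.
EXPECTED_SHA256 = "b2269e39bd231591daab56e6cb54ce306858cc691a2fcab143e7dbac025412c8"


def sha256_of(path: Path, chunk_size: int = 1 << 16) -> str:
    """Hash a file in chunks so large downloads need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_hash(path: Path, expected: str = EXPECTED_SHA256) -> bool:
    """True if the file's SHA256 digest equals `expected` (case-insensitive)."""
    return sha256_of(path) == expected.strip().lower()


# Hypothetical usage, assuming the wheel was downloaded to the current directory:
# matches_published_hash(Path("tz_canary-0.2.0-py3-none-any.whl"))
```

From the command line, `pip hash` computes the same SHA256 digest for a downloaded file.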