Infer plausible time zones for a time series dataset based on Daylight Saving Time switches
tz-canary - Time Zone Canary
In a perfect world, all time series data is time-zone-aware and stored in UTC. Sadly, we do not live in a perfect world. Time series data often lacks a time zone identifier, or worse, does not actually adhere to the time zone it claims to be in.
tz-canary inspects the Daylight Saving Time (DST) switches in a time series to infer a set of plausible time zones the data could be in.
It allows you to infer the full set of plausible time zones for the data, or to validate whether a given time zone is plausible for the data.
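As a rough illustration of the signal tz-canary relies on: a zone that observes DST changes its UTC offset at a few specific instants each year, and those instants differ between zones. The sketch below finds them with Python's standard zoneinfo module. It is not tz-canary's implementation; the helper name dst_transitions and the hour-by-hour scan are assumptions made for this example.

```python
from datetime import datetime, timedelta, timezone
from zoneinfo import ZoneInfo


def dst_transitions(tz_name: str, year: int) -> list[tuple[datetime, timedelta]]:
    """Hypothetical helper: (UTC instant, offset change) for each UTC-offset change in `year`."""
    tz = ZoneInfo(tz_name)
    step = timedelta(hours=1)
    t = datetime(year, 1, 1, tzinfo=timezone.utc)
    end = datetime(year + 1, 1, 1, tzinfo=timezone.utc)
    prev = t.astimezone(tz).utcoffset()
    transitions = []
    t += step
    while t < end:
        offset = t.astimezone(tz).utcoffset()
        if offset != prev:
            # Hourly scan: a change at a non-whole UTC hour is reported at the next whole hour.
            transitions.append((t, offset - prev))
            prev = offset
        t += step
    return transitions


for zone in ("Europe/Amsterdam", "America/New_York", "UTC"):
    print(zone, dst_transitions(zone, 2022))
```

Europe/Amsterdam and America/New_York switch on different dates in 2022, and UTC never switches, which is what makes the switches usable as a fingerprint.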
Installation
tz-canary is available on PyPI, so you can install it like any other Python package:
pip install tz-canary
Usage
Time zone validation
The simplest way to use tz-canary is to validate a given time zone for a time series:
import pandas as pd
from tz_canary import validate_time_zone
df = pd.read_csv("docs/data/example_data.csv", index_col="datetime", parse_dates=True)
validate_time_zone(df.index, "Europe/Amsterdam") # will pass
validate_time_zone(df.index, "America/New_York") # will raise ImplausibleTimeZoneError
validate_time_zone(df.index, "UTC") # will raise ImplausibleTimeZoneError
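Why would Europe/Amsterdam pass while America/New_York and UTC fail? If a logger records local wall-clock time without an offset, every spring-forward skips one local hour and every fall-back repeats one, on dates that depend on the zone. The sketch below is a simplified, hypothetical validator built on that observation, using only the standard library. The names validate_zone_sketch, simulate_wall_clock, wall_clock_anomalies, and ImplausibleZoneSketchError are invented for illustration and are not part of tz-canary's API. It also assumes a naive, regular hourly series with no other gaps.

```python
from collections import Counter
from datetime import datetime, timedelta, timezone
from zoneinfo import ZoneInfo

HOUR = timedelta(hours=1)


class ImplausibleZoneSketchError(ValueError):
    """Hypothetical stand-in for a 'time zone is implausible' error."""


def wall_clock_anomalies(stamps):
    """Return (skipped local hours, repeated local hours) in a naive hourly series."""
    counts = Counter(stamps)  # naive datetimes ignore `fold`, so a repeated hour collides here
    ordered = sorted(counts)
    skipped = [a + HOUR for a, b in zip(ordered, ordered[1:]) if b - a == 2 * HOUR]
    repeated = [ts for ts in ordered if counts[ts] > 1]
    return skipped, repeated


def simulate_wall_clock(tz_name, start_utc, hours):
    """Naive local timestamps an hourly logger running on tz_name's wall clock would record."""
    tz = ZoneInfo(tz_name)
    return [(start_utc + h * HOUR).astimezone(tz).replace(tzinfo=None) for h in range(hours)]


def validate_zone_sketch(stamps, tz_name):
    """Raise if the skipped/repeated hours in `stamps` do not match tz_name's DST rules."""
    tz = ZoneInfo(tz_name)
    # Interpret the first and last naive timestamps as wall-clock times in tz_name.
    start_utc = min(stamps).replace(tzinfo=tz).astimezone(timezone.utc)
    end_utc = max(stamps).replace(tzinfo=tz).astimezone(timezone.utc)
    hours = (end_utc - start_utc) // HOUR + 1
    expected = wall_clock_anomalies(simulate_wall_clock(tz_name, start_utc, hours))
    if wall_clock_anomalies(stamps) != expected:
        raise ImplausibleZoneSketchError(f"DST pattern does not match {tz_name}")


# Usage: an hourly logger on Amsterdam wall-clock time during 2022.
ams = simulate_wall_clock("Europe/Amsterdam", datetime(2022, 1, 1, tzinfo=timezone.utc), 24 * 365)
validate_zone_sketch(ams, "Europe/Amsterdam")      # passes
# validate_zone_sketch(ams, "America/New_York")    # would raise ImplausibleZoneSketchError
```

Comparing only the skipped and repeated hours keeps the check cheap, but it is much cruder than a real implementation would need to be: for example, a genuinely missing reading would be mistaken for a DST gap.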
Time zone inference
You can also get a list of all plausible time zones for a time series:
from pprint import pprint
import pandas as pd
from tz_canary import infer_time_zone
df = pd.read_csv("docs/data/example_data.csv", index_col="datetime", parse_dates=True)
plausible_time_zones = infer_time_zone(df.index)
pprint(plausible_time_zones)
# Output:
# {zoneinfo.ZoneInfo(key='Africa/Ceuta'),
# zoneinfo.ZoneInfo(key='Arctic/Longyearbyen'),
# zoneinfo.ZoneInfo(key='Europe/Amsterdam'),
# ...
# zoneinfo.ZoneInfo(key='Europe/Zurich')}
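Inference can be sketched as validation run against many candidate zones, keeping every zone whose expected skipped and repeated local hours match the data. The self-contained sketch below uses hypothetical helpers (wall_clock_anomalies, expected_anomalies, infer_zones_sketch) and checks a small hand-picked candidate list; scanning every zone in zoneinfo.available_timezones() would work the same way, only more slowly. This is an illustrative assumption, not how tz-canary computes its result.

```python
from collections import Counter
from datetime import datetime, timedelta, timezone
from zoneinfo import ZoneInfo

HOUR = timedelta(hours=1)


def wall_clock_anomalies(stamps):
    """(Skipped local hours, repeated local hours) in a naive hourly series."""
    counts = Counter(stamps)  # naive datetimes ignore `fold`, so a repeated hour collides here
    ordered = sorted(counts)
    skipped = [a + HOUR for a, b in zip(ordered, ordered[1:]) if b - a == 2 * HOUR]
    repeated = [ts for ts in ordered if counts[ts] > 1]
    return skipped, repeated


def expected_anomalies(tz_name, first, last):
    """Anomalies an hourly logger on tz_name's wall clock would show from `first` to `last`."""
    tz = ZoneInfo(tz_name)
    start_utc = first.replace(tzinfo=tz).astimezone(timezone.utc)
    end_utc = last.replace(tzinfo=tz).astimezone(timezone.utc)
    hours = (end_utc - start_utc) // HOUR + 1
    local = [(start_utc + h * HOUR).astimezone(tz).replace(tzinfo=None) for h in range(hours)]
    return wall_clock_anomalies(local)


def infer_zones_sketch(stamps, candidates):
    """Hypothetical: the candidate zones whose DST anomalies match the observed ones."""
    observed = wall_clock_anomalies(stamps)
    first, last = min(stamps), max(stamps)
    return {z for z in candidates if expected_anomalies(z, first, last) == observed}


# Usage: simulate a naive hourly series logged on Amsterdam wall-clock time in 2022.
amsterdam = ZoneInfo("Europe/Amsterdam")
start = datetime(2022, 1, 1, tzinfo=timezone.utc)
stamps = [(start + h * HOUR).astimezone(amsterdam).replace(tzinfo=None) for h in range(24 * 365)]
candidates = ["Europe/Amsterdam", "Europe/Berlin", "Europe/London", "America/New_York", "UTC"]
print(sorted(infer_zones_sketch(stamps, candidates)))  # ['Europe/Amsterdam', 'Europe/Berlin']
```

Zones that share the same DST rules, such as Europe/Amsterdam and Europe/Berlin, are indistinguishable from switches alone, which is why inference returns a set rather than a single zone.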
Advanced usage: inference with cached TransitionsData
When processing many time series, it can be useful to cache the transitions data that tz-canary uses to infer time zones.
You can do this by creating a TransitionsData object and passing it to infer_time_zone (this also works for validate_time_zone):
import pandas as pd
from tz_canary import TransitionsData, infer_time_zone
# We create a TransitionsData object once to avoid recomputing the transitions for
# every call to infer_time_zone.
transition_data = TransitionsData(2010, 2023)
for i in range(10):
    df = pd.read_csv(
        "docs/data/example_data.csv",  # In reality, these would be different files
        index_col="datetime",
        parse_dates=True,
    )
    plausible_time_zones = infer_time_zone(df.index, transition_data=transition_data)
    print(i, plausible_time_zones)
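The benefit of sharing one TransitionsData object is that transition tables are computed once and reused across series. The same caching idea can be sketched with functools.lru_cache from the standard library. The cached helper zone_transitions below is hypothetical and says nothing about how TransitionsData is actually built.

```python
from datetime import datetime, timedelta, timezone
from functools import lru_cache
from zoneinfo import ZoneInfo

HOUR = timedelta(hours=1)


@lru_cache(maxsize=None)
def zone_transitions(tz_name: str, start_year: int, end_year: int) -> tuple[datetime, ...]:
    """Hypothetical cached helper: UTC instants (hourly resolution) at which tz_name's
    UTC offset changes between start_year and end_year, inclusive."""
    tz = ZoneInfo(tz_name)
    t = datetime(start_year, 1, 1, tzinfo=timezone.utc)
    end = datetime(end_year + 1, 1, 1, tzinfo=timezone.utc)
    prev = t.astimezone(tz).utcoffset()
    found = []
    t += HOUR
    while t < end:
        offset = t.astimezone(tz).utcoffset()
        if offset != prev:
            found.append(t)
            prev = offset
        t += HOUR
    return tuple(found)  # a tuple, so callers cannot mutate the cached value


# Usage: ten series share a single computation per (zone, year range).
for i in range(10):
    transitions = zone_transitions("Europe/Amsterdam", 2010, 2023)
print(len(transitions), zone_transitions.cache_info())
```

The (2010, 2023) arguments are chosen to echo the TransitionsData call above; treating them as an inclusive year range is an assumption of this sketch.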
Development
- Make sure you have git, git LFS, and Poetry installed.
- Clone this repository:
git clone https://github.com/leonoverweel/tz-canary
cd tz-canary
- Install the development requirements:
poetry install --with dev
- Install the pre-commit hooks (used for linting):
pre-commit install
- Run the tests:
poetry run pytest
Making a release
- Bump the version number in pyproject.toml and commit the change.
- Make a new release on GitHub.
- Build the package:
poetry build
- Publish the package to PyPI:
poetry publish
Contributing
Please don't hesitate to open issues and PRs!
GitHub repository: https://github.com/leonoverweel/tz-canary.