A fast utility to parse and output timestamps in ISO8601/RFC3339 format, written mostly in C.

rfc3339lib

rfc3339lib is a Python library to quickly and efficiently manage date-time entities that conform to the RFC3339 specification for dates and times. rfc3339lib is written primarily in C with Cython bindings, and so it is extremely fast.

Distribution

Installation

Use the package manager pip to install rfc3339lib.

pip install rfc3339lib

Usage

Many examples of usage are available in the main test files included in the t/ subdirectory.

import rfc3339lib
import datetime

now = datetime.datetime.now()
rfcnow = rfc3339lib.to_rfc3339(now)
print(rfcnow)

dt = rfc3339lib.from_rfc3339(rfcnow)
print(dt)

print(rfc3339lib.is_rfc3339(rfcnow), "should be True")

# Sometimes separators are different. For example, a ':' can't be in a macOS filename.
macsafe = rfcnow.replace(':', '.')
print(macsafe)

# Permissive (nonstrict) checking accepts the altered separators.
print(rfc3339lib.is_rfc3339(macsafe, strict=False), "should be True")

# Strict checking rejects them.
print(rfc3339lib.is_rfc3339(macsafe, strict=True), "should be False")

print("This should be okay:", rfc3339lib.from_rfc3339(macsafe, strict=False))

try:
    rfc3339lib.from_rfc3339(macsafe, strict=True)
except ValueError:
    print("Correctly raised ValueError for manipulated timestamp in strict mode")
else:
    raise AssertionError("Not reached: macsafe is not, strictly speaking, a valid string")

Quirks, Implementation, and Rationale

Strictness and Permissive Parsing

All strings generated by totimestamp() will be ISO8601/RFC3339 strings. Parsing, however, is somewhat permissive by default, on the assumption that a programmer requesting a datetime object from a formatted string is more interested in the actual date than in whether the source followed every nuance of the ISO8601 standard. There are also situations in which the standard cannot be met, as shown in the example: macOS does not accept colons in filenames, so the provider of timestamp-encoded files (mydata.{timestamp}.csv) may very well have changed each colon to some unknown-but-definitely-not-a-number character.
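To make the filename case concrete, here is a minimal standard-library sketch of pulling a timestamp out of a name such as mydata.{timestamp}.csv when the colons have been swapped for something else. It does not use rfc3339lib; the pattern and the function name timestamp_from_filename are assumptions made for illustration, and it ignores fractions and timezone offsets.

```python
import re
from datetime import datetime

# Hypothetical pattern for this sketch: a dash-separated date, one non-digit
# date/time separator, then three two-digit time fields split by any single
# non-alphanumeric character (':' or '.' or '_' and so on).
STAMP = re.compile(
    r"(\d{4})-(\d{2})-(\d{2})"
    r"[^0-9]"
    r"(\d{2})[^0-9A-Za-z](\d{2})[^0-9A-Za-z](\d{2})"
)

def timestamp_from_filename(name):
    """Return the datetime embedded in a file name, or None if none is found."""
    match = STAMP.search(name)
    if match is None:
        return None
    fields = [int(group) for group in match.groups()]
    # datetime() raises ValueError if a field is out of range.
    return datetime(*fields)

print(timestamp_from_filename("mydata.2021-03-04T05.06.07.csv"))
print(timestamp_from_filename("mydata.2021-03-04T05:06:07.csv"))
```

Both calls yield the same datetime, which is the point of permissive parsing: the caller cares about the date, not about which separator survived the trip through a filesystem.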

So, there are essentially three levels of checking. Nonstrict (permissive) checking requires the format to be roughly correct. Strict checking requires all non-numeric characters to be exactly valid. Range checking ensures that the numeric values are valid, i.e. that fields such as the second and the day fall within appropriate ranges; it is effectively a separate issue from non-numeric strictness.
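Range checking can be pictured on its own, independent of which separators were used. The following is a rough pure-Python sketch, not the library's C implementation: the function name fields_in_range is hypothetical, and checking the day against the length of the month is an assumption. The hour and second limits follow the rules given elsewhere in this document (hour 24 rejected, second 60 tolerated).

```python
import calendar

def fields_in_range(year, month, day, hour, minute, second):
    """Return True if every numeric field is within its permitted range."""
    if not 1 <= year <= 9999:
        return False
    if not 1 <= month <= 12:
        return False
    # monthrange() returns (weekday of the 1st, number of days in the month).
    days_in_month = calendar.monthrange(year, month)[1]
    if not 1 <= day <= days_in_month:
        return False
    if not 0 <= hour <= 23:        # an hour of 24 is not permitted
        return False
    if not 0 <= minute <= 59:
        return False
    return 0 <= second <= 60       # 60 is tolerated as a possible leap second

print(fields_in_range(2021, 2, 28, 23, 59, 59))
print(fields_in_range(2021, 2, 29, 0, 0, 0))    # 2021 is not a leap year
print(fields_in_range(2021, 3, 4, 24, 0, 0))    # hour 24
```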

So, here are implementation points to be aware of:

  • In nonstrict mode:
      • A single-digit hour or a single-digit timezone hour is permitted.
      • There are 3 classifications of characters: numerals, letters, and non-alphanumeric characters.
      • Time-component separators and date-component separators must be non-alphanumeric.
      • The separator between the date and the time must be non-numeric, allowing for 'T', space, or some other character, but not '-' or ':', as either would suggest a date-component or time-component separator.
      • The separator that introduces the fractional section may be '.' or ',', in keeping with ISO8601, even though RFC3339 indicates that '.' is the only acceptable value.
  • In strict mode:
      • As required by ISO8601, hours must be 2 digits and 0-padded.
      • Date separators must be '-'.
      • Time separators must be ':'.
      • The separator between the date and the time must be 't', 'T', or ' ' (RFC3339 notes that it is permissible to use another character, "(say) a space character", for readability).
  • Hour=24. ISO8601-2004 allowed, but advised against, an hour of 24, and it was prohibited in RFC3339. The newer ISO8601-2019 specification updated ISO8601-2004 to prohibit it. While archival timestamps that ignored the advice may have an hour of 24, it is not permitted in range checking.
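The separator rules above can be approximated with two regular expressions. This is only an illustrative sketch in pure Python, not how the C code works: the names STRICT, NONSTRICT, and looks_like_rfc3339 are hypothetical, both patterns assume a timezone designator is required, and neither performs range checking.

```python
import re

SEP = r"[^0-9A-Za-z]"  # any single non-alphanumeric character

STRICT = re.compile(
    r"\d{4}-\d{2}-\d{2}"           # date fields joined by '-'
    r"[Tt ]"                       # date/time separator
    r"\d{2}:\d{2}:\d{2}"           # zero-padded time fields joined by ':'
    r"(?:\.\d+)?"                  # optional fraction, '.' only
    r"(?:[Zz]|[+-]\d{2}:\d{2})"    # 'Z' or a numeric offset
)

NONSTRICT = re.compile(
    r"\d{4}" + SEP + r"\d{2}" + SEP + r"\d{2}"        # any non-alphanumeric date separator
    + r"[^0-9:\-]"                                    # non-numeric, but not '-' or ':'
    + r"\d{1,2}" + SEP + r"\d{2}" + SEP + r"\d{2}"    # single-digit hour allowed
    + r"(?:[.,]\d+)?"                                 # fraction after '.' or ','
    + r"(?:[Zz]|[+-]\d{1,2}" + SEP + r"\d{2})"        # single-digit offset hour allowed
)

def looks_like_rfc3339(text, strict=False):
    """Check the shape of a timestamp string; numeric ranges are not checked."""
    pattern = STRICT if strict else NONSTRICT
    return pattern.fullmatch(text) is not None

for sample in ("2021-03-04T05:06:07Z",
               "2021-03-04T05.06.07+00.00",
               "2021-03-04 5:06:07,5-5:00",
               "2021-03-04-05:06:07Z"):
    print(sample, looks_like_rfc3339(sample, strict=True), looks_like_rfc3339(sample))
```

The first sample passes both checks, the second and third pass only the nonstrict check, and the last fails both because '-' is not accepted between the date and the time.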

Leap Seconds

Leap seconds are real and part of the specification in ISO8601 and RFC3339. However, they cannot be calculated in advance, and so it would not be reliable to attempt to validate them. Further, Python's native datetime.datetime object does not allow for a leap second. This has led to the following design decisions:

Range checking permits a value of 60 for the second when parsing, but does not verify that the timestamp falls on a listed leap second.

Parsing a timestamp in the C/Cython code and returning the individual time values may therefore return 60 as the second value.

The translation into a Python datetime object, by default, converts a leap second to second=59, microsecond=999999 to avoid inexplicable program crashes in the roughly 1-in-50-million seconds that are leap seconds.

This translation can be overridden (see the function definition); however, note that if a leap second is then encountered, the creation of the datetime.datetime object will raise an exception in the current version of Python.
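The default translation and its override can be illustrated with the standard library alone. This is a sketch of the idea rather than the library's code: the function to_datetime and its clamp_leap_second parameter are hypothetical names chosen for this example.

```python
from datetime import datetime

def to_datetime(year, month, day, hour, minute, second, microsecond=0,
                clamp_leap_second=True):
    """Build a datetime, optionally folding second=60 into 59.999999."""
    if second == 60 and clamp_leap_second:
        second, microsecond = 59, 999999
    # With second=60 still in place, datetime() raises ValueError.
    return datetime(year, month, day, hour, minute, second, microsecond)

# 2016-12-31T23:59:60Z was an actual leap second.
print(to_datetime(2016, 12, 31, 23, 59, 60))

try:
    to_datetime(2016, 12, 31, 23, 59, 60, clamp_leap_second=False)
except ValueError as err:
    print("datetime rejected the leap second:", err)
```

Clamping loses the distinction between 23:59:59.999999 and the leap second itself, which is the trade-off accepted in exchange for never crashing on valid input.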

Potential Errors

Contributing

Contributions and collaboration are welcome. Please contact me in advance about features you would like to see or would like to add.

Author

Kevin Crouse. Copyright, 2021.

License

Apache 2.0

