TimeSeries Extensions for SGN Framework

SGN-TS (SGN TimeSeries)

SGN-TS is a set of extensions to the core library sgn that adds functionality specific to TimeSeries analysis. This page documents the sgnts package, but there is a family of libraries that extend the functionality of SGN in other ways, including:

  • sgn: Base library for SGN
  • sgn-ligo: LIGO-specific utilities for SGN

Installation

To install SGN-TS, simply run:

pip install sgn-ts

Optional Dependencies

SGN-TS supports PyTorch as an optional dependency for improved performance in certain operations. To install SGN-TS with PyTorch support:

pip install sgn-ts[torch]

When PyTorch is not installed, SGN-TS will fall back to NumPy implementations for all operations. The following components benefit from PyTorch when available:

  • TorchBackend array operations
  • Converter transform for converting between NumPy and PyTorch arrays
  • Resampler transform for efficient resampling operations

More SGN-TS-specific documentation coming soon.

Developer's guide

Before reading this guide you should carefully read and understand the SGN developer's guide.

The core motivation of SGN TS (sgnts) is to build Time Series (TS) handling into SGN. This is appropriate for, e.g., signal processing applications. Of course, nothing stops you from doing any of these things with just SGN, but you will likely have to deal with some of the conceptual and technical hurdles that this library solves. That being said, sgnts has many limitations, and you should understand them carefully in the context of your project. We are open to making changes that reach a wider audience, so please let us know your thoughts.

New concepts over SGN:

  • Data are now rigidly defined to be uniformly sampled time series. There is an expectation that elements will deal with data in a synchronous way.
  • Synchronization means that the continuity equation must be satisfied. Data cannot be produced at a higher rate in one source element than another, otherwise synchronous operations will be impossible without data "piling up" somewhere.
  • Time stamp bookkeeping accuracy is important. The library aims to keep single sample point timing accuracy even for applications that are designed to run uninterrupted for years. This requires a bit of rigidity in bookkeeping, but we try to hide as much as possible from the casual developer and user.
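
To see why the bookkeeping needs this rigidity, here is a minimal sketch in plain Python. It does not use sgnts, and the helper names are hypothetical. It compares adding a rounded nanosecond step for every sample against keeping an exact integer sample count and converting to nanoseconds once, assuming the default maximum rate of 16384 Hz quoted later in this guide.

```python
# Hypothetical illustration, not part of the sgnts API.
MAX_RATE = 16384                # samples per second (default maximum in this guide)
NS_PER_SECOND = 1_000_000_000


def naive_elapsed_ns(num_samples: int) -> int:
    """Equivalent to adding a rounded integer nanosecond step for every sample."""
    step_ns = round(NS_PER_SECOND / MAX_RATE)  # 61035; the exact value is 61035.15625
    return num_samples * step_ns


def counted_elapsed_ns(num_samples: int) -> int:
    """Keep an exact integer sample count and convert to nanoseconds once."""
    return num_samples * NS_PER_SECOND // MAX_RATE


one_second = MAX_RATE  # number of samples in one second at the maximum rate
drift_ns = counted_elapsed_ns(one_second) - naive_elapsed_ns(one_second)
print(drift_ns)  # 2560 nanoseconds lost per second of data by the naive approach
```

In this sketch the naive approach loses 2560 ns for every second of data, so an application running for years would eventually be off by many samples, whereas the integer count stays exact.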

Buffers and Frames

The most important new class in sgnts is the TSFrame, which holds a list of SeriesBuffers.

Here we can get some familiarity with both of these objects and along the way, other classes and concepts relevant for sgnts.

>>> import numpy
>>> from sgnts.base.buffer import SeriesBuffer
>>> buf = SeriesBuffer(offset=0, sample_rate=2048, data=numpy.random.randn(2048))
>>> print (buf)
SeriesBuffer(offset=0, offset_end=16384, shape=(2048,), sample_rate=2048, duration=1000000000, data=[0.56649291 ... 1.39569688])

There is plenty to unpack here, so let's go step by step.

offset:

offset is globally meaningful throughout the application and acts as a precise surrogate for time, i.e., an absolute "time" reference for any element within an sgnts application that should not suffer from any rounding error. Technically, an offset is a cumulative count of samples, measured at the maximum sample rate allowed by the application. This will be explained more below.

sample_rate:

sample_rate is the number of samples per second that a stretch of data contains. It is used to convert to actual time with nanosecond precision. In order to make certain guarantees about precision in sgnts, we currently only support power-of-2 sample rates from 1 Hz to a maximum which defaults to 16384 Hz. The maximum sample rate and the allowed rates are defined in offset.py.

data:

data is generally a numpy array that can be interpreted as (possibly multidimensional) time series data.

Now revisiting the above

>>> buf = SeriesBuffer(offset=0, sample_rate=2048, data=numpy.random.randn(2048))
>>> print (buf)
SeriesBuffer(offset=0, offset_end=16384, shape=(2048,), sample_rate=2048, duration=1000000000, data=[0.56649291 ... 1.39569688])

we see the following. The user specified data as a 2048-sample set of random Gaussian-distributed numbers. Since the sample_rate is also 2048 Hz, this is interpreted as 1 second of time series data. When printing the buffer you can see duration=1000000000, which is equal to 1e9 nanoseconds (time is stored as integer nanoseconds). You can see offset_end=16384, which indicates the number of samples that would be in this data if it were at the maximum sample rate. That is what an offset defines: a sample count assuming the maximum sample rate. It is critical for accurate internal bookkeeping. You also see shape=(2048,), which indicates a single-channel time series. Try the following for an example of multichannel audio:

>>> buf = SeriesBuffer(offset=0, sample_rate=2048, data=numpy.random.randn(2,2048))
>>> print (buf)
SeriesBuffer(offset=0, offset_end=16384, shape=(2, 2048), sample_rate=2048, duration=1000000000, data=[[ 0.01684876 ... -1.6963346 ]
 [-0.55875476 ...  0.58967178]])

Note what happens to offset_end if you change the sample rate (and in this case also the data size):

>>> buf = SeriesBuffer(offset=0, sample_rate=1024, data=numpy.random.randn(2,1024))
>>> buf
SeriesBuffer(offset=0, offset_end=16384, shape=(2, 1024), sample_rate=1024, duration=1000000000, data=[[-0.13116052 ...  1.2223811 ]
 [-0.98786954 ... -0.56760618]])

It stays the same. Remember that the offset is the sample count at the theoretical maximum sample rate which is defined in offset.py.
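
The arithmetic behind these numbers can be sketched in plain Python. The helpers below are hypothetical and are not the functions defined in offset.py; they assume that the sample rate divides the maximum rate evenly, which holds for the power-of-two rates described above.

```python
# Hypothetical helpers illustrating the offset arithmetic; not the sgnts API.
DEFAULT_MAX_RATE = 16384
NS_PER_SECOND = 1_000_000_000


def samples_to_offset(num_samples: int, sample_rate: int,
                      max_rate: int = DEFAULT_MAX_RATE) -> int:
    """Count the samples as if the data were sampled at max_rate."""
    stride = max_rate // sample_rate  # offsets spanned by one sample
    return num_samples * stride


def offset_to_ns(offset: int, max_rate: int = DEFAULT_MAX_RATE) -> int:
    """Convert an offset to integer nanoseconds."""
    return offset * NS_PER_SECOND // max_rate


print(samples_to_offset(2048, 2048))  # 16384
print(samples_to_offset(1024, 1024))  # 16384, the same one-second span
print(offset_to_ns(16384))            # 1000000000
```

Both buffers cover one second, so both map to the same offset span even though they hold a different number of samples.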

Only power-of-two sample rates are allowed at present to ensure that bookkeeping remains simple and accurate.

>>> buf = SeriesBuffer(offset=0, sample_rate=1000, data=numpy.random.randn(2,1000))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<string>", line 7, in __init__
  File "/Users/crh184/Library/Python/3.9/lib/python/site-packages/sgnts/base/buffer.py", line 38, in __post_init__
    raise ValueError("%s not in allowed rates %s" % (self.sample_rate, Offset.ALLOWED_RATES))
ValueError: 1000 not in allowed rates {32, 1, 2, 64, 4, 128, 256, 512, 8, 1024, 2048, 4096, 8192, 16, 16384}
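
The set in the error message is every power of two from 1 up to the maximum rate. As a rough sketch (the names below are illustrative and are not how Offset.ALLOWED_RATES is actually built), it can be reproduced like this:

```python
# Illustrative reconstruction of the allowed-rate set; not the sgnts implementation.
MAX_RATE = 16384

# 16384 == 2**14, so bit_length() is 15 and k runs from 0 to 14.
allowed_rates = {2**k for k in range(MAX_RATE.bit_length())}


def check_rate(sample_rate: int) -> None:
    """Raise ValueError if sample_rate is not a power of two up to MAX_RATE."""
    if sample_rate not in allowed_rates:
        raise ValueError(f"{sample_rate} not in allowed rates {sorted(allowed_rates)}")


check_rate(2048)  # passes silently
try:
    check_rate(1000)
except ValueError as err:
    print(err)
```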

It is possible to increase the maximum sample rate globally in an application:

>>> import numpy
>>> from sgnts.base.buffer import SeriesBuffer
>>> from sgnts.base.offset import Offset
>>> Offset.set_max_rate(262144)
>>> buf = SeriesBuffer(offset=0, sample_rate=32768, data=numpy.random.randn(32768))
>>> print (buf)
SeriesBuffer(offset=0, offset_end=262144, shape=(32768,), sample_rate=32768, duration=1000000000, data=[-0.08916502 ...  0.89236118])

Buffers are not the primary data type passed around between elements in sgnts. Rather, that role belongs to the TSFrame. TSFrames hold lists of buffers:

>>> import numpy
>>> from sgnts.base.buffer import SeriesBuffer, TSFrame
>>> 
>>> # An example of just one buffer
>>> buf1 = SeriesBuffer(offset=0, sample_rate=2048, data=numpy.random.randn(2048))
>>> frame = TSFrame(buffers=[buf1])
>>> print (frame)

	SeriesBuffer(offset=0, offset_end=16384, shape=(2048,), sample_rate=2048, duration=1000000000, data=[-0.04094335 ... -1.49758223])
>>> 
>>> # An example of two contiguous buffers
>>> buf1 = SeriesBuffer(offset=0, sample_rate=2048, data=numpy.random.randn(2048))
>>> buf2 = SeriesBuffer(offset=16384, sample_rate=2048, data=numpy.random.randn(2048))
>>> frame = TSFrame(buffers=[buf1, buf2])
>>> print (frame)

	SeriesBuffer(offset=0, offset_end=16384, shape=(2048,), sample_rate=2048, duration=1000000000, data=[-1.56771352 ... -0.20928693])
	SeriesBuffer(offset=16384, offset_end=32768, shape=(2048,), sample_rate=2048, duration=1000000000, data=[-1.00442217 ... -0.75684022])
>>> 
>>> # An example of two non contiguous buffers. NOTE THIS SHOULDN'T WORK!!
>>> buf1 = SeriesBuffer(offset=0, sample_rate=2048, data=numpy.random.randn(2048))
>>> buf2 = SeriesBuffer(offset=12345, sample_rate=2048, data=numpy.random.randn(2048))
>>> frame = TSFrame(buffers=[buf1, buf2])
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<string>", line 8, in __init__
  File "/Users/crh184/Library/Python/3.9/lib/python/site-packages/sgnts/base/buffer.py", line 455, in __post_init__
    self.__sanity_check(self.buffers)
  File "/Users/crh184/Library/Python/3.9/lib/python/site-packages/sgnts/base/buffer.py", line 485, in __sanity_check
    assert off0 == sl.start
AssertionError

Note in the above that TSFrames only support contiguous buffers.
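
Contiguity here means that each buffer starts at the offset where the previous one ends. A minimal sketch of such a check follows, using plain (offset, offset_end) tuples rather than SeriesBuffer objects. The function name is hypothetical and this is not the sanity check that TSFrame runs internally.

```python
from typing import List, Tuple

Span = Tuple[int, int]  # (offset, offset_end) of one buffer


def is_contiguous(spans: List[Span]) -> bool:
    """True if every span begins exactly where the previous span ended."""
    return all(prev[1] == nxt[0] for prev, nxt in zip(spans, spans[1:]))


print(is_contiguous([(0, 16384), (16384, 32768)]))  # True
print(is_contiguous([(0, 16384), (12345, 28729)]))  # False
```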

TSFrames offer some additional methods to describe their contents, e.g.,

>>> buf1 = SeriesBuffer(offset=0, sample_rate=2048, data=numpy.random.randn(2048))
>>> buf2 = SeriesBuffer(offset=16384, sample_rate=2048, data=numpy.random.randn(2048))
>>> frame = TSFrame(buffers=[buf1, buf2])
>>> 
>>> # Get the offset of the first buffer
>>> print (frame.offset)
0
>>> 
>>> # Get the offset end of the last buffer
>>> print (frame.end_offset)
32768
>>> 
>>> # Get the sample rate
>>> print (frame.sample_rate)
2048
>>> 
>>> # Iterate over the buffers
>>> for buf in frame:
...     print (buf)
... 
SeriesBuffer(offset=0, offset_end=16384, shape=(2048,), sample_rate=2048, duration=1000000000, data=[0.01658589 ... 0.76543937])
SeriesBuffer(offset=16384, offset_end=32768, shape=(2048,), sample_rate=2048, duration=1000000000, data=[0.76470737 ... 0.89438121])

TSFrames must be initialized with at least one buffer because metadata are derived from the buffer(s). If you want to have an empty frame, you still have to set one buffer with the correct metadata, e.g.,

>>> # empty buffer
>>> buf = SeriesBuffer(offset=0, sample_rate=2048, shape=(2048,), data=None)
>>> frame = TSFrame(buffers=[buf])

Advanced TSFrame techniques

There are shortcuts for producing a new empty TSFrame that might be useful if your goal is to just spit out some similar empty frames to fill in, e.g.,

>>> frame = TSFrame.from_buffer_kwargs(offset=0, sample_rate=2048, shape=(2048,))
>>> print (frame)

	SeriesBuffer(offset=0, offset_end=16384, shape=(2048,), sample_rate=2048, duration=1000000000, data=None)
>>> print (next(frame))

	SeriesBuffer(offset=16384, offset_end=32768, shape=(2048,), sample_rate=2048, duration=1000000000, data=None)
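
Judging only from the output above, next(frame) appears to yield a frame of the same length that begins where the current one ends. Under that assumption, the offset bookkeeping can be sketched in plain Python with a hypothetical helper (not the TSFrame implementation):

```python
from typing import Tuple


def next_span(offset: int, offset_end: int) -> Tuple[int, int]:
    """Return the (offset, offset_end) of an equally long span that follows directly."""
    length = offset_end - offset
    return offset_end, offset_end + length


span = (0, 16384)
span = next_span(*span)
print(span)  # (16384, 32768)
```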

Writing a new source element
