Efficient portable machine native columnar file storage of time series data for double float and signed 64-bit integers.
Latest release 20240316: Fixed release upload artifacts.
The core purpose is to provide time series data storage; there are assorted convenience methods to export arbitrary subsets of the data for use by other libraries in common forms, such as dataframes or series, numpy arrays and simple lists. There are also some simple plot methods for plotting graphs.
Three levels of storage are defined here:
- TimeSeriesFile: a single file containing a binary list of float64 or signed int64 values
- TimeSeriesPartitioned: a directory containing multiple TimeSeriesFile files, each covering a separate time span according to a supplied policy, for example a calendar month
- TimeSeriesDataDir: a directory containing multiple TimeSeriesPartitioned subdirectories, each for a different time series, for example one subdirectory for grid voltage and another for grid power
Together these provide a hierarchy for finite sized files storing unbounded time series data for multiple parameters.
On a personal basis, I use this as efficient storage of time
series data from my solar inverter, which reports in a slightly
clunky time limited CSV format; I import those CSVs into
time series data directories which contain the overall accrued
data; see my cs.splink module which is built on this module.
Function array_byteswapped(ary)
Context manager to byteswap the array.array ary temporarily.
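As an illustration of the idea only, here is a minimal sketch of a temporary byteswap using the standard library. The name `byteswapped` is a hypothetical stand-in and this is not the module's own implementation.

```python
from array import array
from contextlib import contextmanager

@contextmanager
def byteswapped(ary):
    """Hypothetical stand-in: byteswap `ary` in place for the duration of the block."""
    ary.byteswap()
    try:
        yield ary
    finally:
        # Swap back so the caller sees the original byte order again.
        ary.byteswap()

values = array('q', [1, 2, 3])
with byteswapped(values) as swapped:
    # Inside the block the 8 bytes of each item are reversed.
    swapped_first = swapped[0]
restored = values.tolist()
```

The swap back happens in a `finally` clause so that the array is restored even if the body of the `with` block raises.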
Class ArrowBasedTimespanPolicy(TimespanPolicy, icontract._metaclass.DBC, HasEpochMixin, TimeStepsMixin, cs.deco.Promotable)
A TimespanPolicy based on an Arrow format string.
See the raw_edges method for the specifics of how these are defined.
Function as_datetime64s(times, unit='s', utcoffset=0)
Return a Numpy array of datetime64 values
computed from an iterable of int/float UNIX timestamp values.
The optional unit parameter (default 's') may be one of:
- 's': seconds
- 'ms': milliseconds
- 'us': microseconds
- 'ns': nanoseconds
and represents the precision to preserve in the source time when converting to a datetime64. Less precision gives greater time range.
Function datetime64_as_timestamp(dt64: numpy.datetime64)
Return the UNIX timestamp for the datetime64 value dt64.
Function deduce_type_bigendianness(typecode: str) -> bool
Deduce the native endianness for typecode,
an array/struct typecode character.
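One way such a deduction could work is to compare the native packing of a sample value with its explicit big endian packing. The helper below is a hypothetical sketch using `struct`, restricted to the 'd' and 'q' typecodes this module stores; it is not taken from the module's source.

```python
import struct
import sys

def native_is_bigendian(typecode: str) -> bool:
    """Hypothetical helper: True if native packing of `typecode` is big endian."""
    if typecode == 'q':
        sample = 1
    elif typecode == 'd':
        sample = 1.0
    else:
        raise ValueError(f"unsupported typecode {typecode!r}")
    # '=' packs in native byte order, '>' packs big endian.
    native = struct.pack('=' + typecode, sample)
    big = struct.pack('>' + typecode, sample)
    return native == big

int_is_big = native_is_bigendian('q')
float_is_big = native_is_bigendian('d')
```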
Class Epoch(Epoch, builtins.tuple, TimeStepsMixin, cs.deco.Promotable)
The basis of time references with a starting UNIX time start
and a step defining the width of a time slot.
Function get_default_timezone_name()
Return the default timezone name.
Class HasEpochMixin(TimeStepsMixin)
A TimeStepsMixin with .start and .step derived from self.epoch.
Function main(argv=None)
Run the command line tool for TimeSeries data.
Function plot_events(start, stop, events, value_func, *, utcoffset, figure=None, ax=None, **scatter_kw) -> matplotlib.axes._axes.Axes
Plot events, an iterable of objects with .unixtime
attributes such as an SQLTagSet.
Return the Axes on which the plot was made.
Parameters:
- events: an iterable of objects with .unixtime attributes
- value_func: a callable to compute the y-axis value from an event
- start: optional start UNIX time, used to crop the events plotted
- stop: optional stop UNIX time, used to crop the events plotted
- figure, ax: optional arguments as for cs.mplutils.axes
- utcoffset: optional UTC offset for presentation
Other keyword parameters are passed to Axes.scatter.
Class PlotSeries(PlotSeries, builtins.tuple, cs.deco.Promotable)
Information about a series to be plotted:
- label: the label for this series
- series: a Series
- extra: a dict of extra information such as plot styling
Class TimePartition(TimePartition, builtins.tuple, TimeStepsMixin)
A namedtuple for a slice of time with the following attributes:
- epoch: the reference Epoch
- name: the name for this slice
- start_offset: the epoch offset of the start time
- end_offset: the epoch offset of the end time
These are used by TimespanPolicy instances to express the partitions
into which they divide time.
Function timerange(*da, **dkw)
A decorator intended for plotting functions or methods which
presents optional start and stop leading positional
parameters and optional tz or utcoffset keyword parameters.
The decorated function will be called with leading start
and stop positional parameters and a specific utcoffset
keyword parameter.
The as-decorated function is called with the following parameters:
- start: an optional UNIX timestamp positional parameter for the start of the range; if omitted the default is self.start; this is a required parameter if the decorator has needs_start=True
- stop: an optional UNIX timestamp positional parameter for the end of the range; if omitted the default is self.stop; this is a required parameter if the decorator has needs_stop=True
- tz: optional timezone datetime.tzinfo object or specification as for tzfor(); this is used to infer a UTC offset in seconds
- utcoffset: an optional offset from UTC time in seconds
Other parameters are passed through to the decorated function.
A decorated method is then called as:
method(self, start, stop, *a, utcoffset=utcoffset, **kw)
where *a and **kw are the additional positional and keyword
parameters respectively, if any.
A decorated function is called as:
function(start, stop, *a, utcoffset=utcoffset, **kw)
The utcoffset is an offset to apply to UTC-based time data
for presentation on the graph, largely because the plotting
functions use DataFrame.plot which broadly ignores attempts
to set locators or formatters because it supplies its own.
The plotting function would shift the values of the DataFrame
index using this value.
If neither utcoffset nor tz is supplied by the caller, the
utcoffset is 0.0.
A specified utcoffset is passed through.
A tz is promoted to a tzinfo instance via the tzfor()
function and applied to the stop timestamp to obtain a
datetime from which the utcoffset will be derived.
It is an error to specify both utcoffset and tz.
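The utcoffset rules above can be sketched as a small standalone function. The name `resolve_utcoffset` is hypothetical, and this sketch takes an already constructed tzinfo rather than promoting a specification via tzfor().

```python
from datetime import datetime, timedelta, timezone, tzinfo
from typing import Optional

def resolve_utcoffset(
    stop: float,
    *,
    utcoffset: Optional[float] = None,
    tz: Optional[tzinfo] = None,
) -> float:
    """Hypothetical helper following the rules described for @timerange."""
    if utcoffset is not None and tz is not None:
        raise ValueError("specify at most one of utcoffset and tz")
    if utcoffset is not None:
        # A specified utcoffset is passed through.
        return utcoffset
    if tz is None:
        # Neither supplied: present the data as UTC.
        return 0.0
    # Apply the timezone to the stop timestamp and read off its UTC offset.
    return datetime.fromtimestamp(stop, tz).utcoffset().total_seconds()

# A fixed +10:00 zone, used here to avoid depending on a timezone database.
fixed_tz = timezone(timedelta(hours=10))
offset = resolve_utcoffset(1710547200, tz=fixed_tz)
```

Deriving the offset at the stop time means a range which crosses a daylight saving change is presented with the offset in force at its end.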
Class TimeSeries(cs.resources.MultiOpenMixin, cs.context.ContextManagerMixin, HasEpochMixin, TimeStepsMixin)
Common base class of any time series.
Function timeseries_from_path(tspath: str, epoch: Optional[cs.timeseries.Epoch] = None, typecode=None)
Turn a time series filesystem path into a time series:
- a file: a TimeSeriesFile
- a directory holding .csts files: a TimeSeriesPartitioned
- a directory: a TimeSeriesDataDir
Class TimeSeriesBaseCommand(cs.cmdutils.BaseCommand)
Abstract base class for command line interfaces to TimeSeries data files.
Command line usage:
Usage: timeseriesbase subcommand [...]
Subcommands:
fetch ...
Fetch raw data files from the primary source to a local spool.
To be implemented in subclasses.
help [-l] [subcommand-names...]
Print help for subcommands.
This outputs the full help for the named subcommands,
or the short help for all subcommands if no names are specified.
-l Long help even if no subcommand-names provided.
import ...
Import data into the time series.
To be implemented in subclasses.
info
Report information.
plot [-f] [-o imgpath.png] [--show] [--tz tzspec] start-time [stop-time] [{glob|fields}...]
Plot the data from specified fields for the specified time range.
Options:
--bare Strip axes and padding from the plot.
-f Force. -o will overwrite an existing image file.
-o imgpath.png File system path to which to save the plot.
--show Show the image in the GUI.
--tz tzspec Skew the UTC times presented on the graph
to emulate the timezone specified by tzspec.
The default skew is 0 i.e. UTC.
--stacked Stack the plot lines/areas.
start-time An integer number of days before the current time
or any datetime specification recognised by
dateutil.parser.parse.
stop-time Optional stop time, default now.
An integer number of days before the current time
or any datetime specification recognised by
dateutil.parser.parse.
glob|fields If glob is supplied, constrain the keys of
a TimeSeriesDataDir by the glob.
shell
Run a command prompt via cmd.Cmd using this command's subcommands.
Class TimeSeriesCommand(TimeSeriesBaseCommand, cs.cmdutils.BaseCommand)
Command line interface to TimeSeries data files.
Command line usage:
Usage: timeseries [-s ts-step] tspath subcommand...
-s ts-step Specify the UNIX time step for the time series,
used if the time series is new and checked otherwise.
tspath The filesystem path to the time series;
this may refer to a single .csts TimeSeriesFile, a
TimeSeriesPartitioned directory of such files, or
a TimeSeriesDataDir containing partitions for
multiple keys.
Subcommands:
dump
Dump the contents of tspath.
fetch ...
Fetch raw data files from the primary source to a local spool.
To be implemented in subclasses.
help [-l] [subcommand-names...]
Print help for subcommands.
This outputs the full help for the named subcommands,
or the short help for all subcommands if no names are specified.
-l Long help even if no subcommand-names provided.
import csvpath datecol[:conv] [import_columns...]
Import data into the time series.
csvpath The CSV file to import.
datecol[:conv]
Specify the timestamp column and optional
conversion function.
"datecol" can be either the column header name
or a numeric column index counting from 0.
If "conv" is omitted, the column should contain
a UNIX seconds timestamp. Otherwise "conv"
should be either an identifier naming one of
the known conversion functions or an "arrow.get"
compatible time format string.
import_columns
An optional list of column names or their derived
attribute names. The default is to import every
numeric column except for the datecol.
info
Report information about the time series stored at tspath.
plot [-f] [-o imgpath.png] [--show] [--tz tzspec] start-time [stop-time] [{glob|fields}...]
Plot the data from specified fields for the specified time range.
Options:
--bare Strip axes and padding from the plot.
-f Force. -o will overwrite an existing image file.
-o imgpath.png File system path to which to save the plot.
--show Show the image in the GUI.
--tz tzspec Skew the UTC times presented on the graph
to emulate the timezone specified by tzspec.
The default skew is 0 i.e. UTC.
--stacked Stack the plot lines/areas.
start-time An integer number of days before the current time
or any datetime specification recognised by
dateutil.parser.parse.
stop-time Optional stop time, default now.
An integer number of days before the current time
or any datetime specification recognised by
dateutil.parser.parse.
glob|fields If glob is supplied, constrain the keys of
a TimeSeriesDataDir by the glob.
shell
Run a command prompt via cmd.Cmd using this command's subcommands.
test [testnames...]
Run some tests of functionality.
Class TimeSeriesDataDir(TimeSeriesMapping, builtins.dict, cs.resources.MultiOpenMixin, cs.context.ContextManagerMixin, cs.fs.HasFSPath, cs.configutils.HasConfigIni, HasEpochMixin, TimeStepsMixin)
A directory containing a collection of TimeSeriesPartitioned subdirectories.
Class TimeSeriesFile(TimeSeries, cs.resources.MultiOpenMixin, cs.context.ContextManagerMixin, HasEpochMixin, TimeStepsMixin, cs.fs.HasFSPath)
A file containing a single time series for a single data field.
This provides easy access to a time series data file.
The instance can be indexed by UNIX time stamp for time based access
or its .array property can be accessed for the raw data.
The data file itself has a header indicating the file data big endianness,
the datum type and the time type (both array.array type codes).
Following these are the start and step sizes in the time type format.
This is automatically honoured on load and save.
A new file will use the native endianness, but files of other
endianness are correctly handled, making a TimeSeriesFile
portable between architectures.
Read only users can just instantiate an instance and access
its .array property, or use the peek and peek_offset methods.
Read/write users should use the instance as a context manager, which will automatically update the file with the array data on exit:
    with TimeSeriesFile(fspath) as ts:
        ... work with ts here ...
Note that the save-on-close is done with TimeSeries.flush()
which only saves if self.modified.
Use of the __setitem__ or pad_to methods set this flag automatically.
Direct access via the .array will not set it,
so users working that way for performance should update the flag themselves.
A TimeSeriesFile has two underlying modes of operation:
in-memory array.array mode and direct-to-file mmap mode.
The in-memory mode reads the whole file into an array.array instance,
and all updates then modify the in-memory array.
The file is saved when the context manager exits or when .save() is called.
This maximises efficiency when many accesses are done.
The mmap mode maps the file into memory, and accesses operate
directly against the file contents.
This is more efficient for just a few accesses,
but every "write" access (setting a datum) will make the mmapped page dirty,
causing the OS to queue it for disc.
This mode is recommended for small accesses
such as updating a single datum, eg from polling a data source.
Presently the mode used is triggered by the access method.
Using the peek and poke methods uses mmap by default.
Other accesses default to use the in-memory mode.
Access to the .array property forces use of the array mode.
Poll/update operations should usually choose to use peek/poke.
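To illustrate the direct-to-file mmap approach in general terms, here is a minimal standalone sketch which updates a single datum in place. It assumes a bare file of little endian double floats with no header, so it is not the TimeSeriesFile format, and the `peek` and `poke` functions here are hypothetical free functions rather than the class's methods.

```python
import mmap
import os
import struct
import tempfile

# Assumption for this sketch: little endian doubles, no file header.
DATUM = struct.Struct('<d')

def poke(path: str, index: int, value: float) -> None:
    """Write one datum through a memory map; only its page becomes dirty."""
    with open(path, 'r+b') as f:
        with mmap.mmap(f.fileno(), 0) as mm:
            DATUM.pack_into(mm, index * DATUM.size, value)

def peek(path: str, index: int) -> float:
    """Read one datum through a read-only memory map."""
    with open(path, 'rb') as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            (value,) = DATUM.unpack_from(mm, index * DATUM.size)
    return value

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, 'demo.bin')
    with open(path, 'wb') as f:
        f.write(DATUM.pack(0.0) * 4)  # four zeroed slots
    poke(path, 2, 240.5)
    poked = peek(path, 2)
    untouched = peek(path, 0)
```

Neither function reads the whole file, which is why this style suits a poller writing one value per run.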
Method TimeSeriesFile.__init__(self, fspath: str, typecode: Optional[cs.timeseries.TypeCode] = None, *, epoch: Optional[cs.timeseries.Epoch] = None, fill=None, fstags: cs.fstags.FSTags):
Prepare a new time series stored in the file at fspath
containing machine native data for the time series values.
Parameters:
- fspath: the filename of the data file
- typecode: optional expected array.typecode value of the data; if specified and the data file exists, they must match; if not specified then the data file must exist and the typecode will be obtained from its header
- epoch: optional Epoch specifying the start time and step size for the time series data in the file; if specified and the data file exists, they must match; if not specified then the data file must exist and the epoch will be obtained from its header
- fill: optional default fill values for pad_to; if unspecified, fill with 0 for 'q' and float('nan') for 'd'
Class TimeSeriesFileHeader(cs.binary.SimpleBinary, types.SimpleNamespace, cs.binary.AbstractBinary, cs.binary.BinaryMixin, HasEpochMixin, TimeStepsMixin)
The binary data structure of the TimeSeriesFile file header.
This is 24 bytes long and consists of:
- the 4 byte magic number, b'csts'
- the file bigendian marker, a struct byte order indicator with a value of b'>' for big endian data or b'<' for little endian data
- the datum typecode, b'd' for double float or b'q' for signed 64 bit integer
- the time typecode, b'd' for double float or b'q' for signed 64 bit integer
- a pad byte, value b'_'
- the start UNIX time, a double float or signed 64 bit integer according to the time typecode and bigendian flag
- the step size, a double float or signed 64 bit integer according to the time typecode and bigendian flag
In addition to the header values and methods this also presents:
- datum_type: a BinarySingleStruct for the binary form of a data value
- time_type: a BinarySingleStruct for the binary form of a time value
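The 24 byte layout described above can be reproduced with the standard `struct` module. The functions `pack_header` and `unpack_header` below are hypothetical and written only from that description; they are not the TimeSeriesFileHeader implementation.

```python
import struct

MAGIC = b'csts'

def pack_header(bigendian: bool, datum_typecode: str, time_typecode: str,
                start, step) -> bytes:
    """Hypothetical packer for the layout described above."""
    marker = b'>' if bigendian else b'<'
    # 4 byte magic + marker + datum typecode + time typecode + pad = 8 bytes.
    head = (MAGIC + marker + datum_typecode.encode('ascii')
            + time_typecode.encode('ascii') + b'_')
    # start and step, 8 bytes each, in the time type and file byte order.
    times = struct.pack(marker.decode('ascii') + time_typecode * 2, start, step)
    return head + times

def unpack_header(bs: bytes):
    """Hypothetical parser returning (bigendian, datum_tc, time_tc, start, step)."""
    if len(bs) != 24:
        raise ValueError("expected exactly 24 bytes")
    magic, marker, datum_tc, time_tc, pad = struct.unpack('=4s4c', bs[:8])
    if magic != MAGIC or marker not in (b'>', b'<') or pad != b'_':
        raise ValueError("not a csts header")
    time_tc = time_tc.decode('ascii')
    start, step = struct.unpack(marker.decode('ascii') + time_tc * 2, bs[8:])
    return marker == b'>', datum_tc.decode('ascii'), time_tc, start, step

header = pack_header(True, 'd', 'q', 1700000000, 60)
parsed = unpack_header(header)
```

Because the byte order marker is itself a `struct` byte order character, it can be used directly as the prefix of the format string for the start and step values.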
Class TimeSeriesMapping(builtins.dict, cs.resources.MultiOpenMixin, cs.context.ContextManagerMixin, HasEpochMixin, TimeStepsMixin)
A group of named TimeSeries instances, indexed by a key.
This is the basis for TimeSeriesDataDir.
Class TimeSeriesPartitioned(TimeSeries, cs.resources.MultiOpenMixin, cs.context.ContextManagerMixin, HasEpochMixin, TimeStepsMixin, cs.fs.HasFSPath)
A collection of TimeSeries files in a subdirectory.
We have one of these for each TimeSeriesDataDir key.
This class manages a collection of files
named by the partition from a TimespanPolicy,
which dictates which partition holds the datum for a UNIX time.
Method TimeSeriesPartitioned.__init__(self, dirpath: str, typecode: Optional[cs.timeseries.TypeCode] = None, *, epoch: Optional[cs.timeseries.Epoch] = None, policy, fstags: cs.fstags.FSTags):
Initialise the TimeSeriesPartitioned instance.
Parameters:
- dirpath: the directory filesystem path, known as .fspath within the instance
- typecode: the array type code for the data
- epoch: the time series Epoch
- policy: the partitioning TimespanPolicy
The instance requires a reference epoch
because the policy start times will almost always
not fall on exact multiples of epoch.step.
The reference allows for reliable placement of times
which fall within epoch.step of a partition boundary.
For example, if epoch.start==0 and epoch.step==6 and a
partition boundary came at 19 due to some calendar based
policy then a time of 20 would fall in the partition left
of the boundary because it belongs to the time slot commencing
at 18.
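The arithmetic in this example can be checked with a short sketch. `SimpleEpoch` is a hypothetical namedtuple with just a start and a step, not the module's Epoch class.

```python
from typing import NamedTuple

class SimpleEpoch(NamedTuple):
    """Hypothetical reference epoch: a start time and a slot width."""
    start: int
    step: int

    def slot_start(self, when):
        """Return the start time of the slot containing `when`."""
        offset = (when - self.start) // self.step
        return self.start + offset * self.step

epoch = SimpleEpoch(start=0, step=6)
boundary = 19          # a partition boundary from some calendar based policy
slot = epoch.slot_start(20)
# The datum is placed according to its slot start, not its raw time.
in_left_partition = slot < boundary
```

Here the time 20 belongs to the slot commencing at 18, which precedes the boundary at 19, so the datum lands in the partition to the left.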
If epoch or typecode are omitted the file's
fstags will be consulted for their values.
The start parameter will further fall back to 0.
This class does not set these tags (that would presume write
access to the parent directory or its .fstags file);
when a TimeSeriesPartitioned is made by a TimeSeriesDataDir
instance, the TimeSeriesDataDir sets these tags.
Class TimespanPolicy(icontract._metaclass.DBC, HasEpochMixin, TimeStepsMixin, cs.deco.Promotable)
A class implementing a policy allocating times to named time spans.
The TimeSeriesPartitioned uses these policies
to partition data among multiple TimeSeries data files.
Probably the most important methods are:
- span_for_time: return a TimePartition from a UNIX time
- span_for_name: return a TimePartition from a partition name
Method TimespanPolicy.__init__(self, epoch: cs.timeseries.Epoch):
Initialise the policy.
Class TimespanPolicyAnnual(ArrowBasedTimespanPolicy, TimespanPolicy, icontract._metaclass.DBC, HasEpochMixin, TimeStepsMixin, cs.deco.Promotable)
An annual time policy. PARTITION_FORMAT = 'YYYY' ARROW_SHIFT_PARAMS = {'years': 1}
Class TimespanPolicyDaily(ArrowBasedTimespanPolicy, TimespanPolicy, icontract._metaclass.DBC, HasEpochMixin, TimeStepsMixin, cs.deco.Promotable)
A daily time policy. PARTITION_FORMAT = 'YYYY-MM-DD' ARROW_SHIFT_PARAMS = {'days': 1}
Class TimespanPolicyMonthly(ArrowBasedTimespanPolicy, TimespanPolicy, icontract._metaclass.DBC, HasEpochMixin, TimeStepsMixin, cs.deco.Promotable)
A monthly time policy. PARTITION_FORMAT = 'YYYY-MM' ARROW_SHIFT_PARAMS = {'months': 1}
Class TimespanPolicyWeekly(ArrowBasedTimespanPolicy, TimespanPolicy, icontract._metaclass.DBC, HasEpochMixin, TimeStepsMixin, cs.deco.Promotable)
A weekly time policy. PARTITION_FORMAT = 'W' ARROW_SHIFT_PARAMS = {'weeks': 1}
Class TimespanPolicyYearly(ArrowBasedTimespanPolicy, TimespanPolicy, icontract._metaclass.DBC, HasEpochMixin, TimeStepsMixin, cs.deco.Promotable)
An annual time policy. PARTITION_FORMAT = 'YYYY' ARROW_SHIFT_PARAMS = {'years': 1}
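As a rough illustration of what a calendar based policy computes, the sketch below derives a monthly partition name and its time span for a UNIX time. It uses the standard `datetime` module in UTC in place of Arrow, and `monthly_span` is a hypothetical function, so treat it as an analogy for the 'YYYY-MM' policy and not its implementation.

```python
from datetime import datetime, timezone

def monthly_span(when: float):
    """Hypothetical helper: (name, start, stop) of the UTC month containing `when`."""
    dt = datetime.fromtimestamp(when, timezone.utc)
    start = dt.replace(day=1, hour=0, minute=0, second=0, microsecond=0)
    # Shift forward by one month, rolling over the year after December.
    if start.month == 12:
        stop = start.replace(year=start.year + 1, month=1)
    else:
        stop = start.replace(month=start.month + 1)
    # strftime '%Y-%m' plays the role of the Arrow format 'YYYY-MM' here.
    return start.strftime('%Y-%m'), start.timestamp(), stop.timestamp()

when = 1710547200  # 2024-03-16 00:00:00 UTC
name, span_start, span_stop = monthly_span(when)
```

The span is half open: a time equal to `span_stop` belongs to the following month's partition.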
Class TimeStepsMixin
Methods for an object with start and step attributes.
Class TypeCode(builtins.str, cs.deco.Promotable)
A valid array typecode with convenience methods.
Method TypeCode.__new__(cls, t):
Return a new TypeCode instance from t, which may be:
- a str: expected to be an array type code
- int: array type code q (signed 64 bit)
- float: array type code d (double float)
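A minimal sketch of this promotion as a plain function, assuming that int and float here mean the Python types themselves. The name `typecode_for` is hypothetical and the validation against `array.typecodes` is this sketch's own choice.

```python
import array

def typecode_for(t) -> str:
    """Hypothetical promotion of a str, int or float to an array type code."""
    if isinstance(t, str):
        # Accept only single character codes known to the array module.
        if len(t) != 1 or t not in array.typecodes:
            raise ValueError(f"not an array type code: {t!r}")
        return t
    if t is int:
        return 'q'  # signed 64 bit
    if t is float:
        return 'd'  # double float
    raise TypeError(f"cannot derive a type code from {t!r}")

from_int = typecode_for(int)
from_float = typecode_for(float)
from_str = typecode_for('d')
```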
Function tzfor(tzspec: Union[datetime.tzinfo, str, NoneType] = None) -> datetime.tzinfo
Promote the timezone specification tzspec to a tzinfo instance.
If tzspec is an instance of tzinfo it is returned unchanged.
If tzspec is omitted or the string 'local' this returns
dateutil.tz.gettz(), the local system timezone.
Otherwise it returns dateutil.tz.gettz(tzspec).
Release Log
Release 20240316: Fixed release upload artifacts.
Release 20240201: Release with "csts" script.
Release 20230612:
- Epoch.promote: do not special case None, let Optional[Epoch] type annotations handle that.
- Mark PlotSeries.promote as incomplete (raises RuntimeError).
- TimespanPolicy.promote: use cls.from_name() instead of TimespanPolicy.from_name().
- Assorted other small updates.
Release 20230217:
- TimeSeriesFile.save_to: use atomic_filename() to create the updated file.
- Other small fixes and updates.
Release 20220918:
- TimeSeriesMapping.as_pd_dataframe: rename keys to df_data, and accept either a time series key or a (key,series) tuple.
- TimeSeriesMapping.as_pd_dataframe: default key_map: annotate columns with their original CSV headers if present.
- TimeSeriesMapping.plot: rename keys to plot_data as for as_pd_dataframe, add stacked and kind parameters so that we can derive kind from stacked.
- as_datetime64s: apply optional utcoffset timeshift.
- Plumb optional pad=False option through data, data2, as_pd_series.
- New PlotSeries namedtuple holding a label, a series and an extra dict as common carrier for data which will get plotted.
Release 20220805:
- Rename @plotrange to @timerange since it is not inherently associated with plotting, support both methods and functions.
- print_figure, save_figure and saved_figure now moved to cs.mplutils.
- plot_events: use the utcoffset parameter.
- TimeSeriesBaseCommand.cmd_plot: new --bare option for unadorned plots.
Release 20220626:
- New TypeCode(str) representing an array type code with associated properties and methods.
- New TimeSeriesMapping.read_csv wrapper for pandas.read_csv to import a CSV file into a TimeSeriesMapping.
- TimeSeriesFile.save,save_to: open the file for overwrite, not truncate, by default.
- TimeSeriesFile: new setitems(whens,values) method for fast batch updates.
- as_datetime64s: accept optional units parameter to trade off range versus precision.
- @plotrange: accept new optional tz/utcoffset parameters and pass the resulting utcoffset to the wrapped function along with a huge disclaimer about timezones and plots.
- New tzfor(tzspec) to return a tzinfo object from dateutil.tz.gettz, accepts 'local' for the system local default timezone.
- TimeSeriesMapping.as_pd_dataframe: accept optional utcoffset to skew the index for the DataFrame, used for time presentation in plots.
- New TimeSeriesMapping.to_csv(start,stop,f) method to write CSV data between start and stop to a file via DataFrame.to_csv.
- TimeSeriesBaseCommand: new parsetime and poptime methods, cmd_plot: update to expect start-time and optional stop-time.
Release 20220606: Initial PyPI release.