A Django App to efficiently store dense timeseries

HoLcStore

HoLcStore is a Django app for creating a simple TimeSeries store in your database.

Getting Started

  1. Add "holcstore" to your INSTALLED_APPS setting like this:
INSTALLED_APPS = [
    ...,
    "holcstore",
]
  2. Define your store class by subclassing the appropriate base store class.

Choose the appropriate store

Store class

This class stores timeseries using a key:value pattern. A prm key is used to reference each saved series.

It handles update, replace and versioning features.

TimeseriesStore class

This class stores timeseries using a user-defined pattern expressed through its model fields.

TimeseriesChunkStore class

This class stores timeseries using a user-defined pattern expressed through its model fields.

It handles update and replace features.

It stores timeseries by chunk for better performance with large timeseries.

It provides a user-friendly API to perform a client-server sync.

Basic Usage: Store class

This store is appropriate if you want to store using a "key - value" pattern.

Define your class in models.py

from hostore.models import Store

class YourStore(Store):
    # add new fields

    class Meta(Store.Meta):
        abstract = False
        # add your meta

Saving a timeseries to the database

from path.to.YourStore import YourStore
import pandas as pd

key = "3000014324556"
client_id = 0
idx = pd.date_range('2024-01-01 00:00:00+00:00', '2024-01-02 00:00:00+00:00', freq='30min')
ds = pd.Series(data=[1]*len(idx), index=idx)
# Put the load curve in the store without versioning
YourStore.set_lc(key, ds, client_id)
# To enable versioning
YourStore.set_lc(key, ds, client_id, versionning=True)
# Each time you call set_lc with your id, a new version is saved in the database

Saving multiple timeseries to the database

from path.to.YourStore import YourStore
import pandas as pd

client_id = 0
idx = pd.date_range('2024-01-01 00:00:00+00:00', '2024-01-02 00:00:00+00:00', freq='30min')
# Build a DataFrame with one column per key
df = pd.DataFrame(data={'key1': [1]*len(idx), 'key2': [2]*len(idx)}, index=idx)
# Put the load curves in the store without versioning
YourStore.set_many_lc(df, client_id)
# To enable versioning
YourStore.set_many_lc(df, client_id, versionning=True)

Getting a load curve from the database

from path.to.YourStore import YourStore

key = "3000014324556"
client_id = 0
# Get the load curve from the database.
# If multiple versions exist, they are combined, beginning with version 0, using
# https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.combine_first.html
datas = YourStore.get_lc(key, client_id)

# Only read the result when at least one entry was found
if datas:
    my_timeserie = datas[0]['data']
    last_updated = datas[0]['last_modified']

# If you want to retrieve all versions 
datas = YourStore.get_lc(key, client_id, combined_versions=False)
# datas contains all timeseries linked to key

# If you want to retrieve a specific version
datas = YourStore.get_lc(key, client_id, version=1)

Download series from admin

# admin.py
from django.contrib import admin
from hostore.admin_actions import download_timeseries_from_legacy_store
from path.to.YourStore import YourStore

@admin.register(YourStore)
class YourStoreAdmin(admin.ModelAdmin):
    actions = [download_timeseries_from_legacy_store]  # enable download from admin

Basic Usage: TimeseriesStore class

This store is the easiest to use, but has fewer features than TimeseriesChunkStore.

Define your class in models.py

# models.py
from django.db import models

class MyTimeseriesStore(TimeseriesStore):
    # Custom fields used to identify a timeseries
    year = models.IntegerField()
    kind = models.CharField(max_length=100)

    class Meta(TimeseriesStore.Meta):
        unique_together = ('year', 'kind')

Usage samples

# specify attributes to set
ts_attrs = dict(year=2020, kind='a')

# build a timeseries
ds_ts = pd.Series(...)

# save the timeseries to the database
MyTimeseriesStore.set_ts(ts_attrs, ds_ts)

# update an existing timeseries (combines ds_ts and ds_ts_v2, giving priority to ds_ts_v2 values)
MyTimeseriesStore.set_ts(ts_attrs, ds_ts_v2, update=True)

# get the timeseries from the database when exactly one entry matches
ds_ts = MyTimeseriesStore.get_ts(ts_attrs, flat=True)

# get the timeseries from the database when several entries may match
datas = MyTimeseriesStore.get_ts(ts_attrs)
ds_ts1 = datas[0]['data']

# admin.py
from django.contrib import admin
from hostore.admin_actions import download_timeseries_from_store
from .models import MyTimeseriesStore

@admin.register(MyTimeseriesStore)
class MyTimeseriesStoreAdmin(admin.ModelAdmin):
    actions = [download_timeseries_from_store]  # enable download from admin
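
The update=True rule shown in the usage samples (existing data is combined with the new series, and the new values take priority) can be pictured with a small pure-Python stand-in. This is only a sketch: it uses plain dicts keyed by timestamp strings, with None for missing values, instead of pandas Series, and combine_with_priority is a hypothetical helper that is not part of the hostore API.

```python
def combine_with_priority(new, old):
    """Merge two {timestamp: value} dicts, preferring non-missing values from `new`.

    Mimics the rule of pandas' combine_first: keep the caller's value where it
    is present, otherwise fall back to the value of the other series.
    """
    merged = {}
    for ts in sorted(set(new) | set(old)):
        value = new.get(ts)
        merged[ts] = value if value is not None else old.get(ts)
    return merged


# Series already stored (illustrative values)
ds_ts = {"2020-01-01T00:00": 1.0, "2020-01-01T01:00": 2.0, "2020-01-01T02:00": 3.0}
# New series: overlaps two points (one of them missing) and adds a new one
ds_ts_v2 = {"2020-01-01T01:00": 20.0, "2020-01-01T02:00": None, "2020-01-01T03:00": 40.0}

combined = combine_with_priority(ds_ts_v2, ds_ts)
```

In this sketch the point at 01:00 takes the new value, the point at 02:00 keeps the stored value because the new one is missing, and the points at 00:00 and 03:00 are carried over from whichever series defines them.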

Basic Usage: TimeseriesChunkStore class

Compact time-series storage for Django + PostgreSQL

TimeseriesChunkStore is an abstract Django model that lets you persist high-resolution time-series efficiently as compressed LZ4 blobs, while still querying them through the ORM.

Main features

Feature | Description
Chunking | Split each series by ('year',) or ('year','month') so blobs stay small.
Compression | Data saved as LZ4-compressed NumPy buffers; index is rebuilt on the fly.
Dense layout | Each chunk is re-indexed on a regular grid (STORE_FREQ, STORE_TZ).
Smart upsert | set_ts(..., update=True) merges with existing data via combine_first.
Bulk helpers | set_many_ts() / yield_many_ts() for mass insert / streaming read.
Sync ready | updates_queryset, list_updates, export_chunks, import_chunks enable cheap client ↔ server replication.
REST scaffolding | TimeseriesChunkStoreSyncViewSet + TimeseriesChunkStoreSyncClient give you plug-and-play API endpoints and a Python client.
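
To make the Compression and Dense layout rows more concrete, here is a minimal sketch of the idea using only the standard library. It swaps in zlib for LZ4 and the array module for NumPy, and the helper names (pack_chunk, unpack_chunk) are hypothetical, so it should not be read as the library's implementation. Because the grid is regular, only the first timestamp and the values need to be stored; the index can be regenerated from the start, the frequency and the number of values.

```python
import zlib
from array import array
from datetime import datetime, timedelta, timezone

STORE_FREQ = timedelta(hours=1)  # stand-in for STORE_FREQ = '1h'


def pack_chunk(start_ts, values):
    """Compress a dense run of float values; the index itself is not stored."""
    buffer = array("d", values).tobytes()
    return {"start_ts": start_ts, "data": zlib.compress(buffer)}


def unpack_chunk(chunk):
    """Decompress the values and rebuild the regular index on the fly."""
    values = array("d")
    values.frombytes(zlib.decompress(chunk["data"]))
    index = [chunk["start_ts"] + i * STORE_FREQ for i in range(len(values))]
    return list(zip(index, values))


start = datetime(2024, 1, 1, tzinfo=timezone.utc)
# Assumption of this sketch: a missing point on the grid is kept as NaN
chunk = pack_chunk(start, [1.0, 2.0, float("nan"), 4.0])
restored = unpack_chunk(chunk)
```

Each restored item is a (timestamp, value) pair, with timestamps spaced exactly one STORE_FREQ apart from start_ts.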

Quick start

1/ Define your store class

# models.py
from django.db import models

class MyChunkedStore(TimeseriesChunkStore):
    # Custom fields (can be seen as "axis keys")
    version = models.IntegerField()
    kind    = models.CharField(max_length=20)

    # Store settings
    CHUNK_AXIS = ('year', 'month')  # Chunking axis for timeseries storage: ('year',) or ('year', 'month')
    STORE_TZ   = 'Europe/Paris'  # Chunking timezone (also the timezone of returned timeseries)
    STORE_FREQ = '1h'  # Storage frequency (the store reindexes input series but never resamples them)
    ALLOW_CLIENT_SERVER_SYNC = False  # if True, enable the sync features
    CACHED_INDEX_SIZE = 120  # max number of date indexes kept in cache

Custom fields are strictly indexing axes: you must not use them to store metadata or redundant data.

During Django's setup, any class that inherits from TimeseriesChunkStore has its Meta.unique_together and Meta.indexes properties automatically edited so that they contain all your keys and the mandatory keys from the abstract store.

Do not:

  • define your own Meta.unique_together or Meta.indexes.
  • define any of these fields: start_ts, data, dtype, updated_at, chunk_index, is_deleted.
  • edit the "Store settings" (CHUNK_AXIS, STORE_TZ, STORE_FREQ, ALLOW_CLIENT_SERVER_SYNC) once the table has been created through a migration. This will lead to data corruption.
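
The last warning can be illustrated with a short sketch of how a timestamp might be assigned to a chunk. The function chunk_key is hypothetical (the library's internal logic is not shown in this document), and a fixed UTC+1 offset stands in for 'Europe/Paris' in winter so that the example needs no time zone database. The same instant lands in a different chunk depending on the chunking timezone and axis, which suggests why changing STORE_TZ or CHUNK_AXIS after data has been written would leave existing chunks inconsistent.

```python
from datetime import datetime, timedelta, timezone


def chunk_key(ts, chunk_axis, store_tz):
    """Return the chunk key of an aware datetime, evaluated in the store timezone."""
    local = ts.astimezone(store_tz)
    return tuple(getattr(local, axis) for axis in chunk_axis)


utc = timezone.utc
paris_winter = timezone(timedelta(hours=1))  # stand-in for Europe/Paris in winter

# 23:30 UTC on 31 December is already 00:30 on 1 January at UTC+1
instant = datetime(2023, 12, 31, 23, 30, tzinfo=utc)

key_paris = chunk_key(instant, ("year", "month"), paris_winter)
key_utc = chunk_key(instant, ("year", "month"), utc)
key_year_only = chunk_key(instant, ("year",), paris_winter)
```

Here key_paris and key_utc differ for the same instant, and key_year_only has a different shape altogether, so rows written under one configuration cannot be looked up reliably under another.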

If you need to define a manager from a custom queryset, the queryset must inherit from hostore.models.chunk_timeserie_store.ChunkQuerySet:

from hostore.models.chunk_timeserie_store import ChunkQuerySet

# The queryset must be defined before the model that uses it
class MyChunkedStoreQuerySet(ChunkQuerySet):
    def custom_method(self, *args):
        pass

class MyChunkedStore(TimeseriesChunkStore):
    objects = MyChunkedStoreQuerySet.as_manager()

2/ Use your store class

The following summarizes the main use cases, with valid and invalid usages.

# my_timeseries_usage.py
import pandas as pd

# Set one (my_series1 ... my_series4 are pandas Series built beforehand)
attrs = {"version": 1, "kind": "kind1"}
MyChunkedStore.set_ts(attrs, my_series1)  # first write
MyChunkedStore.set_ts(attrs, my_series2, update=True)  # update (combine_first)
MyChunkedStore.set_ts(attrs, my_series3, replace=True)  # replace
MyChunkedStore.set_ts(attrs, my_series4)  # FAIL: attrs already exists

# Get one
full = MyChunkedStore.get_ts(attrs)
window = MyChunkedStore.get_ts(attrs, start=pd.Timestamp("2024-01-01 00:00:00+01:00"), end=pd.Timestamp("2024-06-01 02:00:00+02:00"))
fail = MyChunkedStore.get_ts({"version": 1})  # FAIL: all attrs must be specified
none = MyChunkedStore.get_ts({"version": 1, "kind": "nonexisting"})  # returns None: no such series

# Set many
mapping = {
    (5, "kind1"): my_series1,
    (5, "kind2"): my_series2,
}
MyChunkedStore.set_many_ts(mapping=mapping, keys=("version", "kind"))

# Yield many
series_generator = MyChunkedStore.yield_many_ts({"version": 5})  # yields the 2 (series, key_dict) pairs from mapping
max_horodate = MyChunkedStore.get_max_horodate({"version": 5})  # latest timestamp (horodate) across the 2 series from mapping

# CANNOT set many over existing
MyChunkedStore.set_many_ts(mapping, keys=("version", "kind"))  # FAIL: at least one entry of mapping already exists
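
In set_many_ts, each tuple key of mapping is presumably read positionally against keys, which is consistent with the key_dict values yielded by yield_many_ts. The sketch below shows that correspondence in plain Python; attrs_from_key is a hypothetical helper used for illustration only, and the series are replaced by placeholder strings.

```python
def attrs_from_key(key_tuple, keys):
    """Pair each positional value of a mapping key with its field name."""
    if len(key_tuple) != len(keys):
        raise ValueError("each mapping key must provide one value per field in keys")
    return dict(zip(keys, key_tuple))


keys = ("version", "kind")
mapping = {
    (5, "kind1"): "my_series1",  # placeholder for a pandas Series
    (5, "kind2"): "my_series2",  # placeholder for a pandas Series
}

# One attrs dict per entry, in the insertion order of mapping
all_attrs = [attrs_from_key(key_tuple, keys) for key_tuple in mapping]
```

Each resulting dict has the same shape as the attrs argument passed to set_ts and get_ts.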

3/ Setup sync tools

Two TimeseriesChunkStore models can be easily synchronized between a server and a client. The client and server models must be the same.

You must set ALLOW_CLIENT_SERVER_SYNC=True to use these features. In that case, note the following behaviours:

  • To delete objects on the server side, you must pass keep_tracking=True to the delete method, e.g. ServerStore.objects.filter(version=1).delete(keep_tracking=True). Otherwise a ValueError is raised.
  • You cannot use set_ts(replace=False, update=False) or set_many_ts. Doing so raises a ValueError.

3.1/ Define the ViewSet (server side : serve data)

You can configure throttling, authentication, etc. through the as_factory keyword arguments.

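# views.py
from rest_framework.routers import DefaultRouter
from hostore.utils.ts_sync import TimeseriesChunkStoreSyncViewSet

# kwargs_views: dict of extra ViewSet attributes (e.g. authentication_classes), defined by you
YearSync = TimeseriesChunkStoreSyncViewSet.as_factory(MyChunkStoreServerSide, throttle_classes=[], **kwargs_views)  # pass ViewSet kwargs
router = DefaultRouter()
router.register("ts/myendpoint", YearSync, basename="ts-myendpoint")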

The /updates/ endpoint uses DRF's limit/offset pagination. Requests can specify limit and offset parameters and responses include standard pagination metadata:

GET /ts/myendpoint/updates/?since=2024-01-01T00:00:00Z&limit=100

{
  "count": 1234,
  "next": "/ts/myendpoint/updates/?since=2024-01-01T00:00:00Z&limit=100&offset=100",
  "previous": null,
  "results": [
    { "attrs": {"id": 1}, "chunk_index": 0, ... },
    ...
  ]
}

Results are ordered by updated_at.

3.2/ Define the API (client side : pull new data)

# my_client_sync_module.py
from hostore.utils.ts_sync import TimeseriesChunkStoreSyncClient

client = TimeseriesChunkStoreSyncClient(
    endpoint="https://api.example.com/ts/myendpoint",
    store_model=MyChunkStoreClientSide,
)
client.pull(batch=100, page_size=500)      # fetch new/updated chunks, requesting 500 updates per page

TimeseriesChunkStoreSyncClient.pull follows the next links returned by the server and continues fetching pages until there are no more results.
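
The page-following behaviour described above can be sketched without any network access. The example below is not the TimeseriesChunkStoreSyncClient implementation: make_fake_server and pull_all are hypothetical names, the server is simulated in memory, and the previous link is always left as None for brevity. It only illustrates the limit/offset loop: request a page, collect its results, then follow next until it is null.

```python
from urllib.parse import parse_qs, urlparse


def make_fake_server(rows):
    """Return a fetch(url) callable imitating limit/offset pagination."""
    def fetch(url):
        query = parse_qs(urlparse(url).query)
        limit = int(query.get("limit", ["100"])[0])
        offset = int(query.get("offset", ["0"])[0])
        next_offset = offset + limit
        next_url = None
        if next_offset < len(rows):
            next_url = f"/updates/?limit={limit}&offset={next_offset}"
        return {
            "count": len(rows),
            "next": next_url,
            "previous": None,  # simplified: not needed to walk forward
            "results": rows[offset:next_offset],
        }
    return fetch


def pull_all(fetch, first_url):
    """Follow the `next` links until the server reports no further page."""
    results = []
    url = first_url
    while url is not None:
        payload = fetch(url)
        results.extend(payload["results"])
        url = payload["next"]
    return results


rows = [{"chunk_index": i} for i in range(7)]
fetch = make_fake_server(rows)
updates = pull_all(fetch, "/updates/?limit=3")
```

With 7 rows and limit=3, the loop requests three pages (3, 3 and 1 results) and stops when next is None.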

3.3/ Use admin features

This lets you download a zip file containing all the selected time series chunks.

# admin.py
from django.contrib import admin
from hostore.admin_actions import download_timeseries_from_chunkstore
from .models import MyChunkedStore

@admin.register(MyChunkedStore)
class MyChunkedStoreAdmin(admin.ModelAdmin):
    actions = [download_timeseries_from_chunkstore]  # enable download from admin
