
A Django App to efficiently store dense timeseries


HoLcStore

HoLcStore is a Django app for creating a simple TimeSeries store in your database.


Getting Started

  1. Add "holcstore" to your INSTALLED_APPS setting like this:
INSTALLED_APPS = [
    ...,
    "holcstore",
]
  2. Define your store model by subclassing the appropriate store class.

Choose the appropriate store

Store class

This class stores timeseries using a key:value pattern. A prm key is used to reference the saved series.

Handles update, replace and versioning features.

TimeseriesStore class

This class stores timeseries using a user-provided pattern defined through its model.

TimeseriesChunkStore class

This class stores timeseries using a user-provided pattern defined through its model.

Handles update and replace features.

Stores timeseries by chunk for better performance with large timeseries.

Provides a user-friendly API to perform a client-server sync.

Basic Usage: Store class

This store is appropriate if you want to store using a "key - value" pattern.

Define your class in models.py

from hostore.models import Store

class YourStore(Store):
    # add new fields

    class Meta(Store.Meta):
        abstract = False
        # add your meta

Saving a timeseries to the database

from path.to.YourStore import YourStore
import pandas as pd

key = "3000014324556"
client_id = 0
idx = pd.date_range('2024-01-01 00:00:00+00:00', '2024-01-02 00:00:00+00:00', freq='30min')
ds = pd.Series(data=[1]*len(idx), index=idx)
# Put the load curve in the store without versioning
YourStore.set_lc(key, ds, client_id)
# To activate versioning, pass versionning=True
YourStore.set_lc(key, ds, client_id, versionning=True)
# Each time you call set_lc with your id, a new version will be saved in the database

Saving multiple timeseries to the database

from path.to.YourStore import YourStore
import pandas as pd

key = "3000014324556"
client_id = 0
idx = pd.date_range('2024-01-01 00:00:00+00:00', '2024-01-02 00:00:00+00:00', freq='30min')
# One column per key: a DataFrame is needed here, not a Series
df = pd.DataFrame(data={'key1': [1]*len(idx), 'key2': [2]*len(idx)}, index=idx)
# Put the load curves in the store without versioning
YourStore.set_many_lc(df, client_id)
# To activate versioning, pass versionning=True
YourStore.set_many_lc(df, client_id, versionning=True)

Getting a load curve from the database

from path.to.YourStore import YourStore

key = "3000014324556"
client_id = 0
# Get the load curve from the database
# If multiple versions exist, they will be combined beginning with version 0 and using
# https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.combine_first.html
datas = YourStore.get_lc(key, client_id)

# Only read the first entry when something was returned
if datas:
    my_timeserie = datas[0]['data']
    last_updated = datas[0]['last_modified']

# If you want to retrieve all versions 
datas = YourStore.get_lc(key, client_id, combined_versions=False)
# datas contains all timeseries linked to key

# If you want to retrieve a specific version
datas = YourStore.get_lc(key, client_id, version=1)

Download series from admin

# admin.py
from django.contrib import admin
from hostore.admin_actions import download_timeseries_from_legacy_store
from path.to.YourStore import YourStore

@admin.register(YourStore)
class YourStoreAdmin(admin.ModelAdmin):
    actions = [download_timeseries_from_legacy_store]  # enable download from admin

Basic Usage: TimeseriesStore class

This store is the easiest to use, but has fewer features than TimeseriesChunkStore.

Define your class in models.py

class MyTimeseriesStore(TimeseriesStore):
    year = models.IntegerField()
    kind = models.CharField(max_length=100)

    class Meta(TimeseriesStore.Meta):
        unique_together = ('year', 'kind')

Usage samples

import pandas as pd

# specify attributes to set
ts_attrs = dict(year=2020, kind='a')

# build a timeseries
ds_ts = pd.Series(...)

# save the timeseries to the db
MyTimeseriesStore.set_ts(ts_attrs, ds_ts)

# update the existing timeseries in the db (combines ds_ts and ds_ts_v2, giving priority to ds_ts_v2 data)
MyTimeseriesStore.set_ts(ts_attrs, ds_ts_v2, update=True)

# get the timeseries from the db when there is a unique match
ds_ts = MyTimeseriesStore.get_ts(ts_attrs, flat=True)

# get the timeseries from the db when there are multiple matches
datas = MyTimeseriesStore.get_ts(ts_attrs)
ds_ts1 = datas[0]['data']

# admin.py
@admin.register(MyTimeseriesStore)
class MyTimeseriesStoreAdmin(admin.ModelAdmin):
  from hostore.admin_actions import download_timeseries_from_store
  actions = [download_timeseries_from_store]  # enable download from admin
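
The priority rule described for update=True in the usage samples (new data wins over stored data) can be illustrated with plain pandas. This is a minimal sketch of the merge semantics only, not holcstore's code: the variable names mirror the samples above, and the use of `combine_first` in this exact call order is an assumption about how that priority is obtained.

```python
import numpy as np
import pandas as pd

# Existing series, as it would already sit in the store (illustrative values).
idx = pd.date_range("2024-01-01", periods=4, freq="1h", tz="UTC")
ds_ts = pd.Series([1.0, 1.0, 1.0, 1.0], index=idx)

# Update overlapping the last two points and adding one new timestamp.
idx_v2 = pd.date_range(idx[2], periods=3, freq="1h")
ds_ts_v2 = pd.Series([9.0, np.nan, 9.0], index=idx_v2)

# Calling combine_first on the new series gives it priority;
# the old series only fills timestamps where the new one is missing or NaN.
merged = ds_ts_v2.combine_first(ds_ts)
print(merged)
```

Note that, with pandas semantics, a NaN in the update does not erase the stored value at that timestamp.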

Basic Usage: TimeseriesChunkStore class

Compact time-series storage for Django + PostgreSQL

TimeseriesChunkStore is an abstract Django model that lets you persist high-resolution time-series efficiently as compressed LZ4 blobs, while still querying them through the ORM.
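
To give an idea of what storing a series as a compressed blob involves, here is a minimal sketch. It is not holcstore's implementation: `zlib` from the standard library is swapped in for LZ4 so the example has no extra dependency, the helper names `pack_chunk` and `unpack_chunk` are hypothetical, and the dictionary keys merely reuse field names that the store reserves (start_ts, data, dtype).

```python
import zlib

import numpy as np
import pandas as pd


def pack_chunk(series: pd.Series) -> dict:
    """Keep only the raw values (compressed) plus what is needed to rebuild the index."""
    values = series.to_numpy()
    return {
        "start_ts": series.index[0],              # first timestamp of the chunk
        "dtype": str(values.dtype),               # needed to decode the buffer
        "data": zlib.compress(values.tobytes()),  # stand-in for LZ4 compression
    }


def unpack_chunk(chunk: dict, freq: str) -> pd.Series:
    """Decode the values and rebuild the regular index from start_ts and freq."""
    values = np.frombuffer(zlib.decompress(chunk["data"]), dtype=chunk["dtype"])
    index = pd.date_range(chunk["start_ts"], periods=len(values), freq=freq)
    return pd.Series(values, index=index)


idx = pd.date_range("2024-01-01", periods=48, freq="30min", tz="UTC")
original = pd.Series(np.arange(48, dtype="float64"), index=idx)
restored = unpack_chunk(pack_chunk(original), freq="30min")
```

Because the index is regular, it never has to be stored: the start timestamp, the frequency and the number of values are enough to rebuild it.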

Main features

| Feature | Description |
|---|---|
| Chunking | Split each series by ('year',) or ('year', 'month') so blobs stay small. |
| Compression | Data saved as LZ4-compressed NumPy buffers; index is rebuilt on the fly. |
| Dense layout | Each chunk is re-indexed on a regular grid (STORE_FREQ, STORE_TZ). |
| Smart upsert | set_ts(..., update=True) merges with existing data via combine_first. |
| Bulk helpers | set_many_ts() / yield_many_ts() for mass insert / streaming read. |
| Sync ready | updates_queryset, list_updates, export_chunks, import_chunks enable cheap client ↔ server replication. |
| REST scaffolding | TimeseriesChunkStoreSyncViewSet + TimeseriesChunkStoreSyncClient give you plug-and-play API endpoints and a Python client. |
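
The chunking and dense layout features can be pictured with plain pandas. The sketch below is an illustration under stated assumptions, not the library's code: `split_into_dense_chunks` is a hypothetical helper, and the two constants only borrow the names of the store settings.

```python
import pandas as pd

STORE_FREQ = "1h"          # assumed storage frequency
STORE_TZ = "Europe/Paris"  # assumed storage timezone


def split_into_dense_chunks(series: pd.Series) -> dict:
    """Split a tz-aware series into (year, month) chunks laid on a regular grid."""
    local = series.tz_convert(STORE_TZ)
    chunks = {}
    for (year, month), part in local.groupby([local.index.year, local.index.month]):
        # Full regular grid of the month, in the store timezone.
        start = pd.Timestamp(year=int(year), month=int(month), day=1, tz=STORE_TZ)
        end = start + pd.DateOffset(months=1)
        grid = pd.date_range(start, end, freq=STORE_FREQ, inclusive="left")
        # Reindex only: timestamps missing from the input become NaN, nothing is resampled.
        chunks[(int(year), int(month))] = part.reindex(grid)
    return chunks


# Six hourly points straddling the January/February boundary in Paris time.
idx = pd.date_range("2024-01-31 22:00", periods=6, freq="1h", tz="UTC")
series = pd.Series(range(6), index=idx, dtype="float64")
chunks = split_into_dense_chunks(series)
for key, chunk in chunks.items():
    print(key, len(chunk), int(chunk.notna().sum()))
```

Each chunk always covers its whole month, which is what makes the layout dense: short inputs are padded with NaN rather than stored with their own irregular index.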

Quick start

1/ Define your store class

# models.py
class MyChunkedStore(TimeseriesChunkStore):
    # Custom fields (can be seen as "axis keys")
    version = models.IntegerField()
    kind    = models.CharField(max_length=20)

    # Store settings
    CHUNK_AXIS = ('year', 'month')  # Chunking axis for timeseries storage. Options: ('year',) or ('year', 'month')
    STORE_TZ   = 'Europe/Paris'  # Chunking timezone (also the timezone of returned timeseries)
    STORE_FREQ = '1h'  # Storage frequency (the store reindexes input series but never resamples them)
    ALLOW_CLIENT_SERVER_SYNC = False  # If True, enables the sync features
    CACHED_INDEX_SIZE = 120  # Max number of date indexes kept in cache

The custom fields are strictly indexing axes: you must not use them to store metadata or redundant data.

During Django's setup, any class that inherits from TimeseriesChunkStore has its Meta properties unique_together and indexes automatically edited so that they contain all your keys and the mandatory keys from the abstract store.

Do not:

  • define your own Meta.unique_together or Meta.indexes.
  • define any of these fields: start_ts, data, dtype, updated_at, chunk_index, is_deleted.
  • edit the "Store settings" (CHUNK_AXIS, STORE_TZ, STORE_FREQ, ALLOW_CLIENT_SERVER_SYNC) once the table has been created through a migration. Doing so will lead to data corruption.

If you need to define a manager from a custom queryset, the queryset must inherit from hostore.models.chunk_timeserie_store.ChunkQuerySet:

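from hostore.models.chunk_timeserie_store import ChunkQuerySet

# The queryset class must be defined before the model that uses it
class MyChunkedStoreQuerySet(ChunkQuerySet):
    def custom_method(self, *args):
        pass

class MyChunkedStore(TimeseriesChunkStore):
    objects = MyChunkedStoreQuerySet.as_manager()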

2/ Use your store class

The following sample summarizes common use cases, with valid and invalid usages.

# my_timeseries_usage.py
# Set one
import pandas as pd

attrs = {"version": 1, "kind": "kind1"}
MyChunkedStore.set_ts(attrs, my_series1)  # first write
MyChunkedStore.set_ts(attrs, my_series2, update=True)  # update (combine_first)
MyChunkedStore.set_ts(attrs, my_series3, replace=True)  # replace
MyChunkedStore.set_ts(attrs, my_series4)  # FAILS: attrs already exist

# Get one
full = MyChunkedStore.get_ts(attrs)
window = MyChunkedStore.get_ts(attrs, start=pd.Timestamp("2024-01-01 00:00:00+01:00"), end=pd.Timestamp("2024-06-01 02:00:00+02:00"))
fail = MyChunkedStore.get_ts({"version": 1})  # FAILS: all attrs must be specified
none = MyChunkedStore.get_ts({"version": 1, "kind": "nonexisting"})  # returns None: no matching series exists

# Set many
mapping = {
    (5, "kind1"): my_series1,
    (5, "kind2"): my_series2,
}
MyChunkedStore.set_many_ts(mapping=mapping, keys=("version", "kind"))

# Yield many
series_generator = MyChunkedStore.yield_many_ts({"version": 5})  # yields the 2 (series, key_dict) pairs from mapping
max_horodate = MyChunkedStore.get_max_horodate({"version": 5})  # latest timestamp (horodate) across the 2 series from mapping

# You CANNOT set many over existing entries
MyChunkedStore.set_many_ts(mapping, keys=("version", "kind"))  # FAILS: at least one entry from mapping already exists
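
The way the tuple keys of mapping line up with the keys argument can be sketched in pure Python. This is only a reading of the sample above, not the library's code: `expand_mapping` is a hypothetical helper, and plain lists stand in for pandas Series.

```python
def expand_mapping(mapping: dict, keys: tuple) -> list:
    """Pair each tuple key with the field names in `keys` to obtain attrs dicts."""
    expanded = []
    for key_tuple, series in mapping.items():
        if len(key_tuple) != len(keys):
            raise ValueError(f"{key_tuple!r} does not match keys {keys!r}")
        expanded.append((dict(zip(keys, key_tuple)), series))
    return expanded


# Lists stand in for pandas Series in this illustration.
mapping = {(5, "kind1"): [1.0, 2.0], (5, "kind2"): [3.0, 4.0]}
pairs = expand_mapping(mapping, keys=("version", "kind"))
for attrs, series in pairs:
    print(attrs, series)
```

Each resulting attrs dict has the same shape as the one passed to set_ts for a single series.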

3/ Setup sync tools

Two TimeseriesChunkStore models can easily be synchronized between a server and a client. The client and server models must be the same.

You must set ALLOW_CLIENT_SERVER_SYNC=True to use these features. In that case, note the following behaviour:

  • To delete objects on the server side, you must pass keep_tracking=True to the delete method, e.g. ServerStore.objects.filter(version=1).delete(keep_tracking=True). Otherwise a ValueError is raised.
  • You cannot use set_ts(replace=False, update=False) or set_many_ts. Doing so raises a ValueError.

3.1/ Define the ViewSet (server side : serve data)

You can set throttle, auth etc. through as_factory kwargs.

# views.py
from hostore.utils.ts_sync import TimeseriesChunkStoreSyncViewSet

YearSync = TimeseriesChunkStoreSyncViewSet.as_factory(MyChunkStoreServerSide, throttle_classes=[], **kwargs_views)  # pass ViewSet kwargs
router.register("ts/myendpoint", YearSync, basename="ts-myendpoint")

The /updates/ endpoint uses DRF's limit/offset pagination. Requests can specify limit and offset parameters and responses include standard pagination metadata:

GET /ts/myendpoint/updates/?since=2024-01-01T00:00:00Z&limit=100

{
  "count": 1234,
  "next": "/ts/myendpoint/updates/?since=2024-01-01T00:00:00Z&limit=100&offset=100",
  "previous": null,
  "results": [
    { "attrs": {"id": 1}, "chunk_index": 0, ... },
    ...
  ]
}

Results are ordered by updated_at.

3.2/ Define the API (client side : pull new data)

# my_client_sync_module.py
from hostore.utils.ts_sync import TimeseriesChunkStoreSyncClient

client = TimeseriesChunkStoreSyncClient(
    endpoint="https://api.example.com/ts/myendpoint",
    store_model=MyChunkStoreClientSide,
)
client.pull(batch=100, page_size=500)      # fetch new/updated chunks, requesting 500 updates per page

TimeseriesChunkStoreSyncClient.pull follows the next links returned by the server and continues fetching pages until there are no more results.
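
The pagination loop described above can be sketched without any network access. The code below is a simplified illustration, not the client's implementation: `fake_updates_endpoint` is a hypothetical in-memory stand-in for the /updates/ endpoint, `iter_updates` is a hypothetical helper, and the since filter is left out.

```python
from urllib.parse import parse_qs, urlencode, urlparse

# Hypothetical in-memory stand-in for the updates served by /updates/.
UPDATES = [{"attrs": {"id": i}, "chunk_index": 0} for i in range(7)]
BASE_URL = "/ts/myendpoint/updates/"


def fake_updates_endpoint(url: str) -> dict:
    """Answer a limit/offset request with a page shaped like the response above."""
    query = parse_qs(urlparse(url).query)
    limit = int(query.get("limit", ["100"])[0])
    offset = int(query.get("offset", ["0"])[0])
    next_offset = offset + limit
    next_url = None
    if next_offset < len(UPDATES):
        next_url = f"{BASE_URL}?{urlencode({'limit': limit, 'offset': next_offset})}"
    return {
        "count": len(UPDATES),
        "next": next_url,
        "previous": None,  # simplified: not needed for a forward-only pull
        "results": UPDATES[offset:next_offset],
    }


def iter_updates(first_url, fetch_page):
    """Yield every update, following `next` links until the server returns none."""
    url = first_url
    while url is not None:
        page = fetch_page(url)
        yield from page["results"]
        url = page["next"]


pulled = list(iter_updates(f"{BASE_URL}?limit=3", fake_updates_endpoint))
print(len(pulled))
```

In a real client, `fetch_page` would perform an HTTP GET and decode the JSON body; the loop itself stays the same.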

3.3/ Use admin features

This allows you to download a zip file containing all the selected time series chunks.

# admin.py
@admin.register(MyChunkedStore)
class MyChunkedStoreAdmin(admin.ModelAdmin):
  from hostore.admin_actions import download_timeseries_from_chunkstore
  actions = [download_timeseries_from_chunkstore]
