A Django App to efficiently store dense timeseries
HoLcStore
HoLcStore is a Django app for creating a simple timeseries store in your database.
Table of Contents
- Getting Started
- Choose the appropriate store
- Basic Usage : Store class
- Basic Usage : TimeseriesStore class
- Basic Usage : TimeseriesChunkStore class
Getting Started
- Add "holcstore" to your INSTALLED_APPS setting like this
INSTALLED_APPS = [
...,
"holcstore",
]
- Start using the abstract model Store by importing it
from hostore.models import Store
class YourStore(Store):
# add new fields
class Meta(Store.Meta):
abstract = False
# add your meta
Choose the appropriate store
Store class
This class is used to store timeseries using a key:value pattern. A prm key is used to reference each saved series.
Handles update and replace features.
TimeseriesStore class
This class is used to store timeseries using a user-provided pattern through its model.
TimeseriesChunkStore class
This class is used to store timeseries using a user-provided pattern through its model.
Handles update and replace features.
Stores timeseries by chunk for better performance with large timeseries.
User-friendly API to perform a client-server sync.
Basic Usage: Store class
This store is appropriate if you want to store using a "key - value" pattern.
Saving a timeseries to the database
from path.to.YourStore import YourStore
import pandas as pd
key = "3000014324556"
client_id = 0
idx = pd.date_range('2024-01-01 00:00:00+00:00', '2024-01-02 00:00:00+00:00', freq='30min')
ds = pd.Series(data=[1]*len(idx), index=idx)
# Put the load curve into the store without versioning
YourStore.set_lc(key, ds, client_id)
# If you want to activate versioning
YourStore.set_lc(key, ds, client_id, versionning=True)
# Each time you call set_lc with your key, a new version will be saved in the database
Saving multiple timeseries to database
from path.to.YourStore import YourStore
import pandas as pd
key = "3000014324556"
client_id = 0
idx = pd.date_range('2024-01-01 00:00:00+00:00', '2024-01-02 00:00:00+00:00', freq='30min')
df = pd.DataFrame(data={'key1': [1]*len(idx), 'key2': [2]*len(idx)}, index=idx)
# Put the load curves into the store without versioning
YourStore.set_many_lc(df, client_id)
# If you want to activate versioning
YourStore.set_many_lc(df, client_id, versionning=True)
Getting a load curve from the database
from path.to.YourStore import YourStore
key = "3000014324556"
client_id = 0
# Get the load curve from the database
# If multiple versions exist, they will be combined beginning with version 0 and using
# https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.combine_first.html
datas = YourStore.get_lc(key, client_id)
if datas:
my_timeserie = datas[0]['data']
last_updated = datas[0]['last_modified']
# If you want to retrieve all versions
datas = YourStore.get_lc(key, client_id, combined_versions=False)
# datas contains all timeseries linked to key
# If you want to retrieve a specific version
datas = YourStore.get_lc(key, client_id, version=1)
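The combination rule above can be sketched with plain pandas, independently of the store: combine_first keeps the calling series' values and fills its gaps from the argument, so folding the versions starting from version 0 looks like this (illustrative only):

```python
import functools
import pandas as pd

idx = pd.date_range('2024-01-01', periods=4, freq='30min', tz='UTC')
v0 = pd.Series([1.0, None, 3.0, None], index=idx)   # version 0
v1 = pd.Series([9.0, 2.0, None, 4.0], index=idx)    # version 1

# Fold the versions together, beginning with version 0
combined = functools.reduce(lambda acc, nxt: acc.combine_first(nxt), [v0, v1])
print(combined.tolist())  # [1.0, 2.0, 3.0, 4.0]
```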
Basic Usage: TimeseriesStore class
This store is the easiest to use, but has fewer features than TimeseriesChunkStore.
Define your class in models.py
class MyTimeseriesStore(TimeseriesStore):
year = models.IntegerField()
kind = models.CharField(max_length=100)
class Meta(TimeseriesStore.Meta):
unique_together = ('year', 'kind')
Usage samples
# specify attributes to set
ts_attrs = dict(year=2020, kind='a')
# build a timeserie
ds_ts = pd.Series(...)
# set timeserie to db
MyTimeseriesStore.set_ts(ts_attrs, ds_ts)
# update existing timeseries in db (combines ds_ts with ds_ts_v2, giving priority to ds_ts_v2 data)
MyTimeseriesStore.set_ts(ts_attrs, ds_ts_v2, update=True)
# get timeserie from db if unique match
ds_ts = MyTimeseriesStore.get_ts(ts_attrs, flat=True)
# get timeseries from db if multiple match
datas = MyTimeseriesStore.get_ts(ts_attrs)
ds_ts1 = datas[0]['data']
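The update=True merge rule noted above (the new series wins where both have data) corresponds to pandas combine_first with the update payload as the caller; a minimal sketch, independent of the store:

```python
import pandas as pd

idx = pd.date_range('2020-01-01', periods=3, freq='1h', tz='UTC')
ds_ts = pd.Series([1.0, 1.0, 1.0], index=idx)      # already in db
ds_ts_v2 = pd.Series([2.0, None, 2.0], index=idx)  # update payload

merged = ds_ts_v2.combine_first(ds_ts)  # v2 values take priority, gaps filled from v1
print(merged.tolist())  # [2.0, 1.0, 2.0]
```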
# admin.py
@admin.register(MyTimeseriesStore)
class MyTimeseriesStoreAdmin(admin.ModelAdmin):
from hostore.admin_actions import download_timeseries_from_store
actions = [download_timeseries_from_store]
Basic Usage: TimeseriesChunkStore class
Compact time-series storage for Django + PostgreSQL
TimeseriesChunkStore is an abstract Django model that lets you persist high-resolution time-series efficiently as compressed LZ4 blobs, while still querying them through the ORM.
Main features
| Feature | Description |
|---|---|
| Chunking | Split each series by ('year',) or ('year','month') so blobs stay small. |
| Compression | Data saved as LZ4-compressed NumPy buffers; index is rebuilt on the fly. |
| Dense layout | Each chunk is re-indexed on a regular grid (STORE_FREQ, STORE_TZ). |
| Smart upsert | set_ts(..., update=True) merges with existing data via combine_first. |
| Bulk helpers | set_many_ts() / yield_many_ts() for mass insert / streaming read. |
| Sync ready | updates_queryset, list_updates, export_chunks, import_chunks enable cheap client ↔ server replication. |
| REST scaffolding | TimeseriesChunkStoreSyncViewSet + TimeseriesChunkStoreSyncClient give you plug-and-play API endpoints and a Python client. |
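The compression and index-rebuild rows can be pictured as follows: because a chunk is dense on a regular grid, only the raw value buffer plus an index recipe (start, freq, length, dtype) needs to be persisted. This sketch uses zlib as a stand-in for LZ4 and is not the store's actual code:

```python
import zlib
import numpy as np
import pandas as pd

idx = pd.date_range('2024-01-01', periods=6, freq='1h', tz='Europe/Paris')
values = np.arange(6, dtype='float64')

# Write path: compress the dense value buffer, keep only the index recipe
blob = zlib.compress(values.tobytes())
start, freq, n, dtype = idx[0], '1h', len(values), values.dtype

# Read path: decompress the buffer and rebuild the index on the fly
restored = np.frombuffer(zlib.decompress(blob), dtype=dtype)
series = pd.Series(restored, index=pd.date_range(start, periods=n, freq=freq))
assert series.equals(pd.Series(values, index=idx))
```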
Quick start
1/ Define your store class
# models.py
class MyChunkedStore(TimeseriesChunkStore):
# Custom fields (can be seen as "axis keys")
version = models.IntegerField()
kind = models.CharField(max_length=20)
# Store settings
CHUNK_AXIS = ('year', 'month') # Chunking axis for timeseries storage. Configs : ('year',) / ('year', 'month')
STORE_TZ = 'Europe/Paris' # Chunking timezone (also timeseries output tz)
STORE_FREQ = '1h' # Timeseries storage frequency (the store reindexes input series but never resamples)
ALLOW_CLIENT_SERVER_SYNC = False # if True, enable the sync features
CACHED_INDEX_SIZE = 120 # max number of date indexes kept in cache
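Chunking along CHUNK_AXIS = ('year', 'month') can be pictured with a plain groupby on the index (illustrative only, not the store's internals):

```python
import pandas as pd

idx = pd.date_range('2023-12-30', periods=5, freq='1D', tz='Europe/Paris')
s = pd.Series(range(5), index=idx)

# One blob per (year, month) key, so each stored chunk stays small
chunks = {key: part for key, part in s.groupby([s.index.year, s.index.month])}
print(sorted(chunks))  # [(2023, 12), (2024, 1)]
```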
The custom fields are strictly indexing axes: you must not use them to store metadata or redundant data.
During Django's setup, any class that inherits from TimeseriesChunkStore has its Meta properties unique_together and indexes automatically edited so that they contain all your keys plus the mandatory keys from the abstract store.
Do not:
- define your own Meta.unique_together or Meta.indexes.
- define these fields: start_ts, data, dtype, updated_at, chunk_index, is_deleted.
- edit the "Store settings" (CHUNK_AXIS, STORE_TZ, STORE_FREQ, ALLOW_CLIENT_SERVER_SYNC) once the table has been created through migration. This would lead to data corruption.
If you need to define a custom manager from a queryset, it must inherit from hostore.models.chunk_timeserie_store.ChunkQuerySet
class MyChunkedStoreQuerySet(ChunkQuerySet):
    def custom_method(self, *args):
        pass
class MyChunkedStore(TimeseriesChunkStore):
    objects = MyChunkedStoreQuerySet.as_manager()
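As the settings above note, input series are reindexed onto the regular (STORE_FREQ, STORE_TZ) grid but never resampled; grid slots with no input value simply stay NaN. A plain-pandas sketch of that dense layout:

```python
import pandas as pd

raw = pd.Series(
    [1.0, 2.0],
    index=pd.DatetimeIndex(['2024-01-01 00:00', '2024-01-01 02:00']).tz_localize('Europe/Paris'),
)
grid = pd.date_range('2024-01-01 00:00', periods=4, freq='1h', tz='Europe/Paris')

dense = raw.reindex(grid)  # pure alignment: no interpolation, no resampling
print(dense.isna().tolist())  # [False, True, False, True]
```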
2/ Use your store class
This summarizes the use cases, with both legal and illegal usages.
# my_timeseries_usage.py
# Set one
import pandas as pd
attrs = {"version": 1, "kind": "kind1"}
MyChunkedStore.set_ts(attrs, my_series1) # first write
MyChunkedStore.set_ts(attrs, my_series2, update=True) # update (combine_first)
MyChunkedStore.set_ts(attrs, my_series3, replace=True) # replace
MyChunkedStore.set_ts(attrs, my_series4) # FAIL: attrs already exists
# Get one
full = MyChunkedStore.get_ts(attrs)
window = MyChunkedStore.get_ts(attrs, start=pd.Timestamp("2024-01-01 00:00:00+01:00"), end=pd.Timestamp("2024-06-01 02:00:00+02:00"))
fail = MyChunkedStore.get_ts({"version": 1}) # FAIL : must specify all attrs
none = MyChunkedStore.get_ts({"version": 1, "kind": "nonexisting"}) # returns None : does not exist
# Set many
MyChunkedStore.set_many_ts(
mapping={
(5, "kind1"): my_series1,
(5, "kind2"): my_series2,
},
keys=("version", "kind")
)
# Yield many
series_generator = MyChunkedStore.yield_many_ts({"version": 5}) # yields the 2 (series, key_dict) pairs from mapping
max_horodate = MyChunkedStore.get_max_horodate({"version": 5}) # latest timestamp across the 2 series from mapping
# CANNOT set many over existing
MyChunkedStore.set_many_ts(mapping, keys=("version", "kind")) # FAIL : at least one value from mapping already exists
3/ Setup sync tools
Two TimeseriesChunkStore models can easily be synchronized between a server and a client. The client and server models must be identical.
You must set ALLOW_CLIENT_SERVER_SYNC=True if you want to use these features. In that case, note some particular behaviours:
- If you need to delete objects on the server side, you must pass keep_tracking=True to the delete method, e.g. ServerStore.objects.filter(version=1).delete(keep_tracking=True). Otherwise a ValueError will be raised.
- You cannot use set_ts(replace=False, update=False) or set_many_ts. Otherwise a ValueError will be raised.
3.1/ Define the ViewSet (server side : serve data)
You can set throttling, authentication, etc. through as_factory kwargs.
# views.py
from hostore.utils.ts_sync import TimeseriesChunkStoreSyncViewSet
YearSync = TimeseriesChunkStoreSyncViewSet.as_factory(MyChunkStoreServerSide, throttle_classes=[], **kwargs_views) # pass ViewSet kwargs
router.register("ts/myendpoint", YearSync, basename="ts-myendpoint")
The /updates/ endpoint uses DRF's limit/offset pagination. Requests can specify limit and offset parameters, and responses include standard pagination metadata:
GET /ts/myendpoint/updates/?since=2024-01-01T00:00:00Z&limit=100
{
"count": 1234,
"next": "/ts/myendpoint/updates/?since=2024-01-01T00:00:00Z&limit=100&offset=100",
"previous": null,
"results": [
{ "attrs": {"id": 1}, "chunk_index": 0, ... },
...
]
}
Results are ordered by updated_at.
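The page-following loop that pull performs can be sketched generically; iter_updates and fetch below are illustrative names, not part of the library:

```python
def iter_updates(fetch, first_url):
    """Follow DRF limit/offset pagination: yield every result across pages."""
    url = first_url
    while url:
        page = fetch(url)       # returns a JSON dict like the payload above
        yield from page['results']
        url = page['next']      # None on the last page

# Two fake pages mimicking the /updates/ payload
pages = {
    '/u?limit=2': {'count': 3, 'next': '/u?limit=2&offset=2',
                   'results': [{'chunk_index': 0}, {'chunk_index': 1}]},
    '/u?limit=2&offset=2': {'count': 3, 'next': None,
                            'results': [{'chunk_index': 2}]},
}
chunks = list(iter_updates(pages.get, '/u?limit=2'))
print(len(chunks))  # 3
```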
3.2/ Define the API (client side : pull new data)
# my_client_sync_module.py
from hostore.utils.ts_sync import TimeseriesChunkStoreSyncClient
client = TimeseriesChunkStoreSyncClient(
endpoint="https://api.example.com/ts/myendpoint",
store_model=MyChunkStoreClientSide,
)
client.pull(batch=100, page_size=500) # fetch new/updated chunks, requesting 500 updates per page
TimeseriesChunkStoreSyncClient.pull follows the next links returned by
the server and continues fetching pages until there are no more results.
3.3/ Use admin features
This allows you to download a zip file containing all the selected time series chunks.
# admin.py
@admin.register(MyChunkedStore)
class MyChunkedStoreAdmin(admin.ModelAdmin):
from hostore.admin_actions import download_timeseries_from_chunkstore
actions = [download_timeseries_from_chunkstore]