pastastore
Tools for managing Pastas projects
This module contains a tool to manage Pastas timeseries and models in a database.
The implementation is similar to pastas.Project, but in addition to managing timeseries and models in-memory, it allows storage of data in a database or on disk. Storing timeseries and models in a database gives the user a simple way to manage Pastas projects, with the added bonus of allowing the user to pick up where they left off, without having to (re)load everything into memory.
The connection to database/disk/memory is managed by a connector object. Currently, three connectors are included. The first implementation is an in-memory connector. The other two store data in a database. Both of these implementations are designed to have fast read/write operations, while also compressing the stored data.
- In-memory: uses dictionaries to hold timeseries and pastas Models in-memory. Does not require any additional packages to use.
- Arctic: a timeseries/dataframe database that sits atop MongoDB. Arctic supports pandas.DataFrames.
- PyStore: a datastore (inspired by Arctic) created for storing pandas DataFrames (especially timeseries) on disk. Data is stored using fastparquet and compressed with Snappy.
Dependencies
This module has several dependencies (depending on which connector is used):
If using in-memory connector:
- No additional dependencies are required.
If using Arctic:
- Arctic requires MongoDB, e.g. install the Community edition (Windows, MacOS).
- OR, if you wish to use Docker for running MongoDB, see the installation instructions here.
If using Pystore:
- PyStore uses Snappy, a fast and efficient compression/decompression library from Google. You'll need to install Snappy on your system before installing PyStore. See links for installation instructions here: https://github.com/ranaroussi/pystore#dependencies
Installation
Install the module by typing pip install .
from the module root directory.
Please note that pystore is not automatically installed as a dependency
because it requires Snappy to be (manually) installed first (see previous
section)!
For installing in development mode, clone the repository and install by
typing pip install -e .
from the module root directory.
Usage
The following snippets show typical usage. The general idea is to first define
the connector object, and then pass that connector to PastaStore.
Using in-memory dictionaries
import pastastore as pst
# define connector
conn = pst.DictConnector("my_connector")
# create project for managing Pastas data and models
store = pst.PastaStore("my_project", conn)
Using Arctic
import pastastore as pst
# define arctic connector
connstr = "mongodb://localhost:27017/"
conn = pst.ArcticConnector("my_connector", connstr)
# create project for managing Pastas data and models
store = pst.PastaStore("my_project", conn)
Using Pystore
import pastastore as pst
# define pystore connector
path = "./data/pystore"
conn = pst.PystoreConnector("my_connector", path)
# create project for managing Pastas data and models
store = pst.PastaStore("my_project", conn)
The database read/write/delete methods can be accessed through the reference
to the connector object. For easy access, the most common methods are also
registered directly on the store object. E.g.
series = store.conn.get_oseries("my_oseries")
is equivalent to:
series = store.get_oseries("my_oseries")
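The equivalence of those two calls comes down to the store forwarding common methods to its connector. The sketch below illustrates that delegation pattern with simplified stand-in classes (FakeConnector and FakeStore are hypothetical names, not pastastore's actual internals):

```python
# Minimal sketch of how a store can expose its connector's common
# methods; the class names here are illustrative stand-ins.

class FakeConnector:
    """Stand-in connector holding series in a plain dict."""

    def __init__(self):
        self._oseries = {}

    def add_oseries(self, series, name):
        self._oseries[name] = series

    def get_oseries(self, name):
        return self._oseries[name]


class FakeStore:
    """Stand-in store that registers the connector's common methods."""

    def __init__(self, conn):
        self.conn = conn
        # bind the most common connector methods onto the store itself
        self.add_oseries = conn.add_oseries
        self.get_oseries = conn.get_oseries


store = FakeStore(FakeConnector())
store.add_oseries([1.0, 2.0, 3.0], "my_oseries")

# both calls reach the same underlying connector method
assert store.get_oseries("my_oseries") == store.conn.get_oseries("my_oseries")
```

Because the store simply holds bound references to the connector's methods, both call styles operate on the same underlying data.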
Types of Connectors
The structure and some background on the different types of Connectors is detailed below.
DictConnector
The DictConnector is a very simple object that stores all data and models in
dictionaries. The data is held in-memory, not on disk, and is therefore not
persistent: you cannot pick up where you left off last time. Once you exit
Python your data is lost. For small projects, this connector can be useful as
it is extremely simple.
ArcticConnector
The ArcticConnector is an object that creates a connection with a MongoDB
database. This means there must be a running MongoDB instance available.
This can be an existing or a new database. A database is created
to hold the different datasets: observation timeseries, stresses timeseries
and models. For each of these datasets a collection or library is created.
These are named using the following convention:
<database name>.<collection name>.
The Arctic implementation uses the following structure:
database / collections or libraries / documents. The data is stored within
these libraries. Observations and stresses timeseries are stored as
pandas.DataFrames. Models are stored in JSON (actually binary JSON) and
do not contain the timeseries themselves. These are picked up from the
other libraries when the model is loaded from the database.
The ArcticPastas object allows the user to store different versions of datasets, which can be used, for example, to keep a history of older models. This functionality is still experimental.
PystoreConnector
The PystoreConnector is an object that links to a location on disk. This can either be an existing or a new Pystore. A new store is created with collections that hold the different datasets: observation timeseries, stresses timeseries, and models.
The Pystores have the following structure: store / collections / items. The
timeseries data is stored as Dask DataFrames which can be easily converted to
pandas DataFrames. The models are stored as JSON (not including the timeseries)
in the metadata file belonging to an item. The actual data in the item is an
empty DataFrame serving as a placeholder. This slightly 'hacky' design allows
the models to be saved in a PyStore. The timeseries are picked up from their
respective stores when the model is loaded from disk.
PyStore supports so-called snapshots (which store the current state of the store) but this has not been actively implemented in this module. Pystore does not have the same versioning capabilities as Arctic.
Custom Connectors
It should be relatively straightforward to write your own custom connector
object. The pastastore.base module contains the BaseConnector class that
defines which methods and properties must be implemented. Each Connector
object should inherit from this class. The BaseConnector class also shows
the expected call signature for each method. Following the same call
signatures should ensure that your new connector works directly with
PastaStore. Extra keyword arguments can be added to methods in the custom
class, as long as these are defined after the arguments in the expected call
signature as defined in the BaseConnector.
class MyCustomConnector(BaseConnector, ConnectorUtil):
    """Must override each method and property in BaseConnector, e.g."""

    def get_oseries(self, name, progressbar=False):
        # your code to get oseries from database here
        pass
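To make the inheritance pattern concrete, here is a self-contained sketch. Note that BaseConnectorSketch below is a hypothetical stand-in listing only two methods; the real pastastore.base.BaseConnector defines more methods and properties, all of which must be overridden:

```python
from abc import ABC, abstractmethod


class BaseConnectorSketch(ABC):
    """Hypothetical stand-in for pastastore.base.BaseConnector."""

    @abstractmethod
    def add_oseries(self, series, name, metadata=None):
        ...

    @abstractmethod
    def get_oseries(self, name, progressbar=False):
        ...


class MyCustomConnector(BaseConnectorSketch):
    """Dictionary-backed connector following the expected call signatures."""

    def __init__(self, name):
        self.name = name
        self._oseries = {}

    def add_oseries(self, series, name, metadata=None):
        # extra keyword arguments would be added after the expected signature
        self._oseries[name] = (series, metadata)

    def get_oseries(self, name, progressbar=False):
        series, _metadata = self._oseries[name]
        return series


conn = MyCustomConnector("my_connector")
conn.add_oseries([0.1, 0.2], "well_1", metadata={"x": 100.0})
assert conn.get_oseries("well_1") == [0.1, 0.2]
```

Because the abstract base class refuses to instantiate subclasses with missing overrides, forgetting to implement a required method fails immediately rather than at first use.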