Tools for managing pastas projects

pastastore

This module contains a tool to manage Pastas timeseries and models in a database.

The implementation is similar to pastas.Project, but in addition to managing timeseries and models in-memory, it allows storage of data in a database or on disk. Storing timeseries and models in a database gives the user a simple way to manage Pastas projects, with the added bonus of allowing the user to pick up where they left off without having to (re)load everything into memory.

The connection to database/disk/memory is managed by a connector object. Currently, three connectors are included. The first is an in-memory connector. The other two store data in a database (Arctic) or on disk (PyStore); both are designed for fast read/write operations while also compressing the stored data.

  • In-memory: uses dictionaries to hold timeseries and pastas Models in-memory. Does not require any additional packages to use.

  • Arctic is a timeseries/dataframe database that sits atop MongoDB. Arctic supports pandas.DataFrames.

  • PyStore is a datastore (inspired by Arctic) created for storing pandas dataframes (especially timeseries) on disk. Data is stored using fastparquet and compressed with Snappy.

Dependencies

This module has several dependencies (depending on which connector is used):

If using in-memory connector:

  • No additional dependencies are required.

If using Arctic:

  • Arctic requires MongoDB, e.g. install the Community edition (Windows, MacOS).

  • OR, if you wish to use Docker for running MongoDB see the installation instructions here.

If using PyStore:

  • PyStore requires Snappy, which must be installed (manually) before installing PyStore.

Installation

Install the module by typing pip install . from the module root directory. Please note that pystore is not automatically installed as a dependency because it requires Snappy to be (manually) installed first (see previous section)!

For installing in development mode, clone the repository and install by typing pip install -e . from the module root directory.

Usage

The following snippets show typical usage. The general idea is to first define the connector object and then pass that connector to PastaStore.

Using in-memory dictionaries

import pastastore as pst

# define connector
conn = pst.DictConnector("my_connector")

# create project for managing Pastas data and models
store = pst.PastaStore("my_project", conn)

Using Arctic

import pastastore as pst

# define arctic connector
connstr = "mongodb://localhost:27017/"
conn = pst.ArcticConnector("my_connector", connstr)

# create project for managing Pastas data and models
store = pst.PastaStore("my_project", conn)

Using Pystore

import pastastore as pst

# define pystore connector
path = "./data/pystore"
conn = pst.PystoreConnector("my_connector", path)

# create project for managing Pastas data and models
store = pst.PastaStore("my_project", conn)

The database read/write/delete methods can be accessed through the reference to the connector object. For easy access, the most common methods are also registered on the store object. For example:

series = store.conn.get_oseries("my_oseries")

is equivalent to:

series = store.get_oseries("my_oseries")
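One way to picture how such registration could work is plain attribute forwarding. The sketch below is not pastastore's actual implementation: ToyConnector and ToyStore are hypothetical stand-ins that only illustrate the idea of exposing selected connector methods on the store.

```python
class ToyConnector:
    """Hypothetical stand-in for a connector; keeps series in a dict."""

    def __init__(self, name):
        self.name = name
        self._oseries = {}

    def add_oseries(self, series, name):
        self._oseries[name] = series

    def get_oseries(self, name):
        return self._oseries[name]


class ToyStore:
    """Hypothetical stand-in for the store; forwards common methods."""

    # names of connector methods that are exposed directly on the store
    _forwarded = ("add_oseries", "get_oseries")

    def __init__(self, name, connector):
        self.name = name
        self.conn = connector
        for method_name in self._forwarded:
            # expose the connector's bound method as an attribute of the store
            setattr(self, method_name, getattr(self.conn, method_name))


conn = ToyConnector("my_connector")
store = ToyStore("my_project", conn)
store.add_oseries([1.0, 2.0, 3.0], "my_oseries")

# both access paths return the very same object
same = store.conn.get_oseries("my_oseries") is store.get_oseries("my_oseries")
```

Less common methods stay reachable through store.conn, which keeps the store's own namespace small.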

Types of Connectors

The structure of the different types of Connectors, along with some background, is detailed below.

DictConnector

The DictConnector is a very simple object that stores all data and models in dictionaries. The data is stored in-memory and not on disk and is therefore not persistent, i.e. you cannot pick up where you left off last time. Once you exit Python your data is lost. For small projects, this connector can be useful as it is extremely simple.

ArcticConnector

The ArcticConnector is an object that creates a connection with a MongoDB database. This means there must be a running MongoDB instance available. This can be an existing or a new database. A database is created to hold the different datasets: observation timeseries, stresses timeseries and models. For each of these datasets a collection or library is created. These are named using the following convention: <database name>.<collection name>.
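As a small illustration of the naming convention, the helper below builds library names for a given database name. The function library_names and the collection names "oseries", "stresses" and "models" are assumptions made for this example, not names confirmed by the text above.

```python
def library_names(database_name, collections=("oseries", "stresses", "models")):
    """Return names following '<database name>.<collection name>'.

    The default collection names are assumptions for illustration only.
    """
    return [f"{database_name}.{collection}" for collection in collections]


names = library_names("my_connector")
# names == ['my_connector.oseries', 'my_connector.stresses', 'my_connector.models']
```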

The Arctic implementation uses the following structure: database / collections or libraries / documents. The data is stored within these libraries. Observations and stresses timeseries are stored as pandas.DataFrames. Models are stored in JSON (actually binary JSON) and do not contain the timeseries themselves. These are picked up from the other libraries when the model is loaded from the database.
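The idea of storing models without their timeseries can be sketched with plain dictionaries and the standard json module. This is a simplified illustration, not the Arctic-based implementation: the three dictionaries stand in for libraries, and store_model, load_model and the layout of the model dictionary are hypothetical.

```python
import json

# hypothetical libraries: series data keyed by name
oseries_lib = {"my_oseries": [1.2, 1.3, 1.1]}
stresses_lib = {"rain": [0.0, 2.5, 0.4]}
models_lib = {}


def store_model(name, model):
    """Store a model as JSON, keeping only the names of its series."""
    stripped = {
        "name": name,
        "oseries": model["oseries"]["name"],
        "stresses": [s["name"] for s in model["stresses"]],
        "parameters": model["parameters"],
    }
    models_lib[name] = json.dumps(stripped)


def load_model(name):
    """Load a model and pick up its series from the other libraries."""
    stored = json.loads(models_lib[name])
    return {
        "name": stored["name"],
        "oseries": {
            "name": stored["oseries"],
            "series": oseries_lib[stored["oseries"]],
        },
        "stresses": [
            {"name": s, "series": stresses_lib[s]} for s in stored["stresses"]
        ],
        "parameters": stored["parameters"],
    }


model = {
    "oseries": {"name": "my_oseries", "series": oseries_lib["my_oseries"]},
    "stresses": [{"name": "rain", "series": stresses_lib["rain"]}],
    "parameters": {"gain": 0.8},
}
store_model("my_model", model)
loaded = load_model("my_model")
```

Because the stored model only holds series names, updating a series in its library is reflected the next time the model is loaded.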

Arctic allows the user to store different versions of a dataset, which can be used to keep a history of older models, for example. This functionality is still in an experimental stage.

PystoreConnector

The PystoreConnector is an object that links to a location on disk. This can either be an existing or a new PyStore. A new store is created with collections that hold the different datasets: observation timeseries, stresses timeseries, and models.

A PyStore has the following structure: store / collections / items. The timeseries data is stored as Dask DataFrames, which can be easily converted to pandas DataFrames. The models are stored as JSON (not including the timeseries) in the metadata file belonging to an item. The actual data in the item is an empty DataFrame serving as a placeholder. This slightly 'hacky' design allows the models to be saved in a PyStore. The timeseries are picked up from their respective collections when the model is loaded from disk.
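The placeholder design can be pictured with nested dictionaries. The sketch below does not use PyStore or DataFrames: the store layout, the write_item helper and the "model" metadata key are hypothetical, and an empty list stands in for the empty DataFrame.

```python
import json

# hypothetical layout: store -> collections -> items,
# where each item has a data part and a metadata part
store = {"oseries": {}, "stresses": {}, "models": {}}


def write_item(collection, name, data, metadata=None):
    """Write an item consisting of data and metadata to a collection."""
    store[collection][name] = {"data": data, "metadata": metadata or {}}


# a timeseries item: the data part carries the values
write_item("oseries", "my_oseries", data=[1.2, 1.3, 1.1])

# a model item: the data part is an empty placeholder and the
# model definition (without timeseries) sits in the metadata
definition = {"oseries": "my_oseries", "parameters": {"gain": 0.8}}
write_item("models", "my_model", data=[], metadata={"model": json.dumps(definition)})

# loading the model: read the metadata, then fetch the series it refers to
item = store["models"]["my_model"]
loaded_definition = json.loads(item["metadata"]["model"])
series = store["oseries"][loaded_definition["oseries"]]["data"]
```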

PyStore supports so-called snapshots (which store the current state of the store), but this module does not currently make use of them. PyStore does not have the same versioning capabilities as Arctic.

Custom Connectors

It should be relatively straightforward to write your own custom connector object. The pastastore.base module contains the BaseConnector class, which specifies the methods and properties that must be implemented. Each Connector object should inherit from this class. The BaseConnector class also shows the expected call signature for each method. Following the same call signature should ensure that your new connector works directly with PastaStore. Extra keyword arguments can be added to methods in the custom class, as long as they come after the arguments in the expected call signature defined in the BaseConnector.

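# BaseConnector lives in pastastore.base; ConnectorUtil is assumed to live there too
from pastastore.base import BaseConnector, ConnectorUtil


class MyCustomConnector(BaseConnector, ConnectorUtil):
    """Must override each method and property in BaseConnector, e.g."""

    def get_oseries(self, name, progressbar=False):
        # your code to get oseries from database here
        pass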
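The rule about extra keyword arguments can be tried out without pastastore installed. In the self-contained sketch below, ToyBaseConnector is a hypothetical stand-in for BaseConnector, and enforcing the required methods through Python's abc module is an assumption made for this example.

```python
from abc import ABC, abstractmethod


class ToyBaseConnector(ABC):
    """Hypothetical stand-in for BaseConnector: declares required methods."""

    @abstractmethod
    def get_oseries(self, name, progressbar=False):
        """Expected call signature for retrieving an observation series."""


class MyDictBackedConnector(ToyBaseConnector):
    def __init__(self):
        self._oseries = {"my_oseries": [1.0, 2.0, 3.0]}

    # the extra keyword argument `default` comes after the expected signature,
    # so callers that only know the base signature keep working
    def get_oseries(self, name, progressbar=False, default=None):
        return self._oseries.get(name, default)


class Incomplete(ToyBaseConnector):
    """Does not override get_oseries, so it cannot be instantiated."""


conn = MyDictBackedConnector()
found = conn.get_oseries("my_oseries")
missing = conn.get_oseries("unknown", default=[])

try:
    Incomplete()
    incomplete_rejected = False
except TypeError:
    incomplete_rejected = True
```

Placing new keyword arguments last, each with a default value, means any code that calls the method with the base signature does not need to change.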
