Skip to main content

ByteHub Timeseries Feature Store

Project description

ByteHub Feature Store

PyPI Latest Release Issues Issues Code style: black

What is a feature store?

A feature store is a data storage system for data science and machine-learning. It can store raw data and also transformed features, which can be fed straight into an ML model or training script.

Feature stores allow data scientists and engineers to be more productive by organising the flow of data into models.

The Bytehub Feature Store is designed to:

  • Be simple to use, with a Pandas-like API;
  • Require no complicated infrastructure, running on a local Python installation or in a cloud environment;
  • Be optimised towards timeseries operations, making it highly suited to applications such as those in finance, energy, forecasting; and
  • Support simple time/value data as well as complex structures, e.g. dictionaries.

It is built on Dask to support large datasets and cluster compute environments.

Quick-start

Install using pip:

pip install bytehub

Create a local SQLite feature store by running:

import bytehub as bh
import pandas as pd

fs = bh.FeatureStore()

Data lives inside namespaces within each feature store. They can be used to separate projects or environments. Create a namespace as follows:

fs.create_namespace(
    'dev', url='/tmp/featurestore/dev', description='Dev datasets'
)

Create a feature inside this namespace which will be used to store a timeseries of pre-prepared data:

fs.create_feature('dev/first-deature', description='First feature')

Finally save some data into the feature store:

dts = pd.date_range('2020-01-01', '2021-02-09')
df = pd.DataFrame({'time': dts, 'value': list(range(len(dts)))})

fs.save_dataframe(df, 'dev/first-deature')

The data is now stored, ready to be resampled, merged with other data, and fed to machine-learning models.

Roadmap

  • Tasks to automate updates to features using orchestration tools like Airflow
  • Transforms to automate feature engineering activity.

Documentation and tutorials

Stay tuned... coming soon.

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

bytehub-0.2.2.tar.gz (11.7 kB view hashes)

Uploaded Source

Built Distribution

bytehub-0.2.2-py3-none-any.whl (23.8 kB view hashes)

Uploaded Python 3

Supported by

AWS AWS Cloud computing and Security Sponsor Datadog Datadog Monitoring Fastly Fastly CDN Google Google Download Analytics Microsoft Microsoft PSF Sponsor Pingdom Pingdom Monitoring Sentry Sentry Error logging StatusPage StatusPage Status page