Metastore
Metastore Python SDK. Feature store and data catalog for machine learning.
Prerequisites
Installation
Production
Install package:
pip install metastore
Development
Install package:
pip install -e '.[development]'
Note: Use the -e, --editable flag to install the package in development mode.
Note: Set up a virtual environment for development.
Format source code:
autopep8 --recursive --in-place setup.py metastore/ tests/
Lint source code:
pylint setup.py metastore/ tests/
Test package:
pytest
Report test coverage:
pytest --cov --cov-fail-under 80
Note: The --cov-fail-under 80 option makes the run fail if total code coverage is below 80%.
Build documentation:
cd docs/
sphinx-build -b html metastore/ build/
Note: This step generates the API reference before building the documentation.
Usage
Create project definition
# metastore.yaml
project:
  name: 'customer_transactions'
  display_name: 'Customer transactions'
  description: 'Customer transactions feature store.'
  author: 'Metastore Developers'
  tags:
    - 'customer'
    - 'transaction'
  version: '1.0.0'
credential_store:
  type: 'local'
  path: '/path/to/.env'
metadata_store:
  type: 'file'
  path: 's3://path/to/metadata.db'
  s3_endpoint:
    type: 'secret'
    name: 'S3_ENDPOINT'
  s3_access_key:
    type: 'secret'
    name: 'S3_ACCESS_KEY'
  s3_secret_key:
    type: 'secret'
    name: 'S3_SECRET_KEY'
feature_store:
  offline_store:
    type: 'file'
    path: 's3://path/to/features/'
    s3_endpoint:
      type: 'secret'
      name: 'S3_ENDPOINT'
    s3_access_key:
      type: 'secret'
      name: 'S3_ACCESS_KEY'
    s3_secret_key:
      type: 'secret'
      name: 'S3_SECRET_KEY'
  online_store:
    type: 'redis'
    hostname:
      type: 'secret'
      name: 'REDIS_HOSTNAME'
    port:
      type: 'secret'
      name: 'REDIS_PORT'
    database:
      type: 'secret'
      name: 'REDIS_DATABASE'
    password:
      type: 'secret'
      name: 'REDIS_PASSWORD'
data_sources:
  - name: 'postgresql_data_source'
    type: 'postgresql'
    hostname:
      type: 'secret'
      name: 'POSTGRESQL_HOSTNAME'
    port:
      type: 'secret'
      name: 'POSTGRESQL_PORT'
    database:
      type: 'secret'
      name: 'POSTGRESQL_DATABASE'
    username:
      type: 'secret'
      name: 'POSTGRESQL_USERNAME'
    password:
      type: 'secret'
      name: 'POSTGRESQL_PASSWORD'
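The `type: 'secret'` entries in the project definition name a secret rather than carrying its value. This README does not describe how Metastore resolves them, so the following is only a rough sketch of the indirection on plain Python data. The helper `resolve_secrets` and the rule that a reference is exactly a `type`/`name` pair are assumptions made for illustration.

```python
from typing import Any, Mapping


def resolve_secrets(node: Any, secrets: Mapping[str, str]) -> Any:
    """Return a copy of `node` with secret references replaced by values.

    Hypothetical helper: a secret reference is assumed to be a mapping of
    the exact form {'type': 'secret', 'name': <key>}.
    """
    if isinstance(node, dict):
        if set(node) == {'type', 'name'} and node['type'] == 'secret':
            return secrets[node['name']]  # raises KeyError if the secret is missing
        return {key: resolve_secrets(value, secrets) for key, value in node.items()}
    if isinstance(node, list):
        return [resolve_secrets(item, secrets) for item in node]
    return node


# A fragment of the online store section, as it might look once parsed.
online_store = {
    'type': 'redis',
    'hostname': {'type': 'secret', 'name': 'REDIS_HOSTNAME'},
    'port': {'type': 'secret', 'name': 'REDIS_PORT'},
}

# The mapping could be os.environ; a literal dict keeps the sketch runnable.
example_secrets = {'REDIS_HOSTNAME': 'localhost', 'REDIS_PORT': '6379'}
resolved = resolve_secrets(online_store, example_secrets)
print(resolved)
```

Matching on the exact key set keeps entries such as a data source, which also has `type` and `name` keys alongside others, from being mistaken for a secret reference.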
Create feature definitions
# feature_definitions.py
from metastore import (
    FeatureStore,
    FeatureGroup,
    Feature,
    ValueType
)

feature_store = FeatureStore(repository='/path/to/repository/')

feature_group = FeatureGroup(
    name='customer_transactions',
    record_identifiers=['customer_id'],
    event_time_feature='timestamp',
    features=[
        Feature(name='customer_id', value_type=ValueType.INTEGER),
        Feature(name='timestamp', value_type=ValueType.STRING),
        Feature(name='daily_transactions', value_type=ValueType.FLOAT),
        Feature(name='total_transactions', value_type=ValueType.FLOAT)
    ]
)

# Register the feature group with the feature store.
feature_store.apply(feature_group)
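A feature group like this one is only coherent if its record identifiers and its event time feature are themselves declared features. Whether `FeatureGroup` enforces that is not stated in this README. The sketch below shows such a check on plain Python lists, using a hypothetical helper `check_feature_group` that is not part of the SDK.

```python
from typing import List


def check_feature_group(
    feature_names: List[str],
    record_identifiers: List[str],
    event_time_feature: str,
) -> List[str]:
    """Return the referenced names that are not declared as features."""
    declared = set(feature_names)
    referenced = list(record_identifiers) + [event_time_feature]
    return [name for name in referenced if name not in declared]


missing = check_feature_group(
    feature_names=[
        'customer_id',
        'timestamp',
        'daily_transactions',
        'total_transactions',
    ],
    record_identifiers=['customer_id'],
    event_time_feature='timestamp',
)
print(missing)  # an empty list means every referenced name is declared
```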
Ingest features
# ingest_features.py
from metastore import FeatureStore

feature_store = FeatureStore(repository='/path/to/repository/')

dataframe = feature_store.read_from_source(
    'postgresql_data_source',
    table='customer_transaction',
    index_column='customer_id',
    partitions=10
)

feature_store.ingest('customer_transactions', dataframe)
Materialize features
# materialize_features.py
from datetime import datetime, timedelta
from metastore import FeatureStore

feature_store = FeatureStore(repository='/path/to/repository/')

feature_store.materialize(
    'customer_transactions',
    end_date=datetime.utcnow(),
    expires_in=timedelta(days=1)
)
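In feature stores generally, materialization copies the most recent feature values per record identifier from the offline store into the online store. This README does not spell out how Metastore does it or how `expires_in` is applied. The sketch below is therefore conceptual: `latest_rows` is a hypothetical helper, and reading `expires_in` as an offset from `end_date` is an assumption.

```python
from datetime import datetime, timedelta
from typing import Any, Dict, List

Row = Dict[str, Any]


def latest_rows(rows: List[Row], end_date: datetime) -> Dict[Any, Row]:
    """Keep, per customer_id, the row with the greatest timestamp <= end_date."""
    latest: Dict[Any, Row] = {}
    for row in rows:
        if row['timestamp'] > end_date:
            continue  # events after end_date are not materialized
        current = latest.get(row['customer_id'])
        if current is None or row['timestamp'] > current['timestamp']:
            latest[row['customer_id']] = row
    return latest


offline_rows = [
    {'customer_id': 1, 'timestamp': datetime(2022, 1, 1), 'daily_transactions': 3.0},
    {'customer_id': 1, 'timestamp': datetime(2022, 1, 2), 'daily_transactions': 5.0},
    {'customer_id': 2, 'timestamp': datetime(2022, 1, 3), 'daily_transactions': 1.0},
]

end_date = datetime(2022, 1, 2, 12)
online_view = latest_rows(offline_rows, end_date)

# Assumed reading of expires_in: values stay valid for one day after end_date.
expires_at = end_date + timedelta(days=1)

print(online_view)
print(expires_at)
```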
Retrieve historical features
# retrieve_historical_features.py
from datetime import datetime
import pandas as pd
from metastore import FeatureStore

feature_store = FeatureStore(repository='/path/to/repository/')

record_identifiers = pd.DataFrame({
    'customer_id': [1],
    'timestamp': [datetime.utcnow()]
})

dataframe = feature_store.get_historical_features(
    record_identifiers=record_identifiers,
    features=[
        'customer_transactions:daily_transactions',
        'customer_transactions:total_transactions'
    ]
)

# Print the metadata attached to the returned dataframe.
metadata = dataframe.attrs['metastore']
print(metadata)
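Historical retrieval pairs each record identifier with a timestamp. Feature stores usually interpret that pair as a point-in-time lookup: return the newest feature values recorded at or before the timestamp, so later events cannot leak into training data. This README does not confirm that Metastore follows exactly this rule. The sketch below illustrates the idea on plain Python data with a hypothetical helper `point_in_time_lookup`.

```python
from datetime import datetime
from typing import Any, Dict, List, Optional

Row = Dict[str, Any]


def point_in_time_lookup(
    history: List[Row], customer_id: int, as_of: datetime
) -> Optional[Row]:
    """Return the newest row for customer_id with timestamp <= as_of, if any."""
    candidates = [
        row for row in history
        if row['customer_id'] == customer_id and row['timestamp'] <= as_of
    ]
    return max(candidates, key=lambda row: row['timestamp'], default=None)


history = [
    {'customer_id': 1, 'timestamp': datetime(2022, 1, 1), 'total_transactions': 3.0},
    {'customer_id': 1, 'timestamp': datetime(2022, 1, 2), 'total_transactions': 8.0},
]

# A query timestamped between the two events sees only the first one.
row = point_in_time_lookup(history, customer_id=1, as_of=datetime(2022, 1, 1, 18))
print(row['total_transactions'])

# A query that predates all events has nothing to return.
print(point_in_time_lookup(history, customer_id=1, as_of=datetime(2021, 12, 31)))
```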
Retrieve online features
# retrieve_online_features.py
import pandas as pd
from metastore import FeatureStore

feature_store = FeatureStore(repository='/path/to/repository/')

record_identifiers = pd.DataFrame({
    'customer_id': [1]
})

dataframe = feature_store.get_online_features(
    record_identifiers=record_identifiers,
    features=[
        'customer_transactions:daily_transactions',
        'customer_transactions:total_transactions'
    ]
)

# Print the metadata attached to the returned dataframe.
metadata = dataframe.attrs['metastore']
print(metadata)
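Both retrieval calls name features as strings of the form `feature_group:feature`. That format is inferred from the examples in this README rather than from a specification. As a small illustration, the hypothetical helper `group_feature_references` below splits such references and groups the feature names by feature group.

```python
from collections import defaultdict
from typing import Dict, List


def group_feature_references(references: List[str]) -> Dict[str, List[str]]:
    """Group 'feature_group:feature' strings by feature group name."""
    grouped: Dict[str, List[str]] = defaultdict(list)
    for reference in references:
        group, separator, feature = reference.partition(':')
        if not separator or not group or not feature:
            raise ValueError(
                f"expected 'feature_group:feature', got {reference!r}"
            )
        grouped[group].append(feature)
    return dict(grouped)


grouped_references = group_feature_references([
    'customer_transactions:daily_transactions',
    'customer_transactions:total_transactions',
])
print(grouped_references)
```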
Documentation
Please refer to the official Metastore Documentation.
Changelog
The Changelog contains information about new features, improvements, known issues, and bug fixes in each release.
Copyright and license
Copyright (c) 2022, Metastore Developers. All rights reserved.
The project is distributed under the BSD-3-Clause License.