Skip to main content

Metastore Python SDK. Feature store and data catalog for machine learning.

Project description

Releases Issues Pull requests Documentation License

Metastore

Metastore Python SDK.

Feature store and data catalog for machine learning.

Prerequisites

Installation

Production

Install package:

pip install metastore

Development

Install package:

pip install -e .[development]

Note Use the -e, --editable flag to install the package in development mode.

Note Set up a virtual environment for development.

Format source code:

autopep8 --recursive --in-place setup.py metastore/ tests/

Lint source code:

pylint setup.py metastore/ tests/

Test package:

pytest

Report test coverage:

pytest --cov --cov-fail-under 80

Note Set the --cov-fail-under flag to 80% to validate the code coverage metric.

Generate documentation:

sphinx-apidoc -f -e -T -d 2 -o docs/metastore/api-reference/ metastore/

Build documentation (optional):

cd docs/
sphinx-build -b html metastore/ build/

Usage

Create project definition

# metastore.yaml

project:
    name: 'customer_transactions'
    display_name: 'Customer transactions'
    description: 'Customer transactions feature store.'
    author: 'Metastore Developers'
    tags:
      - 'customer'
      - 'transaction'
    version: '1.0.0'
credential_store:
    type: 'local'
    path: '/path/to/.env'
metadata_store:
    type: 'file'
    path: 's3://path/to/metadata.db'
    s3_endpoint:
        type: 'secret'
        name: 'S3_ENDPOINT'
    s3_access_key:
        type: 'secret'
        name: 'S3_ACCESS_KEY'
    s3_secret_key:
        type: 'secret'
        name: 'S3_SECRET_KEY'
feature_store:
    offline_store:
        type: 'file'
        path: 's3://path/to/features/'
        s3_endpoint:
            type: 'secret'
            name: 'S3_ENDPOINT'
        s3_access_key:
            type: 'secret'
            name: 'S3_ACCESS_KEY'
        s3_secret_key:
            type: 'secret'
            name: 'S3_SECRET_KEY'
    online_store:
        type: 'redis'
        hostname:
            type: 'secret'
            name: 'REDIS_HOSTNAME'
        port:
            type: 'secret'
            name: 'REDIS_PORT'
        database:
            type: 'secret'
            name: 'REDIS_DATABASE'
        password:
            type: 'secret'
            name: 'REDIS_PASSWORD'
data_sources:
  - name: 'customer_transactions'
    type: 'postgresql'
    table: 'public.customer_transaction'
    hostname:
        type: 'secret'
        name: 'POSTGRESQL_HOSTNAME'
    port:
        type: 'secret'
        name: 'POSTGRESQL_PORT'
    database:
        type: 'secret'
        name: 'POSTGRESQL_DATABASE'
    username:
        type: 'secret'
        name: 'POSTGRESQL_USERNAME'
    password:
        type: 'secret'
        name: 'POSTGRESQL_PASSWORD'

Create feature definitions

# feature_definitions.py

from datetime import timedelta

from metastore import (
    FeatureStore,
    FeatureGroup,
    Feature,
    ValueType
)


feature_store = FeatureStore(repository='/path/to/repository/')

customer_transactions_feature_group = FeatureGroup(
    name='customer_transactions',
    record_identifiers=['customer_id'],
    event_time_feature='timestamp',
    features=[
        Feature(name='customer_id', value_type=ValueType.INTEGER),
        Feature(name='timestamp', value_type=ValueType.STRING),
        Feature(name='daily_transactions', value_type=ValueType.FLOAT),
        Feature(name='total_transactions', value_type=ValueType.FLOAT)
    ]
)

feature_store.apply(customer_transactions_feature_group)

Ingest features

# ingest_features.py

from metastore import FeatureStore


feature_store = FeatureStore(repository='/path/to/repository/')

dataframe = feature_store.read_dataframe(
    'customer_transactions',
    index_column='customer_id',
    partitions=100
)

feature_store.ingest('customer_transactions', dataframe)

Materialize features

# materialize_features.py

from datetime import datetime, timedelta

from metastore import FeatureStore


feature_store = FeatureStore(repository='/path/to/repository/')

feature_store.materialize(
    'customer_transactions',
    end_date=datetime.utcnow(),
    expires_in=timedelta(days=1)
)

Retrieve historical features

# retrieve_historical_features.py

from datetime import datetime

import pandas as pd
from metastore import FeatureStore


feature_store = FeatureStore(repository='/path/to/repository/')

record_identifiers = pd.DataFrame({
    'customer_id': [00001],
    'timestamp': [datetime.utcnow()]
})

dataframe = feature_store.get_historical_features(
    record_identifiers=record_identifiers,
    features=[
        'customer_transactions:daily_transactions',
        'customer_transactions:total_transactions'
    ]
).compute()

metadata = dataframe.attrs['metastore']
print(metadata)

Retrieve online features

# retrieve_online_features.py

import pandas as pd
from metastore import FeatureStore


feature_store = FeatureStore(repository='/path/to/repository/')

record_identifiers = pd.DataFrame({
    'customer_id': [00001]
})

dataframe = feature_store.get_online_features(
    record_identifiers=record_identifiers,
    features=[
        'customer_transactions:daily_transactions',
        'customer_transactions:total_transactions'
    ]
).compute()

metadata = dataframe.attrs['metastore']
print(metadata)

Documentation

Please refer to the official Metastore Documentation.

Changelog

Changelog contains information about new features, improvements, known issues, and bug fixes in each release.

Copyright and license

Copyright (c) 2022, Metastore Developers. All rights reserved.

Project developed under a BSD-3-Clause License.

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distributions

No source distribution files available for this release.See tutorial on generating distribution archives.

Built Distribution

metastore-1.0.0.dev12-py3-none-any.whl (8.7 kB view hashes)

Uploaded Python 3

Supported by

AWS AWS Cloud computing and Security Sponsor Datadog Datadog Monitoring Fastly Fastly CDN Google Google Download Analytics Microsoft Microsoft PSF Sponsor Pingdom Pingdom Monitoring Sentry Sentry Error logging StatusPage StatusPage Status page