Metastore Python SDK. Feature store and data catalog for machine learning.

Metastore

Metastore Python SDK.

Feature store and data catalog for machine learning.

Prerequisites

Installation

Production

Install package:

pip install metastore

Development

Install package:

pip install -e '.[development]'

Note: Use the -e (--editable) flag to install the package in development mode.

Note: Set up a virtual environment for development.

Format source code:

autopep8 --recursive --in-place setup.py metastore/ tests/

Lint source code:

pylint setup.py metastore/ tests/

Test package:

pytest

Report test coverage:

pytest --cov --cov-fail-under 80

Note: With --cov-fail-under 80, the test run fails if total code coverage is below 80%.

Build documentation:

cd docs/
sphinx-build -b html metastore/ build/

Note: This step generates the API reference before building.

Usage

Create project definition

# metastore.yaml

project:
    name: 'customer_transactions'
    display_name: 'Customer transactions'
    description: 'Customer transactions feature store.'
    author: 'Metastore Developers'
    tags:
      - 'customer'
      - 'transaction'
    version: '1.0.0'
credential_store:
    type: 'local'
    path: '/path/to/.env'
metadata_store:
    type: 'file'
    path: 's3://path/to/metadata.db'
    s3_endpoint:
        type: 'secret'
        name: 'S3_ENDPOINT'
    s3_access_key:
        type: 'secret'
        name: 'S3_ACCESS_KEY'
    s3_secret_key:
        type: 'secret'
        name: 'S3_SECRET_KEY'
feature_store:
    offline_store:
        type: 'file'
        path: 's3://path/to/features/'
        s3_endpoint:
            type: 'secret'
            name: 'S3_ENDPOINT'
        s3_access_key:
            type: 'secret'
            name: 'S3_ACCESS_KEY'
        s3_secret_key:
            type: 'secret'
            name: 'S3_SECRET_KEY'
    online_store:
        type: 'redis'
        hostname:
            type: 'secret'
            name: 'REDIS_HOSTNAME'
        port:
            type: 'secret'
            name: 'REDIS_PORT'
        database:
            type: 'secret'
            name: 'REDIS_DATABASE'
        password:
            type: 'secret'
            name: 'REDIS_PASSWORD'
data_sources:
  - name: 'postgresql_data_source'
    type: 'postgresql'
    hostname:
        type: 'secret'
        name: 'POSTGRESQL_HOSTNAME'
    port:
        type: 'secret'
        name: 'POSTGRESQL_PORT'
    database:
        type: 'secret'
        name: 'POSTGRESQL_DATABASE'
    username:
        type: 'secret'
        name: 'POSTGRESQL_USERNAME'
    password:
        type: 'secret'
        name: 'POSTGRESQL_PASSWORD'
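
The project definition refers to credentials indirectly: every sensitive setting is a mapping with type 'secret' and the name of a variable that is expected to live in the credential store (here a local .env file). The sketch below illustrates that idea only. The helpers parse_env and resolve_secrets are hypothetical names invented for this example, not part of the Metastore API, and the sample values are made up; how the SDK actually loads and resolves secrets is an assumption.

```python
from typing import Any, Dict


def parse_env(text: str) -> Dict[str, str]:
    """Parse KEY=VALUE lines, skipping blank lines and # comments."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith('#') or '=' not in line:
            continue
        key, _, value = line.partition('=')
        values[key.strip()] = value.strip().strip('\'"')
    return values


def resolve_secrets(node: Any, secrets: Dict[str, str]) -> Any:
    """Recursively replace {'type': 'secret', 'name': ...} mappings with values."""
    if isinstance(node, dict):
        if node.get('type') == 'secret' and set(node) == {'type', 'name'}:
            # Raises KeyError if the named secret is not defined.
            return secrets[node['name']]
        return {key: resolve_secrets(value, secrets) for key, value in node.items()}
    if isinstance(node, list):
        return [resolve_secrets(item, secrets) for item in node]
    return node


# Hypothetical contents of the .env file (made-up values).
env_text = """
# online store credentials
REDIS_HOSTNAME=localhost
REDIS_PORT=6379
"""

# Fragment of the online_store section, already loaded into a dict.
online_store = {
    'type': 'redis',
    'hostname': {'type': 'secret', 'name': 'REDIS_HOSTNAME'},
    'port': {'type': 'secret', 'name': 'REDIS_PORT'},
}

resolved = resolve_secrets(online_store, parse_env(env_text))
print(resolved)
```

Values read from a .env file are strings, so a port such as '6379' would still need converting to an integer before use.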

Create feature definitions

# feature_definitions.py

from metastore import (
    FeatureStore,
    FeatureGroup,
    Feature,
    ValueType
)


feature_store = FeatureStore(repository='/path/to/repository/')

feature_group = FeatureGroup(
    name='customer_transactions',
    record_identifiers=['customer_id'],
    event_time_feature='timestamp',
    features=[
        Feature(name='customer_id', value_type=ValueType.INTEGER),
        Feature(name='timestamp', value_type=ValueType.STRING),
        Feature(name='daily_transactions', value_type=ValueType.FLOAT),
        Feature(name='total_transactions', value_type=ValueType.FLOAT)
    ]
)

feature_store.apply(feature_group)
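
In the definition above, the record identifier (customer_id) and the event time feature (timestamp) both appear in the features list. The sketch below shows a consistency check along those lines that could be run before applying a definition. The function check_feature_group is a hypothetical helper written for this example; whether feature_store.apply performs a similar validation is not stated here and should not be assumed.

```python
from typing import List


def check_feature_group(
    record_identifiers: List[str],
    event_time_feature: str,
    feature_names: List[str],
) -> List[str]:
    """Return a list of problems; an empty list means the definition is consistent."""
    problems = []
    declared = set(feature_names)
    if len(declared) != len(feature_names):
        problems.append('duplicate feature names')
    for name in record_identifiers:
        if name not in declared:
            problems.append(f'record identifier {name!r} is not a declared feature')
    if event_time_feature not in declared:
        problems.append(
            f'event time feature {event_time_feature!r} is not a declared feature'
        )
    return problems


feature_names = ['customer_id', 'timestamp', 'daily_transactions', 'total_transactions']
print(check_feature_group(['customer_id'], 'timestamp', feature_names))
```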

Ingest features

# ingest_features.py

from metastore import FeatureStore


feature_store = FeatureStore(repository='/path/to/repository/')

dataframe = feature_store.read_from_source(
    'postgresql_data_source',
    table='customer_transaction',
    index_column='customer_id',
    partitions=10
)

feature_store.ingest('customer_transactions', dataframe)
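
The read above passes an index_column and a number of partitions. A common way to read a table in parallel is to split the range of the index column into contiguous chunks and read one chunk per partition. The sketch below shows that splitting step only. The function partition_bounds is a hypothetical helper, and it is an assumption that read_from_source partitions the table this way.

```python
from typing import List, Tuple


def partition_bounds(minimum: int, maximum: int, partitions: int) -> List[Tuple[int, int]]:
    """Split the inclusive range [minimum, maximum] into contiguous inclusive ranges."""
    if partitions < 1:
        raise ValueError('partitions must be at least 1')
    if maximum < minimum:
        raise ValueError('maximum must not be smaller than minimum')
    total = maximum - minimum + 1
    # Never create more partitions than there are index values.
    partitions = min(partitions, total)
    size, remainder = divmod(total, partitions)
    bounds = []
    start = minimum
    for index in range(partitions):
        # The first `remainder` partitions take one extra value.
        length = size + (1 if index < remainder else 0)
        bounds.append((start, start + length - 1))
        start += length
    return bounds


print(partition_bounds(1, 10, 3))
```

Each (lower, upper) pair could then drive one query of the form WHERE customer_id BETWEEN lower AND upper.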

Materialize features

# materialize_features.py

from datetime import datetime, timedelta

from metastore import FeatureStore


feature_store = FeatureStore(repository='/path/to/repository/')

feature_store.materialize(
    'customer_transactions',
    end_date=datetime.utcnow(),
    expires_in=timedelta(days=1)
)

Retrieve historical features

# retrieve_historical_features.py

from datetime import datetime

import pandas as pd
from metastore import FeatureStore


feature_store = FeatureStore(repository='/path/to/repository/')

record_identifiers = pd.DataFrame({
    'customer_id': [1],  # Python 3 rejects integer literals with leading zeros
    'timestamp': [datetime.utcnow()]
})

dataframe = feature_store.get_historical_features(
    record_identifiers=record_identifiers,
    features=[
        'customer_transactions:daily_transactions',
        'customer_transactions:total_transactions'
    ]
)

metadata = dataframe.attrs['metastore']
print(metadata)
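
The historical request pairs each customer_id with a timestamp. Feature stores typically answer such a request with a point-in-time lookup: for each record, return the most recent feature value whose event time is not later than the requested timestamp, so that no future data leaks into training sets. The sketch below illustrates that lookup with the standard library. The function latest_value_as_of and the sample history are hypothetical, and it is an assumption that get_historical_features follows exactly these semantics.

```python
from bisect import bisect_right
from datetime import datetime
from typing import List, Optional, Tuple


def latest_value_as_of(
    history: List[Tuple[datetime, float]],
    as_of: datetime,
) -> Optional[float]:
    """Return the value with the latest event time <= as_of, or None.

    `history` must be sorted by event time in ascending order.
    """
    event_times = [event_time for event_time, _ in history]
    position = bisect_right(event_times, as_of)
    if position == 0:
        # Every recorded value is newer than the requested time.
        return None
    return history[position - 1][1]


# Made-up daily_transactions history for one customer.
history = [
    (datetime(2022, 1, 1), 3.0),
    (datetime(2022, 1, 2), 5.0),
    (datetime(2022, 1, 3), 4.0),
]

print(latest_value_as_of(history, datetime(2022, 1, 2, 12)))  # 5.0
print(latest_value_as_of(history, datetime(2021, 12, 31)))  # None
```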

Retrieve online features

# retrieve_online_features.py

import pandas as pd
from metastore import FeatureStore


feature_store = FeatureStore(repository='/path/to/repository/')

record_identifiers = pd.DataFrame({
    'customer_id': [1]  # Python 3 rejects integer literals with leading zeros
})

dataframe = feature_store.get_online_features(
    record_identifiers=record_identifiers,
    features=[
        'customer_transactions:daily_transactions',
        'customer_transactions:total_transactions'
    ]
)

metadata = dataframe.attrs['metastore']
print(metadata)
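
Both retrieval calls name features with references of the form 'feature_group:feature'. The sketch below shows one way to split such references and collect the requested features per group, for example to validate a request before sending it. The function group_feature_references is a hypothetical helper written for this example and is not part of the Metastore API.

```python
from collections import defaultdict
from typing import Dict, List


def group_feature_references(references: List[str]) -> Dict[str, List[str]]:
    """Map each feature group name to the features requested from it."""
    grouped = defaultdict(list)
    for reference in references:
        group, separator, feature = reference.partition(':')
        if not separator or not group or not feature:
            raise ValueError(
                f"invalid feature reference {reference!r}, expected 'group:feature'"
            )
        grouped[group].append(feature)
    return dict(grouped)


requested = group_feature_references([
    'customer_transactions:daily_transactions',
    'customer_transactions:total_transactions',
])
print(requested)
```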

Documentation

Please refer to the official Metastore Documentation.

Changelog

The Changelog contains information about new features, improvements, known issues, and bug fixes in each release.

Copyright and license

Copyright (c) 2022, Metastore Developers. All rights reserved.

This project is developed under the BSD-3-Clause License.
