Python Client for Orion Feature Store to push/produce Model Features and get features' metadata

Orion Python Client

A lightweight Python client for interacting with Orion Feature Store. This client provides functionality for feature metadata retrieval, protobuf serialization, and Kafka integration. 🚀

This client helps push an ML model's features stored in offline sources (such as tables or cloud storage objects in Parquet/Delta format) to Orion Feature Store.

Key Features

  • Feature metadata retrieval
  • Protobuf serialization of feature values and producing them to Apache Kafka
  • Support for features of various data types:
    • Scalar types (FP32, FP64, Int32, Int64, UInt32, UInt64, String, Bool)
    • Vector types (Vectors of each of the above Scalar Types)
  • Kafka integration with configurable settings
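
To make the type list above concrete, here is a minimal sketch of what checking a Python value against one of those type names could look like. The type-name strings are taken from the list; `fits_scalar` and `fits_vector` are hypothetical helpers written for illustration and are not part of the client's API.

```python
# Hypothetical helpers for illustration only; not part of orion-py-client.

# Inclusive value ranges for the fixed-width integer types in the list above.
INT_RANGES = {
    "Int32": (-2**31, 2**31 - 1),
    "Int64": (-2**63, 2**63 - 1),
    "UInt32": (0, 2**32 - 1),
    "UInt64": (0, 2**64 - 1),
}
FLOAT_TYPES = {"FP32", "FP64"}


def fits_scalar(value, type_name):
    """Return True if `value` is representable as the named scalar type."""
    if type_name in INT_RANGES:
        low, high = INT_RANGES[type_name]
        # bool is a subclass of int in Python, so exclude it explicitly.
        return (
            isinstance(value, int)
            and not isinstance(value, bool)
            and low <= value <= high
        )
    if type_name in FLOAT_TYPES:
        return isinstance(value, (int, float)) and not isinstance(value, bool)
    if type_name == "String":
        return isinstance(value, str)
    if type_name == "Bool":
        return isinstance(value, bool)
    raise ValueError(f"Unknown scalar type: {type_name}")


def fits_vector(values, element_type):
    """Return True if every element of `values` fits the scalar element type."""
    return isinstance(values, (list, tuple)) and all(
        fits_scalar(v, element_type) for v in values
    )
```

A check like this can catch out-of-range values (for example, a negative number in a UInt32 column) before serialization, although how the client itself validates values is not described here.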

📥 Installation

pip install orion-py-client==0.1.1

Prerequisites

  • Python 3.7+
  • (Optional) Apache Spark 3.0+ & spark-sql-kafka for Kafka feature push functionality

Usage

Basic Usage

from orion_py_client import OrionPyClient

# Initialize the client
opy_client = OrionPyClient(
    features_metadata_source_url="your_features_metadata_source_url",
    job_id="your_job_id",
    job_token="your_job_token"
)

# Get feature details
(
    offline_src_type_columns,
    offline_col_to_default_values_map,
    entity_column_names
) = opy_client.get_features_details()
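
The name `offline_col_to_default_values_map` suggests a mapping from offline column name to the default used when a value is missing. Assuming it behaves like a plain dict (an assumption, not confirmed by the documentation above), applying it to a single record could be sketched as follows. `apply_defaults` is a hypothetical helper, not part of the client.

```python
# Hypothetical helper for illustration; assumes the defaults map is a plain
# dict of {column_name: default_value}.

def apply_defaults(row, default_values):
    """Return a copy of `row` with missing or None columns set to defaults."""
    filled = dict(row)
    for column, default in default_values.items():
        if filled.get(column) is None:
            filled[column] = default
    return filled


record = {"user_id": 7, "ctr": None}
defaults = {"ctr": 0.0, "age": -1}
filled_record = apply_defaults(record, defaults)
```

On a Spark DataFrame, the same idea is usually expressed with `df.fillna(defaults)`, which accepts a dict of column name to replacement value.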

Push Feature Values from Offline sources to Orion via Spark -> Kafka

Supported Offline Sources

  1. Table (Hive/Delta)
  2. Parquet folder stored in Cloud Storage (AWS/GCS/ADLS)
  3. Delta folder stored in Cloud Storage (AWS/GCS/ADLS)
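
As a rough illustration of how the three source kinds above are typically read with Spark, the sketch below dispatches on a source type label. The labels `"table"`, `"parquet"`, and `"delta"` and the function `load_offline_source` are assumptions made for this example and do not come from the client. The reader calls themselves (`spark.read.table`, `spark.read.parquet`, `spark.read.format("delta").load`) are standard PySpark, and reading Delta additionally requires the Delta Lake package to be available on the cluster.

```python
# Hypothetical helper for illustration; the source-type labels are assumptions.

def load_offline_source(spark, source_type, location):
    """Read one offline source into a Spark DataFrame.

    spark:       an active SparkSession
    source_type: "table", "parquet", or "delta" (labels chosen for this sketch)
    location:    a table name, or a cloud storage path to a folder
    """
    if source_type == "table":
        # Hive or Delta table registered in the metastore/catalog.
        return spark.read.table(location)
    if source_type == "parquet":
        # Folder of Parquet files, e.g. on S3, GCS, or ADLS.
        return spark.read.parquet(location)
    if source_type == "delta":
        # Delta folder; needs Delta Lake configured in the Spark session.
        return spark.read.format("delta").load(location)
    raise ValueError(f"Unsupported offline source type: {source_type}")
```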

Refer to the examples for a detailed walkthrough of how to configure a job and push feature values.

The following is a simple outline of the steps involved in that example:

# Outline only: spark, fgs_to_consider, the Kafka settings, and
# get_features_from_all_sources are expected to be supplied by your job.
from orion_py_client import OrionPyClient

# create a new Orion client
opy_client = OrionPyClient(features_metadata_source_url, job_id, job_token)

# get the feature details
(
    feature_mapping,
    offline_col_to_default_values_map,
    onfs_fg_to_onfs_feat_map,
    onfs_fg_to_ofs_feat_map,
    fg_to_datatype_map,
    entity_label,
    entity_column_names
) = opy_client.get_features_details(fgs_to_consider)

# read the data from the different offline sources
df = get_features_from_all_sources(spark, entity_column_names, feature_mapping, offline_col_to_default_values_map)

# serialize the feature values to protobuf binary
proto_df = opy_client.generate_df_with_protobuf_messages(df, intra_batch_size=20)

# produce the data to Kafka so that consumers write the features to Orion Feature Store
opy_client.write_protobuf_df_to_kafka(proto_df, kafka_bootstrap_servers, kafka_topic, additional_options)
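
The outline above does not show what `additional_options` contains. With Spark's Kafka sink, writer options are plain string key/value pairs, and Kafka client settings are passed with a `kafka.` prefix. The sketch below shows one plausible way such options could be assembled; `build_kafka_write_options` is a hypothetical helper, and whether `write_protobuf_df_to_kafka` merges options in this way is an assumption.

```python
# Hypothetical helper for illustration; not part of orion-py-client.

def build_kafka_write_options(bootstrap_servers, topic, additional_options=None):
    """Merge the required Spark Kafka sink options with any extra settings."""
    options = {
        "kafka.bootstrap.servers": bootstrap_servers,  # required by the sink
        "topic": topic,                                # target Kafka topic
    }
    # Extra settings (for example security configuration) override or extend
    # the base options.
    options.update(additional_options or {})
    return options


# Example extra settings for a SASL_SSL-secured cluster (values are placeholders).
additional_options = {
    "kafka.security.protocol": "SASL_SSL",
    "kafka.sasl.mechanism": "PLAIN",
}
kafka_options = build_kafka_write_options(
    "broker-1:9092,broker-2:9092", "feature-updates", additional_options
)

# With a DataFrame that has a binary `value` column, a plain Spark write
# would then look like:
#   df.write.format("kafka").options(**kafka_options).save()
```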

Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

Support

For support, please create an issue.
