Hopsworks Python SDK to interact with Hopsworks Platform, Feature Store, Model Registry and Model Serving
Hopsworks Client
hopsworks is the Python API for interacting with a Hopsworks cluster. Don't have a Hopsworks cluster just yet? Register an account on Hopsworks Serverless and get started for free. Once connected to your project, you can:
- Insert dataframes into the online or offline store, create training datasets or serve real-time feature vectors in the Feature Store via the Feature Store API. Already have data elsewhere that you want to import? Check out our Storage Connectors documentation.
- Register ML models in the Model Registry and deploy them with Model Serving via the Machine Learning API.
- Manage environments, executions, Kafka topics and more once you deploy your own Hopsworks cluster, either on-prem or in the cloud. Hopsworks is open-source and has its own Community Edition.
Our tutorials cover a wide range of use cases and examples of what you can build using Hopsworks.
Getting Started On Hopsworks
Once you have created a project on Hopsworks Serverless and generated a new API key, use your favourite virtualenv and package manager to install the library:
pip install hopsworks
Fire up a notebook and connect to your project. You will be prompted to enter your newly created API key:
import hopsworks
project = hopsworks.login()
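In non-interactive settings such as scheduled jobs there is no prompt to type the key into. The following is a minimal sketch, assuming the key is kept in an environment variable. The variable name `HOPSWORKS_API_KEY` and the helpers `resolve_api_key` and `connect` are illustrative choices, not part of the library; `api_key_value` is the keyword that `hopsworks.login()` takes for passing a key directly, but check the documentation for your installed version.

```python
import os
from typing import Mapping, Optional

API_KEY_ENV_VAR = "HOPSWORKS_API_KEY"  # assumed variable name for this sketch


def resolve_api_key(env: Mapping[str, str] = os.environ) -> Optional[str]:
    """Return the API key from the environment, or None if unset or blank."""
    key = env.get(API_KEY_ENV_VAR, "").strip()
    return key or None


def connect():
    """Log in with the key from the environment, else fall back to the prompt."""
    import hopsworks  # imported lazily so resolve_api_key works without it

    key = resolve_api_key()
    if key is None:
        return hopsworks.login()  # interactive prompt
    return hopsworks.login(api_key_value=key)
```

Calling `project = connect()` then behaves like the interactive login when no key is set, and skips the prompt when one is.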
Access the Feature Store of your project to use as a central repository for your feature data. Use your favourite data engineering library (pandas, polars, Spark, etc.) to insert data into the Feature Store, create training datasets or serve real-time feature vectors. Want to predict the likelihood of e-scooter accidents in real time? Here's how you can do it:
fs = project.get_feature_store()
# Write to Feature Groups
bike_ride_fg = fs.get_or_create_feature_group(
    name="bike_rides",
    version=1,
    primary_key=["ride_id"],
    event_time="activation_time",
    online_enabled=True,
)
bike_ride_fg.insert(bike_rides_df)
# Read from Feature Views
profile_fg = fs.get_feature_group("user_profile", version=1)
bike_ride_fv = fs.get_or_create_feature_view(
    name="bike_rides_view",
    version=1,
    # Join ride features with selected profile features on the shared user_id column
    query=bike_ride_fg.select_except(["ride_id"]).join(
        profile_fg.select(["age", "has_license"]), on=["user_id"]
    ),
)
bike_rides_Q1_2021_df = bike_ride_fv.get_batch_data(
    start_date="2021-01-01",
    end_date="2021-01-31",
)
# Create a training dataset
version, job = bike_ride_fv.create_train_test_split(
    test_size=0.2,
    description='Description of a dataset',
    # you can have different data formats such as csv, tsv, tfrecord, parquet and others
    data_format='csv'
)
# Predict the probability of accident in real-time using new data + context data
bike_ride_fv.init_serving()
while True:
    new_ride_vector = poll_ride_queue()
    feature_vector = bike_ride_fv.get_feature_vector(
        {"user_id": new_ride_vector["user_id"]},
        passed_features=new_ride_vector,
    )
    accident_probability = model.predict(feature_vector)
Or you can use the Machine Learning API to register models and deploy them for serving:
mr = project.get_model_registry()
# or
ms = project.get_model_serving()
Tutorials
Need more inspiration or want to learn more about the Hopsworks platform? Check out our tutorials.
Documentation
Documentation is available at Hopsworks Documentation.
Issues
For general questions about the usage of Hopsworks and the Feature Store please open a topic on Hopsworks Community.
Please report any issues using GitHub issue tracking.
Contributing
If you would like to contribute to this library, please see the Contribution Guidelines.
Built Distribution
Hashes for hopsworks-3.8.0rc2-py3-none-any.whl
Algorithm | Hash digest
---|---
SHA256 | a49001e83959a264c8015c573c4e46fa45a41c498e92b30a760c56485a988e21
MD5 | 020fa57861c6ed2b337a36d1bbf1513c
BLAKE2b-256 | 00b0007aff2455e89146e9fd617abb219ba05bd00bf7c2fff04add2a6b8c87d2