An SDK to integrate cloud solutions such as SageMaker and Databricks with Hopsworks.
Project description
hopsworks-cloud-sdk is an SDK for integrating existing cloud solutions, such as Amazon SageMaker or Databricks, with the Hopsworks platform.
It enables accessing the Hopsworks feature store from SageMaker and Databricks notebooks.
Quick Start
Ensure that your Hopsworks installation is set up correctly: Setting up Hopsworks for the cloud
To Install:
$ pip install hopsworks-cloud-sdk
Sample usage:
>>> from hops import featurestore
>>> featurestore.connect('ec2-w-x-y-z.us-east-2.compute.amazonaws.com', 'my_hopsworks_project')
>>> features_df = featurestore.get_features(["my_feature_1", "my_feature_2"])
Examples
Documentation
API for the Hopsworks Feature Store
Hopsworks has a data management layer for machine learning, called a feature store. The feature store enables simple and efficient versioning, sharing, governance, and definition of features that can be used both to train machine learning models and to serve inference requests. The feature store serves as a natural interface between data engineering and data science.
Reading from the featurestore:
from hops import featurestore
features_df = featurestore.get_features(["team_budget", "average_attendance", "average_player_age"])
Integration with scikit-learn:
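from sklearn.neighbors import KNeighborsClassifier
from hops import featurestore
# Load the feature group as a pandas dataframe
train_df = featurestore.get_featuregroup("iris_features", dataframe_type="pandas")
# Separate the four measurement columns from the label column
x_df = train_df[['sepal_length', 'sepal_width', 'petal_length', 'petal_width']]
y_df = train_df[["label"]]
X = x_df.values
y = y_df.values.ravel()  # flatten the label column to a 1-D array
iris_knn = KNeighborsClassifier()
iris_knn.fit(X, y)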
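To make the roles of X (one row of measurements per sample) and y (one label per row) concrete, the following is a minimal pure-Python sketch of k-nearest-neighbours classification. It is a simplified stand-in for illustration, not scikit-learn's implementation; the function name `knn_predict` and the toy data are assumptions, and the default of `k=5` only mirrors the default `n_neighbors` of `KNeighborsClassifier`.

```python
import math
from collections import Counter


def knn_predict(X, y, query, k=5):
    """Predict the label of `query` by majority vote among its k nearest rows of X."""
    # Pair each training row's Euclidean distance to the query with its label.
    distances = sorted(
        (math.dist(row, query), label) for row, label in zip(X, y)
    )
    # Count the labels of the k closest rows and return the most common one.
    votes = Counter(label for _, label in distances[:k])
    return votes.most_common(1)[0][0]


# Hypothetical toy data shaped like the iris example: four measurements per row.
X = [
    [5.1, 3.5, 1.4, 0.2],
    [4.9, 3.0, 1.4, 0.2],
    [4.7, 3.2, 1.3, 0.2],
    [6.3, 3.3, 6.0, 2.5],
    [5.8, 2.7, 5.1, 1.9],
    [7.1, 3.0, 5.9, 2.1],
]
y = [0, 0, 0, 2, 2, 2]

prediction = knn_predict(X, y, [5.0, 3.4, 1.5, 0.2], k=3)
```

Here the query sits closest to the first three rows, so the vote among its three nearest neighbours returns their label.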
Integration with TensorFlow:
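import tensorflow as tf  # the calls below use TensorFlow 1.x names (tf.gfile, tf.parse_single_example)
from hops import featurestore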
feature_list = [
    "team_budget", "average_attendance", "average_player_age",
    "team_position", "sum_attendance",
    "average_player_rating", "average_player_worth", "sum_player_age",
    "sum_player_rating", "sum_player_worth", "sum_position",
    "average_position"
]
latest_version = featurestore.get_latest_training_dataset_version("team_position_prediction")
featurestore.create_training_dataset(
    features=feature_list,
    training_dataset="team_position_prediction",
    descriptive_statistics=False,
    feature_correlation=False,
    feature_histograms=False,
    cluster_analysis=False,
    training_dataset_version=latest_version + 1
)
# SHUFFLE_BUFFER_SIZE, BATCH_SIZE and NUM_EPOCHS must be defined before calling this function
def create_tf_dataset():
    dataset_dir = featurestore.get_training_dataset_path("team_position_prediction")
    input_files = tf.gfile.Glob(dataset_dir + "/part-r-*")
    dataset = tf.data.TFRecordDataset(input_files)
    tf_record_schema = ... # Add tf schema
    feature_names = [
        "team_budget", "average_attendance", "average_player_age", "sum_attendance",
        "average_player_rating", "average_player_worth", "sum_player_age",
        "sum_player_rating", "sum_player_worth", "sum_position", "average_position"
    ]
    label_name = "team_position"

    def decode(example_proto):
        # Parse one serialized record, then split it into features (x) and label (y)
        example = tf.parse_single_example(example_proto, tf_record_schema)
        x = []
        for feature_name in feature_names:
            x.append(example[feature_name])
        y = [tf.cast(example[label_name], tf.float32)]
        return x, y

    dataset = dataset.map(decode).shuffle(SHUFFLE_BUFFER_SIZE).batch(BATCH_SIZE).repeat(NUM_EPOCHS)
    return dataset
tf_dataset = create_tf_dataset()
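The order of operations in the pipeline above (decode, shuffle, batch, repeat) can be sketched in plain Python without TensorFlow. This is an illustrative approximation under stated assumptions: records are ordinary dicts rather than serialized TFRecords, a full in-memory shuffle stands in for the bounded shuffle buffer, and the names `make_batches` and `records` are hypothetical.

```python
import random


def decode(example, feature_names, label_name):
    """Split one record (a dict) into a feature list x and a one-element label list y."""
    x = [example[name] for name in feature_names]
    y = [float(example[label_name])]
    return x, y


def make_batches(records, feature_names, label_name, batch_size, num_epochs, seed=0):
    """Yield batches of decoded records: decode, shuffle, batch, then repeat."""
    decoded = [decode(record, feature_names, label_name) for record in records]
    rng = random.Random(seed)
    for _ in range(num_epochs):
        # Reshuffle a copy of the data at the start of every epoch.
        epoch = decoded[:]
        rng.shuffle(epoch)
        # Batching happens inside the epoch, so no batch spans two epochs
        # and the last batch of an epoch may be smaller than batch_size.
        for start in range(0, len(epoch), batch_size):
            yield epoch[start:start + batch_size]


# Hypothetical records; real values would come from the training dataset.
records = [
    {"team_budget": 10.0 * i, "average_attendance": 1000.0 + i, "team_position": i}
    for i in range(1, 6)
]
batches = list(
    make_batches(
        records,
        feature_names=["team_budget", "average_attendance"],
        label_name="team_position",
        batch_size=2,
        num_epochs=2,
    )
)
```

With five records, a batch size of two and two epochs, the sketch produces six batches, the third and sixth holding a single record each.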
Feature Visualizations:
Development Instructions
For development details such as how to test and build docs, see this reference: Development.
Project details
Download files
Source Distribution
File details
Details for the file hopsworks-cloud-sdk-2.0.0.2.tar.gz.
File metadata
- Download URL: hopsworks-cloud-sdk-2.0.0.2.tar.gz
- Upload date:
- Size: 39.6 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/3.2.0 pkginfo/1.6.1 requests/2.25.0 setuptools/46.1.3 requests-toolbelt/0.9.1 tqdm/4.51.0 CPython/3.6.9
File hashes
Algorithm | Hash digest
---|---
SHA256 | 58ffe6ad97d02c6f32657525204d2faa90cb42a85faace4bdd6a81247e04ba1d
MD5 | c4634c0575e27bf52387d6636578fa6f
BLAKE2b-256 | 6345ac24e9b92887eb32296d9f2a471ced6e5e63d6b14cb8327b90b826f7972c
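A downloaded archive can be checked against the published SHA256 digest with Python's standard `hashlib` module. The following is a minimal sketch; the helper names are hypothetical, and it assumes the archive has been saved locally under its original file name.

```python
import hashlib

# SHA256 digest published for hopsworks-cloud-sdk-2.0.0.2.tar.gz in the table above.
EXPECTED_SHA256 = "58ffe6ad97d02c6f32657525204d2faa90cb42a85faace4bdd6a81247e04ba1d"


def sha256_of_file(path, chunk_size=65536):
    """Return the hex SHA256 digest of the file at `path`, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read fixed-size chunks so large files need not fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_digest(path, expected_hex=EXPECTED_SHA256):
    """Return True if the file's SHA256 digest equals the expected hex string."""
    return sha256_of_file(path) == expected_hex.strip().lower()
```

For example, `matches_published_digest("hopsworks-cloud-sdk-2.0.0.2.tar.gz")` would return True only if the local file is byte-for-byte identical to the published one.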