
An SDK to integrate cloud solutions such as SageMaker and Databricks with Hopsworks.



hopsworks-cloud-sdk is an SDK to integrate existing cloud solutions, such as Amazon SageMaker or Databricks, with the Hopsworks platform.

It enables accessing the Hopsworks feature store from SageMaker and Databricks notebooks.

Quick Start

Ensure that your Hopsworks installation is set up correctly; see Setting up Hopsworks for the cloud.

To install (from a shell, not the Python prompt):

$ pip install hopsworks-cloud-sdk
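To check which version pip actually installed, the standard library's `importlib.metadata` (Python 3.8+) can read the distribution metadata. This is a minimal sketch; the helper name `installed_version` is hypothetical and not part of the SDK.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(dist_name: str = "hopsworks-cloud-sdk") -> Optional[str]:
    """Return the installed version of a distribution, or None if it is not installed."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    print(installed_version() or "hopsworks-cloud-sdk is not installed")
```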

Sample usage:

>>> from hops import featurestore
>>> featurestore.connect('ec2-w-x-y-z.us-east-2.compute.amazonaws.com', 'my_hopsworks_project')
>>> features_df = featurestore.get_features(["my_feature_1", "my_feature_2"])

Examples

Examples for using the Cloud SDK on SageMaker

Documentation

API for the Hopsworks Feature Store

Hopsworks has a data management layer for machine learning called a feature store. The feature store enables simple and efficient versioning, sharing, governance, and definition of features that can be used both to train machine learning models and to serve inference requests. The feature store serves as a natural interface between data engineering and data science.
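To make the versioning idea concrete, here is a toy, in-memory model of a versioned catalogue: each registration creates a new, immutable version and older versions stay readable. Everything here (`FeatureRegistry`, `register`, `latest_version`, `get`) is hypothetical and only illustrates the concept; in particular, returning 0 for an unknown name is an assumption of this sketch, not documented behaviour of the SDK's `get_latest_training_dataset_version`.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class FeatureRegistry:
    """Toy in-memory catalogue of versioned feature sets (hypothetical, not the SDK's API)."""

    # name -> one feature list per version; index 0 holds version 1
    versions: Dict[str, List[List[str]]] = field(default_factory=dict)

    def register(self, name: str, features: List[str]) -> int:
        """Append a new version of `name` and return its version number."""
        history = self.versions.setdefault(name, [])
        history.append(list(features))
        return len(history)

    def latest_version(self, name: str) -> int:
        """Newest version number, or 0 when nothing has been registered under `name`."""
        return len(self.versions.get(name, []))

    def get(self, name: str, version: int) -> List[str]:
        """Return a copy of one version's features; older versions are never overwritten."""
        if not 1 <= version <= self.latest_version(name):
            raise KeyError(f"{name!r} has no version {version}")
        return list(self.versions[name][version - 1])


if __name__ == "__main__":
    registry = FeatureRegistry()
    registry.register("team_position_prediction", ["team_budget"])
    registry.register("team_position_prediction", ["team_budget", "average_attendance"])
    print(registry.latest_version("team_position_prediction"))  # newest version number
    print(registry.get("team_position_prediction", 1))  # version 1 is still available
```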

API documentation

Reading from the feature store:

from hops import featurestore
features_df = featurestore.get_features(["team_budget", "average_attendance", "average_player_age"])
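Conceptually, `get_features` has to find which feature groups contain the requested columns and combine them into one DataFrame; how the SDK does this internally is not described here. The sketch below imitates the idea locally with pandas, assuming each feature group is a DataFrame that shares a join key. The helper `assemble_features` and the `team_id` key are hypothetical.

```python
import pandas as pd


def assemble_features(feature_groups, feature_names, join_key):
    """Pick `feature_names` out of in-memory feature groups and inner-join them on `join_key`.

    Hypothetical helper: `feature_groups` maps a group name to a DataFrame holding
    `join_key` plus some feature columns; the first group containing a feature wins.
    """
    result = None
    remaining = list(feature_names)
    for df in feature_groups.values():
        wanted = [name for name in remaining if name in df.columns]
        if not wanted:
            continue  # this group holds none of the still-missing features
        part = df[[join_key] + wanted]
        result = part if result is None else result.merge(part, on=join_key, how="inner")
        remaining = [name for name in remaining if name not in wanted]
    if remaining:
        raise KeyError(f"features not found in any feature group: {remaining}")
    if result is None:
        raise ValueError("no features requested")
    return result[[join_key] + list(feature_names)]


if __name__ == "__main__":
    # Illustrative data; "team_id" is an assumed join key, not taken from the SDK
    groups = {
        "teams": pd.DataFrame({"team_id": [1, 2, 3], "team_budget": [100.0, 250.0, 80.0]}),
        "attendance": pd.DataFrame({"team_id": [1, 2, 3], "average_attendance": [1200, 3400, 900]}),
        "players": pd.DataFrame({"team_id": [3, 1, 2], "average_player_age": [22.8, 24.5, 27.1]}),
    }
    print(assemble_features(groups, ["team_budget", "average_attendance", "average_player_age"], "team_id"))
```

An inner join keeps only entities present in every group that contributes a feature; whether the SDK uses the same join type is not stated in this document.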

Integration with scikit-learn:

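from hops import featurestore
from sklearn.neighbors import KNeighborsClassifier

# Load the "iris_features" feature group as a pandas DataFrame
train_df = featurestore.get_featuregroup("iris_features", dataframe_type="pandas")

# Feature matrix X and 1-D label vector y (a single-column selection is already 1-D)
X = train_df[["sepal_length", "sepal_width", "petal_length", "petal_width"]].values
y = train_df["label"].values

# Fit a k-nearest-neighbours classifier on the features
iris_knn = KNeighborsClassifier()
iris_knn.fit(X, y)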
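To try the same scikit-learn steps without a Hopsworks connection, the feature group can be replaced by scikit-learn's bundled iris data, renamed to the column names used above (an assumption about what the `iris_features` group contains). This sketch also holds out a stratified test set, which the example above does not do, so the reported score is not measured on training rows.

```python
import pandas as pd
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

FEATURE_COLUMNS = ["sepal_length", "sepal_width", "petal_length", "petal_width"]

# Offline stand-in for featurestore.get_featuregroup("iris_features", dataframe_type="pandas")
iris = load_iris()
train_df = pd.DataFrame(iris.data, columns=FEATURE_COLUMNS)
train_df["label"] = iris.target

X = train_df[FEATURE_COLUMNS].values
y = train_df["label"].values

# Stratified hold-out split keeps the class proportions equal in both parts
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42, stratify=y
)

iris_knn = KNeighborsClassifier(n_neighbors=5)
iris_knn.fit(X_train, y_train)
test_accuracy = iris_knn.score(X_test, y_test)
print(f"held-out accuracy: {test_accuracy:.3f}")
```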

Integration with TensorFlow:

import tensorflow as tf
from hops import featurestore

# Illustrative input-pipeline settings (not part of the original example; adjust as needed)
SHUFFLE_BUFFER_SIZE = 1000
BATCH_SIZE = 32
NUM_EPOCHS = 10

feature_list = ["team_budget", "average_attendance", "average_player_age",
    "team_position", "sum_attendance",
    "average_player_rating", "average_player_worth", "sum_player_age",
    "sum_player_rating", "sum_player_worth", "sum_position",
    "average_position"
  ]

# Create the next version of the training dataset from the selected features
latest_version = featurestore.get_latest_training_dataset_version("team_position_prediction")
featurestore.create_training_dataset(
    features = feature_list,
    training_dataset = "team_position_prediction",
    descriptive_statistics = False,
    feature_correlation = False,
    feature_histograms = False,
    cluster_analysis = False,
    training_dataset_version = latest_version + 1
)

def create_tf_dataset():
    # Locate the TFRecord part files written for the training dataset
    dataset_dir = featurestore.get_training_dataset_path("team_position_prediction")
    input_files = tf.io.gfile.glob(dataset_dir + "/part-r-*")
    dataset = tf.data.TFRecordDataset(input_files)
    tf_record_schema = ... # Add tf schema
    feature_names = ["team_budget", "average_attendance", "average_player_age", "sum_attendance",
         "average_player_rating", "average_player_worth", "sum_player_age", "sum_player_rating", "sum_player_worth",
         "sum_position", "average_position"
        ]
    label_name = "team_position"

    def decode(example_proto):
        # Parse one serialized example, then split it into feature values and the label
        example = tf.io.parse_single_example(example_proto, tf_record_schema)
        x = [example[feature_name] for feature_name in feature_names]
        y = [tf.cast(example[label_name], tf.float32)]
        return x, y

    dataset = dataset.map(decode).shuffle(SHUFFLE_BUFFER_SIZE).batch(BATCH_SIZE).repeat(NUM_EPOCHS)
    return dataset

tf_dataset = create_tf_dataset()
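The chained `.shuffle(SHUFFLE_BUFFER_SIZE).batch(BATCH_SIZE).repeat(NUM_EPOCHS)` call decides the order in which rows reach the model. The pure-Python sketch below mimics that ordering so it is easier to reason about: a bounded shuffle buffer that emits a random buffered element and refills its slot with the next input, as tf.data's `shuffle` is documented to work. It is an illustration, not TensorFlow code, and the function names are hypothetical.

```python
import random
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def shuffle_buffer(items: Iterable[T], buffer_size: int, rng: random.Random) -> Iterator[T]:
    """Bounded-buffer shuffle: emit a random buffered element, refill its slot with the next input."""
    if buffer_size < 1:
        raise ValueError("buffer_size must be at least 1")
    buffer: List[T] = []
    for item in items:
        if len(buffer) < buffer_size:
            buffer.append(item)  # still filling the buffer
            continue
        slot = rng.randrange(buffer_size)
        yield buffer[slot]
        buffer[slot] = item
    rng.shuffle(buffer)  # input exhausted: drain what is left in random order
    yield from buffer


def batch(items: Iterable[T], batch_size: int) -> Iterator[List[T]]:
    """Group consecutive elements; the final batch may be short (no remainder is dropped)."""
    current: List[T] = []
    for item in items:
        current.append(item)
        if len(current) == batch_size:
            yield current
            current = []
    if current:
        yield current


def pipeline(records: List[T], buffer_size: int, batch_size: int,
             num_epochs: int, seed: int = 0) -> Iterator[List[T]]:
    """Mimic dataset.shuffle(buffer_size).batch(batch_size).repeat(num_epochs)."""
    rng = random.Random(seed)
    for _ in range(num_epochs):
        # repeat() re-runs the upstream shuffle and batch, so each epoch gets a fresh
        # order and no batch mixes rows from two epochs
        yield from batch(shuffle_buffer(records, buffer_size, rng), batch_size)


if __name__ == "__main__":
    for b in pipeline(list(range(10)), buffer_size=4, batch_size=3, num_epochs=2):
        print(b)
```

Two consequences follow: a buffer much smaller than the dataset only shuffles locally, and because `repeat` comes after `batch`, each epoch can end with a short batch.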

Feature Visualizations:

Visualizing feature distributions
Visualizing feature correlations
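The SDK's own visualization helpers are documented separately and are not reproduced here; the `feature_histograms` and `feature_correlation` flags passed to `create_training_dataset` above suggest the platform can compute such statistics itself. As a local approximation of the same two views, assuming the features are already in a numeric pandas DataFrame, this sketch computes per-feature histogram counts and a Pearson correlation matrix. The helper name `feature_summary` is hypothetical.

```python
import numpy as np
import pandas as pd


def feature_summary(df: pd.DataFrame, bins: int = 5):
    """Return per-feature (counts, bin_edges) histograms and the pairwise Pearson correlation matrix."""
    histograms = {}
    for column in df.columns:
        counts, edges = np.histogram(df[column].to_numpy(), bins=bins)
        histograms[column] = (counts, edges)
    correlations = df.corr(method="pearson")
    return histograms, correlations


if __name__ == "__main__":
    # Illustrative values only
    df = pd.DataFrame({
        "team_budget": [100.0, 250.0, 80.0, 120.0, 300.0],
        "average_attendance": [1200.0, 3400.0, 900.0, 1500.0, 3900.0],
    })
    histograms, correlations = feature_summary(df, bins=3)
    print(histograms["team_budget"][0])  # counts per bin
    print(correlations)
```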

Development Instructions

For development details, such as how to run the tests and build the docs, see Development.



Download files


Source Distribution

hopsworks-cloud-sdk-2.0.0.2.tar.gz (39.6 kB)

File details

Details for the file hopsworks-cloud-sdk-2.0.0.2.tar.gz.

File metadata

  • Download URL: hopsworks-cloud-sdk-2.0.0.2.tar.gz
  • Size: 39.6 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.2.0 pkginfo/1.6.1 requests/2.25.0 setuptools/46.1.3 requests-toolbelt/0.9.1 tqdm/4.51.0 CPython/3.6.9

File hashes

Hashes for hopsworks-cloud-sdk-2.0.0.2.tar.gz
Algorithm Hash digest
SHA256 58ffe6ad97d02c6f32657525204d2faa90cb42a85faace4bdd6a81247e04ba1d
MD5 c4634c0575e27bf52387d6636578fa6f
BLAKE2b-256 6345ac24e9b92887eb32296d9f2a471ced6e5e63d6b14cb8327b90b826f7972c
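To check a downloaded archive against the SHA256 digest listed above, Python's `hashlib` is enough. This is a minimal sketch; the helper names `sha256_of` and `verify` are hypothetical.

```python
import hashlib

# SHA256 digest published above for hopsworks-cloud-sdk-2.0.0.2.tar.gz
EXPECTED_SHA256 = "58ffe6ad97d02c6f32657525204d2faa90cb42a85faace4bdd6a81247e04ba1d"


def sha256_of(path, chunk_size=1 << 16):
    """Hash a file in chunks so large archives need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path, expected=EXPECTED_SHA256):
    """True if the file's SHA256 matches the expected hex digest (case-insensitive)."""
    return sha256_of(path) == expected.lower()


if __name__ == "__main__":
    print(verify("hopsworks-cloud-sdk-2.0.0.2.tar.gz"))
```

pip can also enforce this at install time through its hash-checking mode, using a requirements line such as `hopsworks-cloud-sdk==2.0.0.2 --hash=sha256:58ffe6ad97d02c6f32657525204d2faa90cb42a85faace4bdd6a81247e04ba1d`.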

