HSFS: An environment independent client to interact with the Hopsworks Featurestore

Hopsworks Feature Store

HSFS is the library to interact with the Hopsworks Feature Store. The library makes creating new features, feature groups and training datasets easy.

The library is environment independent and can be used in two modes:

Spark mode: For data engineering jobs that create and write features into the feature store or generate training datasets. It requires a Spark environment such as the one provided in the Hopsworks platform or Databricks. In Spark mode, HSFS provides bindings both for Python and JVM languages.
Python mode: For data science jobs that explore the features available in the feature store, generate training datasets and feed them into a training pipeline. Python mode requires only a Python interpreter and can be used in Hopsworks from Python Jobs or Jupyter Kernels, as well as from Amazon SageMaker or KubeFlow.

The library automatically configures itself based on the environment in which it runs. However, to connect from an external environment such as Databricks or AWS SageMaker, additional connection information, such as host and port, is required. For more information about the setup from external environments, see the setup section.
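
As a rough illustration of supplying that connection information from an external environment, the sketch below collects it from environment variables. The HOPSWORKS_* variable names and both helper functions are hypothetical, and the keyword names passed to hsfs.connection (host, port, project, api_key_value) should be checked against the installed HSFS version.

```python
import os


def connection_kwargs(env=None):
    """Build keyword arguments for hsfs.connection() from environment variables.

    The HOPSWORKS_* variable names are hypothetical, chosen for this sketch.
    """
    env = os.environ if env is None else env
    return {
        "host": env["HOPSWORKS_HOST"],
        # Fall back to 443 when no port is given (an assumption of this sketch).
        "port": int(env.get("HOPSWORKS_PORT", "443")),
        "project": env["HOPSWORKS_PROJECT"],
        "api_key_value": env["HOPSWORKS_API_KEY"],
    }


def connect_from_env(env=None):
    """Open a connection using the settings collected above."""
    # Imported lazily so that connection_kwargs() works without hsfs installed.
    import hsfs

    return hsfs.connection(**connection_kwargs(env))
```

Keeping the settings in a plain dictionary makes them easy to inspect before any network call is made.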

Getting Started On Hopsworks

Instantiate a connection and get the project feature store handler

import hsfs

connection = hsfs.connection()
fs = connection.get_feature_store()

Create a new feature group

fg = fs.create_feature_group("rain",
                        version=1,
                        description="Rain features",
                        primary_key=['date', 'location_id'],
                        online_enabled=True)

fg.save(dataframe)
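
Since the feature group above declares a composite primary key, it can be useful to check the data for duplicate keys before saving it. The following is a minimal sketch using plain Python dictionaries in place of a dataframe; the duplicate_keys helper and the rain_mm column are hypothetical and not part of HSFS, and nothing here implies that HSFS performs this check itself.

```python
from collections import Counter


def duplicate_keys(rows, primary_key):
    """Return the primary-key tuples that occur more than once in rows."""
    counts = Counter(tuple(row[col] for col in primary_key) for row in rows)
    return sorted(key for key, n in counts.items() if n > 1)


rows = [
    {"date": "2020-10-20", "location_id": 1, "rain_mm": 3.2},
    {"date": "2020-10-20", "location_id": 2, "rain_mm": 0.0},
    {"date": "2020-10-20", "location_id": 1, "rain_mm": 4.1},  # repeats the first key
]

print(duplicate_keys(rows, ["date", "location_id"]))
# prints [('2020-10-20', 1)]
```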

Upsert new data into the feature group with time_travel_format="HUDI".

fg.insert(upsert_df)
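
To make the upsert behaviour concrete, here is a small sketch of the semantics on plain Python rows: a row whose primary key already exists replaces the stored row, and a row with a new key is added. This only models the outcome under that assumption; it is not how HSFS or Hudi implement it, and the upsert function is hypothetical.

```python
def upsert(existing, incoming, primary_key):
    """Merge incoming rows into existing rows, matching on the primary key."""

    def key_of(row):
        return tuple(row[col] for col in primary_key)

    # Dictionaries keep insertion order, so updated rows keep their position.
    merged = {key_of(row): row for row in existing}
    for row in incoming:
        merged[key_of(row)] = row
    return list(merged.values())
```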

Retrieve the commit timeline metadata of the feature group with time_travel_format="HUDI".

fg.commit_details()

Read the feature group as of a specific point in time.

fg = fs.get_feature_group("rain", 1)
fg.read("2020-10-20 07:34:11").show()

Read updates that occurred between specified points in time.

fg = fs.get_feature_group("rain", 1)
fg.read_changes("2020-10-20 07:31:38", "2020-10-20 07:34:11").show()
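
The two time travel reads can be pictured with a toy commit timeline. The sketch below is an assumption-laden model, not the HSFS implementation: the commits list, the rain_mm column and both functions are hypothetical, and treating the start of the change window as exclusive and the end as inclusive is a choice made for this example.

```python
from bisect import bisect_right
from datetime import datetime

FMT = "%Y-%m-%d %H:%M:%S"


def parse(ts):
    return datetime.strptime(ts, FMT)


# Hypothetical timeline: (commit time, rows upserted by that commit), oldest first.
commits = [
    ("2020-10-20 07:31:38", [{"location_id": 1, "rain_mm": 3.2}]),
    ("2020-10-20 07:33:08", [{"location_id": 2, "rain_mm": 0.0}]),
    ("2020-10-20 07:34:11", [{"location_id": 1, "rain_mm": 4.1}]),
]
commit_times = [parse(ts) for ts, _ in commits]


def read_as_of(ts, key="location_id"):
    """State of the table after every commit made at or before ts."""
    upto = bisect_right(commit_times, parse(ts))
    state = {}
    for _, rows in commits[:upto]:
        for row in rows:
            state[row[key]] = row  # a later commit overwrites the same key
    return list(state.values())


def read_changes(start, end):
    """Rows written by commits with start < commit time <= end."""
    lo = bisect_right(commit_times, parse(start))
    hi = bisect_right(commit_times, parse(end))
    return [row for _, rows in commits[lo:hi] for row in rows]
```

Reading as of a timestamp replays commits up to that point, while reading changes returns only what the commits inside the window wrote.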

Join features together

feature_join = (rain_fg.select_all()
                    .join(temperature_fg.select_all(), on=["date", "location_id"])
                    .join(location_fg.select_all()))
feature_join.show(5)
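
For readers who want to see what joining on ["date", "location_id"] amounts to, the sketch below performs an inner join on plain Python rows. It is only an illustration: the inner_join helper is hypothetical, the choice of an inner join is an assumption of this example, and the join type HSFS applies by default should be checked in the documentation.

```python
def inner_join(left, right, on):
    """Inner-join two lists of row dicts on the columns named in `on`."""
    # Index the right-hand rows by their join key for constant-time lookup.
    index = {}
    for row in right:
        index.setdefault(tuple(row[col] for col in on), []).append(row)

    joined = []
    for row in left:
        for match in index.get(tuple(row[col] for col in on), []):
            # On a column name clash, the left-hand value is kept.
            joined.append({**match, **row})
    return joined
```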

Join feature groups as of a specific point in time

feature_join = (rain_fg.select_all()
                    .join(temperature_fg.select_all(), on=["date", "location_id"])
                    .join(location_fg.select_all())
                    .as_of("2020-10-31"))
feature_join.show(5)

Join feature groups read as of different points in time

rain_fg_q = rain_fg.select_all().as_of("2020-10-20 07:41:43")
temperature_fg_q = temperature_fg.select_all().as_of("2020-10-20 07:32:33")
location_fg_q = location_fg.select_all().as_of("2020-10-20 07:33:08")
joined_features_q = rain_fg_q.join(temperature_fg_q).join(location_fg_q)

Use the query object to create a training dataset:

td = fs.create_training_dataset("rain_dataset",
                                version=1,
                                data_format="tfrecords",
                                description="A test training dataset saved in TfRecords format",
                                splits={'train': 0.7, 'test': 0.2, 'validate': 0.1})

td.save(feature_join)
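
The splits argument maps split names to fractions of the data. As a rough sketch of what such a split means, the code below shuffles rows with a fixed seed and slices them by cumulative fraction. The split_rows function is hypothetical, and requiring the fractions to sum to 1 is an assumption of this example rather than documented HSFS behaviour.

```python
import math
import random


def split_rows(rows, splits, seed=0):
    """Shuffle rows and divide them according to the fractions in `splits`."""
    if not math.isclose(sum(splits.values()), 1.0):
        raise ValueError("split fractions must sum to 1")

    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # seeded for reproducibility

    result, start, cumulative = {}, 0, 0.0
    names = list(splits)
    for i, name in enumerate(names):
        cumulative += splits[name]
        # The last split takes whatever remains, so no row is lost to rounding.
        is_last = i == len(names) - 1
        end = len(shuffled) if is_last else round(cumulative * len(shuffled))
        result[name] = shuffled[start:end]
        start = end
    return result


parts = split_rows(range(10), {"train": 0.7, "test": 0.2, "validate": 0.1})
print({name: len(part) for name, part in parts.items()})
# prints {'train': 7, 'test': 2, 'validate': 1}
```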

A short introduction to the Scala API:

import com.logicalclocks.hsfs._
val connection = HopsworksConnection.builder().build()
val fs = connection.getFeatureStore();
val attendances_features_fg = fs.getFeatureGroup("games_features", 1);
attendances_features_fg.show(1)

You can find more examples on how to use the library in our hops-examples repository.

Documentation

Documentation is available at Hopsworks Feature Store Documentation.

Issues

For general questions about the usage of Hopsworks and the Feature Store, please open a topic on Hopsworks Community.

Please report any issues using GitHub issue tracking.

Contributing

If you would like to contribute to this library, please see the Contribution Guidelines.

Release history

3.9.0rc24 pre-release

Sep 24, 2025

3.9.0rc23 pre-release

Aug 5, 2025

3.9.0rc22 pre-release

Jul 31, 2025

3.9.0rc21 pre-release

Jul 21, 2025

3.9.0rc20 pre-release

Jul 4, 2025

3.9.0rc19 pre-release

May 27, 2025

3.9.0rc18 pre-release

May 6, 2025

3.9.0rc17 pre-release

May 2, 2025

3.9.0rc16 pre-release

Mar 26, 2025

3.9.0rc14 pre-release

Feb 20, 2025

3.9.0rc13 pre-release

Feb 14, 2025

3.9.0rc12 pre-release

Jan 28, 2025

3.9.0rc11 pre-release

Jan 28, 2025

3.9.0rc10 pre-release

Jan 28, 2025

3.9.0rc9 pre-release

Jan 24, 2025

3.9.0rc8 pre-release

Dec 16, 2024

3.9.0rc7 pre-release

Dec 10, 2024

3.9.0rc6 pre-release

Dec 6, 2024

3.9.0rc5 pre-release

Nov 14, 2024

3.9.0rc4 pre-release

Nov 1, 2024

3.9.0rc3 pre-release

Oct 31, 2024

3.9.0rc2 pre-release

Oct 30, 2024

3.9.0rc1 pre-release

Oct 15, 2024

3.9.0rc0 pre-release

Oct 8, 2024

3.9.0.dev1 pre-release

Jun 13, 2024

3.8.0rc5 pre-release

Oct 7, 2024

3.8.0rc4 pre-release

Sep 30, 2024

3.8.0rc3 pre-release

Sep 24, 2024

3.8.0rc2 pre-release

Sep 18, 2024

3.8.0rc1 pre-release

Sep 6, 2024

3.8.0rc0 pre-release

Jun 13, 2024

3.7.9

Oct 23, 2024

3.7.8

Sep 30, 2024

3.7.7

Sep 9, 2024

3.7.6

May 3, 2024

3.7.5

Apr 30, 2024

3.7.4 yanked

Apr 30, 2024

3.7.3

Apr 25, 2024

3.7.2

Apr 15, 2024

3.7.1

Apr 11, 2024

3.7.1rc0 pre-release

Mar 28, 2024

3.7.0rc8 pre-release

Mar 19, 2024

3.7.0rc7 pre-release

Mar 19, 2024

3.7.0rc6 pre-release

Mar 7, 2024

3.7.0rc5 pre-release

Mar 1, 2024

3.7.0rc4 pre-release

Feb 28, 2024

3.7.0rc3 pre-release

Feb 27, 2024

3.7.0rc2 pre-release

Feb 26, 2024

3.7.0rc1 pre-release

Feb 6, 2024

3.7.0rc0 pre-release

Feb 6, 2024

3.5.1rc2 pre-release

Jan 19, 2024

3.5.1rc1 pre-release

Jan 18, 2024

3.5.0rc6 pre-release

Jan 23, 2024

3.5.0rc5 pre-release

Jan 16, 2024

3.5.0rc4 pre-release

Dec 20, 2023

3.5.0rc3 pre-release

Dec 11, 2023

3.5.0rc2 pre-release

Dec 4, 2023

3.5.0rc1 pre-release

Nov 28, 2023

3.5.0rc0 pre-release

Nov 23, 2023

3.4.9rc0 pre-release

May 16, 2024

3.4.8

Mar 18, 2024

3.4.7

Jan 25, 2024

3.4.6

Jan 23, 2024

3.4.5

Dec 4, 2023

3.4.4

Nov 2, 2023

3.4.3

Nov 1, 2023

3.4.2

Oct 18, 2023

3.4.2rc0 pre-release

Oct 10, 2023

3.4.1rc0 pre-release

Sep 28, 2023

3.4.0rc0 pre-release

Sep 18, 2023

3.3.0rc2 pre-release

Aug 31, 2023

3.3.0rc1 pre-release

Jul 24, 2023

3.3.0rc0 pre-release

Jul 3, 2023

3.2.1

Jul 24, 2023

3.2.0

Jun 27, 2023

3.2.0rc5 pre-release

Jun 19, 2023

3.2.0rc4 pre-release

Jun 6, 2023

3.2.0rc3 pre-release

May 17, 2023

3.2.0rc2 pre-release

May 16, 2023

3.2.0rc1 pre-release

May 3, 2023

3.2.0rc0 pre-release

Apr 12, 2023

3.1.2

Jul 11, 2023

3.1.2rc1 pre-release

Jun 19, 2023

3.1.2rc0 pre-release

Jun 8, 2023

3.1.1rc6 pre-release

Jun 6, 2023

3.1.1rc5 pre-release

May 3, 2023

3.1.1rc4 pre-release

Mar 27, 2023

3.1.1rc3 pre-release

Mar 22, 2023

3.1.1rc2 pre-release

Mar 15, 2023

3.1.1rc1 pre-release

Mar 14, 2023

3.1.1rc0 pre-release

Mar 10, 2023

This version

3.1.0rc3 pre-release

Mar 2, 2023

3.1.0rc2 pre-release

Feb 16, 2023

3.1.0rc1 pre-release

Jan 31, 2023

3.1.0rc0 pre-release

Dec 21, 2022

3.0.9

Jun 19, 2023

3.0.8

Jun 14, 2023

3.0.7

May 3, 2023

3.0.6

May 1, 2023

3.0.5

Nov 29, 2022

3.0.4

Oct 3, 2022

3.0.3

Sep 26, 2022

3.0.2

Sep 14, 2022

3.0.1

Aug 3, 2022

3.0.0

Jul 18, 2022

3.0.0rc3 pre-release

Jul 5, 2022

3.0.0rc2 pre-release

Jun 17, 2022

3.0.0rc1 pre-release

Jun 17, 2022

3.0.0rc0 pre-release

Jun 17, 2022

2.5.17

Sep 29, 2022

2.5.16

Aug 17, 2022

2.5.15

Jul 21, 2022

2.5.14

Jul 14, 2022

2.5.13

Jul 12, 2022

2.5.12

Jul 12, 2022

2.5.11

Jul 12, 2022

2.5.10

Jun 16, 2022

2.5.9

May 13, 2022

2.5.8

Apr 25, 2022

2.5.7

Apr 22, 2022

2.5.6

Apr 16, 2022

2.5.5

Apr 13, 2022

2.5.4

Apr 5, 2022

2.5.3

Apr 4, 2022

2.5.2

Feb 18, 2022

2.5.1

Feb 4, 2022

2.5.0

Jan 27, 2022

2.4.13

Mar 17, 2022

2.4.12

Mar 3, 2022

2.4.11

Feb 23, 2022

2.4.10

Feb 9, 2022

2.4.9

Jan 20, 2022

2.4.8

Jan 18, 2022

2.4.7

Dec 2, 2021

2.4.6

Nov 29, 2021

2.4.5

Nov 7, 2021

2.4.4

Nov 4, 2021

2.4.3

Nov 3, 2021

2.4.2

Oct 26, 2021

2.4.1

Oct 25, 2021

2.4.0

Oct 11, 2021

2.3.8

Nov 24, 2021

2.3.7

Nov 3, 2021

2.3.6

Sep 23, 2021

2.3.5

Sep 16, 2021

2.3.4

Sep 13, 2021

2.3.3

Aug 27, 2021

2.3.2

Jul 30, 2021

2.3.1

Jul 19, 2021

2.3.0

Jul 13, 2021

2.2.22

Dec 14, 2021

2.2.21

Oct 26, 2021

2.2.20

Sep 1, 2021

2.2.19

Aug 27, 2021

2.2.18

Aug 23, 2021

2.2.17

Jul 30, 2021

2.2.16

Jun 28, 2021

2.2.15

May 24, 2021

2.2.14

May 20, 2021

2.2.13

May 20, 2021

2.2.12

May 19, 2021

2.2.11

May 14, 2021

2.2.10

May 11, 2021

2.2.9

May 11, 2021

2.2.8

May 10, 2021

2.2.7

May 5, 2021

2.2.6

Apr 23, 2021

2.2.5

Apr 23, 2021

2.2.4

Apr 21, 2021

2.2.3

Apr 19, 2021

2.2.2

Apr 14, 2021

2.2.1

Apr 14, 2021

2.2.0

Apr 9, 2021

2.1.8

May 3, 2021

2.1.7

Mar 31, 2021

2.1.6

Mar 11, 2021

2.1.5

Feb 19, 2021

2.1.4

Feb 16, 2021

2.1.3

Feb 9, 2021

2.1.2

Feb 1, 2021

2.1.1

Feb 1, 2021

2.1.0

Jan 28, 2021

2.0.12

Feb 16, 2021

2.0.11

Feb 16, 2021

2.0.10

Feb 1, 2021

2.0.9

Feb 1, 2021

2.0.8

Jan 19, 2021

2.0.7

Jan 6, 2021

2.0.6

Dec 9, 2020

2.0.5

Dec 1, 2020

2.0.4

Nov 18, 2020

2.0.3

Nov 18, 2020

2.0.2

Nov 14, 2020

2.0.1

Nov 11, 2020

2.0.0

Nov 11, 2020

1.4.0

Sep 12, 2020

0.0.0.2

Apr 23, 2020

0.0.0.1 yanked

Apr 23, 2020

Download files


Source Distribution

hsfs-3.1.0rc3.tar.gz (150.0 kB)

Uploaded Mar 2, 2023 Source

File details

Details for the file hsfs-3.1.0rc3.tar.gz.

File metadata

Download URL: hsfs-3.1.0rc3.tar.gz
Upload date: Mar 2, 2023
Size: 150.0 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/4.0.1 CPython/3.8.10

File hashes

Hashes for hsfs-3.1.0rc3.tar.gz
Algorithm	Hash digest
SHA256	`38021110e9776aa529fdbc6190a4962ccfea33e689618fbe693206b85640916e`
MD5	`936a3da0e99789c0646a228a2923b6ff`
BLAKE2b-256	`2a9b6e4ec4816b0b54fb73cc531c02742217f4691027d8db2d1ee0643a948f41`
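
A downloaded archive can be checked against the SHA256 digest listed above with Python's standard hashlib module. This is a minimal sketch; the sha256_of and verify helpers are names chosen for the example, and the file path is whatever location the archive was saved to.

```python
import hashlib

# SHA256 digest listed above for hsfs-3.1.0rc3.tar.gz.
EXPECTED_SHA256 = "38021110e9776aa529fdbc6190a4962ccfea33e689618fbe693206b85640916e"


def sha256_of(path, chunk_size=1 << 20):
    """Return the hex SHA256 digest of the file at path, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path, expected=EXPECTED_SHA256):
    """True when the file's SHA256 digest matches the expected value."""
    return sha256_of(path) == expected
```

For example, verify("hsfs-3.1.0rc3.tar.gz") should return True for an intact download saved under that name in the current directory.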

