
Hive support for Feast offline store


Feast Hive Support

Hive is not included in the current Feast roadmap; this project adds Hive support for the Offline Store.
For more details, see this Feast issue.

The public releases have passed all integration tests. Please create an issue if you run into any problems.

Change Logs

  • DONE [v0.1.1] First workable version.
  • DONE [v0.1.2] Allow a custom hive conf when connecting to HiveServer2
  • DONE [v0.14.0] Support Feast 0.14.x
  • DONE [v0.17.0] Support Feast 0.17.0
  • TODO Uploading entity_df currently uses INSERT INTO, which is somewhat inefficient. The next version will add extra parameters so that users who can provide an HDFS address can upload to HDFS instead.

Quickstart

Install feast

pip install feast

Install feast-hive

  • Install stable version
pip install feast-hive 
  • Install development version (not stable)
pip install git+https://github.com/baineng/feast-hive.git 

Create a feature repository

feast init feature_repo
cd feature_repo

Edit feature_store.yaml

Set the offline_store type to feast_hive.HiveOfflineStore:

project: ...
registry: ...
provider: local
offline_store:
    type: feast_hive.HiveOfflineStore
    host: localhost
    port: 10000        # optional, default is `10000`
    database: default  # optional, default is `default`
    hive_conf:         # optional, hive conf overlay
      hive.join.cache.size: 14797
      hive.exec.max.dynamic.partitions: 779
    ... # other parameters
online_store:
    ...
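To make the optional keys concrete, here is a minimal sketch of how the defaults noted in the comments above could be resolved. `resolve_hive_options` and `HIVE_DEFAULTS` are hypothetical names used for illustration, not part of feast-hive; only the key names and default values come from the config above, and treating `host` as required is an assumption based on it having no documented default.

```python
from typing import Any, Dict

# Defaults as documented in the feature_store.yaml comments above.
HIVE_DEFAULTS: Dict[str, Any] = {"port": 10000, "database": "default", "hive_conf": {}}


def resolve_hive_options(offline_store: Dict[str, Any]) -> Dict[str, Any]:
    """Hypothetical helper: fill in optional keys and require `host`."""
    if "host" not in offline_store:
        # Assumption: host has no documented default, so treat it as required.
        raise ValueError("offline_store.host is required")
    resolved = {**HIVE_DEFAULTS, **offline_store}
    # Copy hive_conf so callers cannot mutate the shared default dict.
    resolved["hive_conf"] = dict(resolved["hive_conf"] or {})
    return resolved


options = resolve_hive_options(
    {"type": "feast_hive.HiveOfflineStore", "host": "localhost"}
)
print(options["port"], options["database"])
```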

Create Hive Table

  1. Upload data/driver_stats.parquet to HDFS
hdfs dfs -copyFromLocal ./data/driver_stats.parquet /tmp/
  2. Create the Hive table
CREATE TABLE driver_stats (
    event_timestamp   bigint,
    driver_id         bigint,
    conv_rate         float,
    acc_rate          float,
    avg_daily_trips   int,
    created           bigint
)
STORED AS PARQUET;
  3. Load data into the table
LOAD DATA INPATH '/tmp/driver_stats.parquet' INTO TABLE driver_stats;

Edit example.py

# This is an example feature definition file

from google.protobuf.duration_pb2 import Duration

from feast import Entity, Feature, FeatureView, ValueType
from feast_hive import HiveSource

# Read data from Hive table
# Here we use a query to reuse the original parquet data,
# but you can replace it with your own table or query.
driver_hourly_stats = HiveSource(
    # table='driver_stats',
    query = """
    SELECT Timestamp(cast(event_timestamp / 1000000 as bigint)) AS event_timestamp, 
           driver_id, conv_rate, acc_rate, avg_daily_trips, 
           Timestamp(cast(created / 1000000 as bigint)) AS created 
    FROM driver_stats
    """,
    event_timestamp_column="event_timestamp",
    created_timestamp_column="created",
)

# Define an entity for the driver.
driver = Entity(name="driver_id", value_type=ValueType.INT64, description="driver id")

# Define FeatureView
driver_hourly_stats_view = FeatureView(
    name="driver_hourly_stats",
    entities=["driver_id"],
    ttl=Duration(seconds=86400 * 1),
    features=[
        Feature(name="conv_rate", dtype=ValueType.FLOAT),
        Feature(name="acc_rate", dtype=ValueType.FLOAT),
        Feature(name="avg_daily_trips", dtype=ValueType.INT64),
    ],
    online=True,
    batch_source=driver_hourly_stats,
    tags={},
)
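The query in the source definition divides both bigint columns by 1000000 before wrapping them in Timestamp(...), which suggests the raw values are epoch microseconds and the result is treated as epoch seconds. That reading is an inference from the query, not something the project states. A small sketch of the same arithmetic in Python, with a made-up sample value and a hypothetical helper name:

```python
from datetime import datetime, timezone


def micros_to_datetime(epoch_micros: int) -> datetime:
    """Convert epoch microseconds to a UTC datetime with whole-second precision."""
    # Integer division drops the sub-second part, like
    # cast(... / 1000000 as bigint) does for non-negative values.
    return datetime.fromtimestamp(epoch_micros // 1_000_000, tz=timezone.utc)


# Hypothetical sample: 2021-01-01T00:00:00Z expressed in microseconds.
sample = 1_609_459_200_000_000
print(micros_to_datetime(sample).isoformat())
```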

Apply the feature definitions

feast apply

Generating training data and so on

The remaining steps are the same as in the Feast Quickstart.
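For orientation, a minimal sketch of retrieving training data with the standard Feast API once the definitions are applied. The driver ids and timestamps are made-up sample values, and `build_entity_rows` and `fetch_training_df` are hypothetical helper names; the feature references assume the `driver_hourly_stats` view defined in example.py.

```python
from datetime import datetime
from typing import Dict, List

# Feature references in "<feature_view>:<feature>" form.
FEATURES = [
    "driver_hourly_stats:conv_rate",
    "driver_hourly_stats:acc_rate",
    "driver_hourly_stats:avg_daily_trips",
]


def build_entity_rows() -> List[Dict[str, object]]:
    # Hypothetical driver ids and timestamps, for illustration only.
    return [
        {"driver_id": 1001, "event_timestamp": datetime(2021, 4, 12, 10, 59, 42)},
        {"driver_id": 1002, "event_timestamp": datetime(2021, 4, 12, 8, 12, 10)},
    ]


def fetch_training_df(repo_path: str = "."):
    # Imported lazily so this module loads without feast or pandas installed.
    import pandas as pd
    from feast import FeatureStore

    store = FeatureStore(repo_path=repo_path)
    entity_df = pd.DataFrame(build_entity_rows())
    # The point-in-time join is executed by the configured offline store.
    job = store.get_historical_features(entity_df=entity_df, features=FEATURES)
    return job.to_df()
```

Calling `fetch_training_df()` from inside the feature repository requires feast, feast-hive, pandas, and a reachable HiveServer2.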

Developing and Testing

Developing

git clone https://github.com/baineng/feast-hive.git
cd feast-hive
# creating virtual env ...
pip install -e ".[dev]"

# before commit
make format
make lint

Testing

pip install -e ".[test]"
pytest -n 6 --host=localhost --port=10000 --database=default
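The `--host`, `--port`, and `--database` flags are not built into pytest, so they have to be registered by the test suite (`-n 6` comes from the pytest-xdist plugin). As a rough sketch of how such flags are typically registered through pytest's `pytest_addoption` hook, assuming nothing about the project's actual conftest.py; `hive_connection_args` is a hypothetical helper:

```python
# conftest.py (hypothetical sketch, not the project's actual file)


def pytest_addoption(parser):
    # Register the connection flags used in the command above.
    parser.addoption("--host", action="store", default="localhost", help="HiveServer2 host")
    parser.addoption("--port", action="store", type=int, default=10000, help="HiveServer2 port")
    parser.addoption("--database", action="store", default="default", help="Hive database")


def hive_connection_args(config):
    # `config` is pytest's Config object, e.g. `request.config` inside a fixture.
    return {
        "host": config.getoption("--host"),
        "port": config.getoption("--port"),
        "database": config.getoption("--database"),
    }
```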
