Hive support for Feast offline store
Feast Hive Support
Hive is not included in the current Feast roadmap; this project adds Hive support for the Feast offline store.
For more details, see this Feast issue.
The public releases have passed all integration tests. Please create an issue if you run into any problems.
Change Logs
- DONE [v0.1.1] First workable version released.
- DONE [v0.1.2] Allow custom Hive conf when connecting to a HiveServer2.
- DONE [v0.14.0] Support Feast 0.14.x.
- DONE [v0.17.0] Support Feast 0.17.0.
- TODO Uploading entity_df currently uses `insert into`, which is somewhat inefficient. The next version will add extra parameters so that users who can provide an HDFS address can upload to HDFS instead.
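To make the TODO above concrete: uploading rows through `insert into` means rendering every value as a SQL literal and sending the rows in batches. The sketch below only illustrates that pattern and is not feast-hive's actual implementation; the helper names, the table name `entity_df_tmp`, and the escaping rules are assumptions.

```python
from datetime import datetime


def sql_literal(value):
    """Render a Python value as a SQL literal (hypothetical, simplified escaping)."""
    if value is None:
        return "NULL"
    if isinstance(value, bool):
        return "true" if value else "false"
    if isinstance(value, (int, float)):
        return repr(value)
    if isinstance(value, datetime):
        return "'" + value.strftime("%Y-%m-%d %H:%M:%S") + "'"
    # Escape backslashes first, then single quotes.
    return "'" + str(value).replace("\\", "\\\\").replace("'", "\\'") + "'"


def build_insert_statements(table, rows, chunk_size=1000):
    """Yield one multi-row INSERT INTO statement per chunk of rows."""
    for start in range(0, len(rows), chunk_size):
        chunk = rows[start:start + chunk_size]
        values = ", ".join(
            "(" + ", ".join(sql_literal(v) for v in row) + ")" for row in chunk
        )
        yield f"INSERT INTO {table} VALUES {values}"
```

Every row travels through the SQL layer as text, which is why a file-based upload to HDFS is expected to be faster for large entity dataframes.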
Quickstart
Install feast
pip install feast
Install feast-hive
- Install stable version:
pip install feast-hive
- Install development version (not stable):
pip install git+https://github.com/baineng/feast-hive.git
Create a feature repository
feast init feature_repo
cd feature_repo
Edit feature_store.yaml
Set the offline_store type to feast_hive.HiveOfflineStore:
project: ...
registry: ...
provider: local
offline_store:
    type: feast_hive.HiveOfflineStore
    host: localhost
    port: 10000 # optional, default is `10000`
    database: default # optional, default is `default`
    hive_conf: # optional, hive conf overlay
        hive.join.cache.size: 14797
        hive.exec.max.dynamic.partitions: 779
    ... # other parameters
online_store:
    ...
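The optional fields above fall back to the defaults noted in the comments. As a minimal sketch of how those defaults resolve, assuming `host` is required (it is the only connection field not marked optional) and using a hypothetical helper name that is not part of feast-hive:

```python
# Defaults taken from the comments in the feature_store.yaml example.
OFFLINE_STORE_DEFAULTS = {"port": 10000, "database": "default", "hive_conf": {}}


def resolve_offline_store(config):
    """Return the offline_store mapping with documented defaults filled in."""
    if config.get("type") != "feast_hive.HiveOfflineStore":
        raise ValueError("offline_store.type must be feast_hive.HiveOfflineStore")
    if "host" not in config:
        # Assumption: host has no default and must be given explicitly.
        raise ValueError("offline_store.host is required")
    resolved = {**OFFLINE_STORE_DEFAULTS, **config}
    # Copy the overlay so callers cannot mutate the shared default dict.
    resolved["hive_conf"] = dict(resolved["hive_conf"] or {})
    return resolved
```

For example, a mapping with only `type` and `host` set resolves to port `10000`, database `default`, and an empty `hive_conf` overlay.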
Create Hive Table
- Upload data/driver_stats.parquet to HDFS
hdfs dfs -copyFromLocal ./data/driver_stats.parquet /tmp/
- Create Hive Table
CREATE TABLE driver_stats (
    event_timestamp bigint,
    driver_id bigint,
    conv_rate float,
    acc_rate float,
    avg_daily_trips int,
    created bigint
)
STORED AS PARQUET;
- Load data into the table
LOAD DATA INPATH '/tmp/driver_stats.parquet' INTO TABLE driver_stats;
Edit example.py
# This is an example feature definition file
from google.protobuf.duration_pb2 import Duration
from feast import Entity, Feature, FeatureView, ValueType
from feast_hive import HiveSource

# Read data from a Hive table.
# Here we use a query to reuse the original parquet data,
# but you can replace it with your own table or query.
driver_hourly_stats = HiveSource(
    # table='driver_stats',
    query="""
    SELECT Timestamp(cast(event_timestamp / 1000000 as bigint)) AS event_timestamp,
           driver_id, conv_rate, acc_rate, avg_daily_trips,
           Timestamp(cast(created / 1000000 as bigint)) AS created
    FROM driver_stats
    """,
    event_timestamp_column="event_timestamp",
    created_timestamp_column="created",
)

# Define an entity for the driver.
driver = Entity(name="driver_id", value_type=ValueType.INT64, description="driver id")

# Define the FeatureView.
driver_hourly_stats_view = FeatureView(
    name="driver_hourly_stats",
    entities=["driver_id"],
    ttl=Duration(seconds=86400 * 1),
    features=[
        Feature(name="conv_rate", dtype=ValueType.FLOAT),
        Feature(name="acc_rate", dtype=ValueType.FLOAT),
        Feature(name="avg_daily_trips", dtype=ValueType.INT64),
    ],
    online=True,
    batch_source=driver_hourly_stats,
    tags={},
)
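The query above divides the bigint columns by 1000000 before converting them to timestamps. A small sketch of the same arithmetic in Python, assuming the stored values are microseconds since the Unix epoch (which the division suggests); the function name is illustrative only:

```python
from datetime import datetime, timezone

MICROS_PER_SECOND = 1_000_000


def micros_to_utc(micros):
    """Drop the sub-second part and interpret the remainder as epoch seconds (UTC)."""
    seconds = micros // MICROS_PER_SECOND  # mirrors cast(x / 1000000 as bigint)
    return datetime.fromtimestamp(seconds, tz=timezone.utc)


# 1618221600123456 microseconds is 2021-04-12 10:00:00 UTC once truncated to seconds.
print(micros_to_utc(1618221600123456).isoformat())
```

The sub-second part is discarded, so timestamps come out at one-second resolution.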
Apply the feature definitions
feast apply
Generating training data and so on
The remaining steps are the same as in the Feast Quickstart.
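As a rough sketch of that next step, retrieving training data through the Hive offline store might look like the following. It assumes the feature repository above has been applied and a HiveServer2 is reachable; the driver ids, timestamps, and the helper names `build_entity_rows` and `fetch_training_df` are illustrative and not part of feast-hive.

```python
from datetime import datetime, timedelta


def build_entity_rows(driver_ids, end, step_hours=1):
    """Build one entity row per driver, stepping back step_hours per driver from `end`."""
    return [
        {"driver_id": driver_id, "event_timestamp": end - timedelta(hours=i * step_hours)}
        for i, driver_id in enumerate(driver_ids)
    ]


def fetch_training_df(entity_rows, repo_path="."):
    """Join entity rows with features read from Hive (needs feast, pandas, and a live HiveServer2)."""
    # Imported lazily so build_entity_rows can be used without Feast installed.
    import pandas as pd
    from feast import FeatureStore

    store = FeatureStore(repo_path=repo_path)
    job = store.get_historical_features(
        entity_df=pd.DataFrame(entity_rows),
        features=[
            "driver_hourly_stats:conv_rate",
            "driver_hourly_stats:acc_rate",
            "driver_hourly_stats:avg_daily_trips",
        ],
    )
    return job.to_df()


# Example usage (requires the services above):
#   rows = build_entity_rows([1001, 1002, 1003], end=datetime(2021, 4, 12, 10, 0))
#   print(fetch_training_df(rows).head())
```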
Developing and Testing
Developing
git clone https://github.com/baineng/feast-hive.git
cd feast-hive
# creating virtual env ...
pip install -e ".[dev]"
# before commit
make format
make lint
Testing
pip install -e ".[test]"
pytest -n 6 --host=localhost --port=10000 --database=default
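The `--host`, `--port`, and `--database` flags are not built into pytest; they have to be registered by the test suite. A minimal sketch of how a conftest.py could do that through the `pytest_addoption` hook, assuming the defaults shown in the command above (the project's real conftest.py may differ, and `hive_connection_params` is a hypothetical helper):

```python
def pytest_addoption(parser):
    """Register the Hive connection flags accepted on the pytest command line."""
    parser.addoption("--host", action="store", default="localhost",
                     help="HiveServer2 host")
    parser.addoption("--port", action="store", type=int, default=10000,
                     help="HiveServer2 port")
    parser.addoption("--database", action="store", default="default",
                     help="Hive database used by the tests")


def hive_connection_params(config):
    """Collect the connection flags from a pytest config object."""
    return {
        "host": config.getoption("--host"),
        "port": config.getoption("--port"),
        "database": config.getoption("--database"),
    }
```

Inside a fixture, the helper would typically be called as `hive_connection_params(request.config)`. The `-n 6` flag comes from the pytest-xdist plugin and runs the tests in six worker processes.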
Download files
Source Distribution: feast-hive-0.17.0.tar.gz (18.5 kB)
Built Distribution: feast_hive-0.17.0-py3-none-any.whl (17.7 kB)
File details
Details for the file feast-hive-0.17.0.tar.gz.
File metadata
- Download URL: feast-hive-0.17.0.tar.gz
- Upload date:
- Size: 18.5 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/3.4.2 importlib_metadata/4.6.3 pkginfo/1.7.1 requests/2.26.0 requests-toolbelt/0.9.1 tqdm/4.62.0 CPython/3.7.12
File hashes
Algorithm | Hash digest
---|---
SHA256 | 17ba4714b7887ada2709ba509e36ead38d3f66e88efbcb067fe85553dbc030e1
MD5 | 48192e41d761d45f7962f30443b60e91
BLAKE2b-256 | 3c1c8de72171ac6bde5514f46d1d76c5bccd69337bc9295d3f67a99c615198d8
File details
Details for the file feast_hive-0.17.0-py3-none-any.whl.
File metadata
- Download URL: feast_hive-0.17.0-py3-none-any.whl
- Upload date:
- Size: 17.7 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/3.4.2 importlib_metadata/4.6.3 pkginfo/1.7.1 requests/2.26.0 requests-toolbelt/0.9.1 tqdm/4.62.0 CPython/3.7.12
File hashes
Algorithm | Hash digest
---|---
SHA256 | d29478bee85415a4c2b6a16b39214e262827b9720c0f4d4118cf18e5c44b66f9
MD5 | 6b30ef583452c09b7b38cdad36343065
BLAKE2b-256 | 0da83dc32ff47b68e299ecb222475d09efcdcbc389c9e5f6c66463a8de2596f3