
Async flows

Project description

Storey


Storey is an asynchronous streaming library for real-time event processing and feature extraction.


For more information, see the Storey Python package documentation.

API Walkthrough

A Storey flow consists of steps linked together by the build_flow function, each doing its designated work.
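
As a minimal sketch of how steps are linked (the doubling and summing lambdas here are illustrative, not part of Storey), a flow can be built, run, fed events through the returned controller, and then terminated:

from storey import build_flow, Source, Map, Reduce

# Each event emitted into Source is doubled by Map and folded into a running
# sum by Reduce; await_termination returns the final reduced value.
controller = build_flow([
    Source(),
    Map(lambda x: x * 2),
    Reduce(0, lambda acc, x: acc + x),
]).run()

for i in range(3):
    controller.emit(i)

controller.terminate()
print(controller.await_termination())  # 6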

Supported Steps

Input Steps

  • Source
  • AsyncSource
  • ReadCSV
  • ReadParquet
  • DataframeSource

Processing Steps

  • Filter
  • Map
  • FlatMap
  • MapWithState
  • Batch(max_events, timeout) - Batches events. This step emits a batch every max_events events, or when timeout seconds have passed since the first event in the batch was received (see the sketch after this list).
  • Choice
  • JoinWithV3IOTable
  • SendToHttp
  • AggregateByKey(aggregations, cache, key=None, emit_policy=EmitEveryEvent(), augmentation_fn=None) - This step aggregates the data into the cache object provided for later persistence, and outputs an event enriched with the requested aggregation features.
  • QueryByKey(features, cache, key=None, augmentation_fn=None, aliases=None) - Similar to AggregateByKey, but this step is for serving only and does not aggregate the event.
  • WriteToTable(table) - Persists the data in table to its associated storage by key.
  • Extend
  • JoinWithTable
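
For example, a minimal sketch of Batch feeding into Reduce (arguments are passed positionally as max_events and timeout, and the integer events are illustrative):

from storey import build_flow, Source, Batch, Reduce

# Group incoming events into lists of up to 4 events, or whatever has
# accumulated once 10 seconds have passed since the first event in the batch.
controller = build_flow([
    Source(),
    Batch(4, 10),
    Reduce([], lambda acc, batch: acc + [batch]),  # collect the emitted batches
]).run()

for i in range(10):
    controller.emit(i)

controller.terminate()
print(controller.await_termination())  # e.g. [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9]]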

Output Steps

  • Complete
  • Reduce
  • WriteToV3IOStream
  • WriteToCSV
  • ReduceToDataFrame
  • WriteToTSDB
  • WriteToParquet

Usage Examples

Using Aggregates

The following example reads user data, creates features using Storey's aggregates, persists the data to V3IO and emits events containing the features to a V3IO Stream for further processing.

from storey import build_flow, Source, Table, V3ioDriver, MapWithState, AggregateByKey, FieldAggregator, WriteToTable, WriteToV3IOStream
from storey.dtypes import SlidingWindows

v3io_web_api = 'https://webapi.change-me.com'
v3io_access_key = '1284ne83-i262-46m6-9a23-810n41f169ea'
table_object = Table('/projects/my_features', V3ioDriver(v3io_web_api, v3io_access_key))

def enrich(event, state):
    if 'first_activity' not in state:
        state['first_activity'] = event.time
    event.body['time_since_activity'] = (event.time - state['first_activity']).seconds
    state['last_event'] = event.time
    event.body['total_activities'] = state['total_activities'] = state.get('total_activities', 0) + 1
    return event, state

controller = build_flow([
    Source(),
    MapWithState(table_object, enrich, group_by_key=True, full_event=True),
    AggregateByKey([FieldAggregator("number_of_clicks", "click", ["count"],
                                    SlidingWindows(['1h','2h', '24h'], '10m')),
                    FieldAggregator("purchases", "purchase_amount", ["avg", "min", "max"],
                                    SlidingWindows(['1h','2h', '24h'], '10m')),
                    FieldAggregator("failed_activities", "activity", ["count"],
                                    SlidingWindows(['1h'], '10m'),
                                    aggr_filter=lambda element: element['activity_status'] == 'fail'))],
                   table_object),
    WriteToTable(table_object),
    WriteToV3IOStream(V3ioDriver(v3io_web_api, v3io_access_key), 'features_stream')
]).run()
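
Once the flow is running, events can be pushed in through the returned controller and the flow shut down when done; the event bodies and the 'user_1' key below are placeholders:

# Placeholder events; real bodies would carry whatever fields the enrich
# function and FieldAggregators above operate on.
controller.emit({'activity': 'click', 'activity_status': 'success', 'purchase_amount': 0}, 'user_1')
controller.emit({'activity': 'purchase', 'activity_status': 'success', 'purchase_amount': 19.99}, 'user_1')

controller.terminate()          # no more events will be emitted
controller.await_termination()  # wait for in-flight events to finish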

We can also create a serving function, whose sole purpose is to read data from the feature store and emit the enriched events onward:

controller = build_flow([
    Source(),
    QueryAggregationByKey([FieldAggregator("number_of_clicks", "click", ["count"],
                                           SlidingWindows(['1h','2h', '24h'], '10m')),
                           FieldAggregator("purchases", "purchase_amount", ["avg", "min", "max"],
                                           SlidingWindows(['1h','2h', '24h'], '10m')),
                           FieldAggregator("failed_activities", "activity", ["count"],
                                           SlidingWindows(['1h'], '10m'),
                                           aggr_filter=lambda element: element['activity_status'] == 'fail'))],
                           table_object)
]).run()

Project details


Release history

This version

0.4.0

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

storey-0.4.0.tar.gz (83.3 kB)


Built Distribution

storey-0.4.0-py3-none-any.whl (92.9 kB)


File details

Details for the file storey-0.4.0.tar.gz.

File metadata

  • Download URL: storey-0.4.0.tar.gz
  • Upload date:
  • Size: 83.3 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.4.1 importlib_metadata/3.10.0 pkginfo/1.7.0 requests/2.25.1 requests-toolbelt/0.9.1 tqdm/4.59.0 CPython/3.9.3

File hashes

Hashes for storey-0.4.0.tar.gz

  • SHA256: 07b0fe6437d40346b24407756946c2b695f8e22b5d66a39cfe18d86964f832f6
  • MD5: 786feb04b609864ad2a0f2b2820a28e6
  • BLAKE2b-256: 5991ce4953a2b992b4d83fce4e6bd9fe223990046606efcfda73dfe5da21e0a1

See more details on using hashes here.

File details

Details for the file storey-0.4.0-py3-none-any.whl.

File metadata

  • Download URL: storey-0.4.0-py3-none-any.whl
  • Upload date:
  • Size: 92.9 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.4.1 importlib_metadata/3.10.0 pkginfo/1.7.0 requests/2.25.1 requests-toolbelt/0.9.1 tqdm/4.59.0 CPython/3.9.3

File hashes

Hashes for storey-0.4.0-py3-none-any.whl

  • SHA256: d8e90c51c3f8b7ab5208aeaa3eef179a264d31c3ec893a9aa541f19c0bbccc1f
  • MD5: 59a19d837ee73a185d72230a270425cd
  • BLAKE2b-256: 38f1a893cc5ef9a19f5ee82b9f336606002b9f57f5c6f8d0dd4c71447726416a

See more details on using hashes here.
