FeatureExpress: Time-aware Feature Engineering Library

Overview

FeatureExpress is an in-memory feature engineering library for processing time-based event data. It is a hybrid of a feature engineering library and a feature store, and it aims to address the challenges of working with temporal data in machine learning applications.

Prerelease

Alpha Release Warning: This library is currently in an alpha stage. As such, it is subject to:

  • Changes: The API is still evolving, so you can expect many breaking changes. If you depend on this library in your project, be prepared to update your code as new versions are released.
  • Performance Issues: There may be inefficiencies or other performance issues that have not yet been resolved.
  • Unstable API: Functionality might be added, changed, or removed without notice. Documentation may be incomplete or out of date.

Treat this version as a pre-release. It is primarily intended for developers who want to experiment with the latest features or contribute to the project.

Why FeatureExpress?

Why Another Feature Engineering Library / Feature Store?

This library grew out of years of struggling with event-driven data, especially in customer interactions and recommendations. Time adds complexity and subtlety to data analysis and modeling. The challenges include:

  • Time makes everything complex.
  • Model validation becomes harder.
  • Data leaks are subtle and hard to trace.
  • SQL-like operations are prone to errors and hard to write.
  • Existing feature stores move the burden of materialization to data scientists.

Event Data for Superior Features

Event data records what happened and when, which makes it a strong basis for meaningful features. Unlike approaches that obscure temporal aspects, FeatureExpress uses a dedicated data structure to make the connection between events and features explicit.
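To make the idea of a dedicated event structure concrete, here is a minimal sketch in plain Python. It is not the FeatureExpress data structure or API: the names Event, EventLog, and before are hypothetical and exist only for illustration. The sketch keeps each entity's events sorted by timestamp, so a feature can be computed from exactly the events that were visible at a given observation time.

```python
from bisect import bisect_left
from collections import defaultdict
from dataclasses import dataclass, field
from datetime import datetime


@dataclass(frozen=True)
class Event:
    # Hypothetical event record for illustration; not the FeatureExpress schema.
    entity_id: str
    event_type: str
    timestamp: datetime
    attrs: dict = field(default_factory=dict)


class EventLog:
    """Hypothetical store that keeps each entity's events sorted by timestamp."""

    def __init__(self, events):
        self._by_entity = defaultdict(list)
        for ev in sorted(events, key=lambda e: e.timestamp):
            self._by_entity[ev.entity_id].append(ev)
        # Parallel lists of timestamps allow binary search per entity.
        self._times = {
            entity: [ev.timestamp for ev in evs]
            for entity, evs in self._by_entity.items()
        }

    def before(self, entity_id, observed_at):
        """Return the entity's events strictly earlier than observed_at."""
        times = self._times.get(entity_id, [])
        cut = bisect_left(times, observed_at)
        return self._by_entity.get(entity_id, [])[:cut]


log = EventLog([
    Event("alice", "purchase", datetime(2023, 1, 5), {"amount": 20.0}),
    Event("alice", "purchase", datetime(2023, 1, 20), {"amount": 35.0}),
    Event("bob", "purchase", datetime(2023, 1, 10), {"amount": 12.0}),
])

# Feature: number of purchases alice had made before 2023-01-20.
visible = log.before("alice", datetime(2023, 1, 20))
purchase_count = len(visible)
print(purchase_count)
```

The event stamped exactly at the observation time is excluded, so the feature only reflects what was known before that moment.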

Overcoming Problems with Current Feature Stores

Current feature stores often rely on explicit materialization and caching, which increases complexity for data scientists. FeatureExpress adopts a declarative approach (similar to SQL) with a DSL (Domain Specific Language) to define features, allowing for a more intuitive and less error-prone process.
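The difference between declaring a feature and hand-writing its computation can be sketched in plain Python. The FeatureSpec format and the evaluate function below are assumptions made up for this illustration; they are not the FeatureExpress DSL, whose actual syntax is covered in the project documentation. The point is only that the feature is described as data, and a single generic evaluator applies the time cutoff for every feature.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class FeatureSpec:
    # Hypothetical declarative description of a feature; not the FeatureExpress DSL.
    name: str
    aggregate: str      # "sum" or "count"
    event_type: str
    field: str
    window: timedelta


AGGREGATES = {"sum": sum, "count": len}


def evaluate(spec, events, observed_at):
    """Evaluate spec over events in [observed_at - window, observed_at)."""
    start = observed_at - spec.window
    values = [
        ev[spec.field]
        for ev in events
        if ev["type"] == spec.event_type and start <= ev["ts"] < observed_at
    ]
    return AGGREGATES[spec.aggregate](values)


events = [
    {"type": "purchase", "ts": datetime(2023, 1, 5), "amount": 20.0},
    {"type": "view", "ts": datetime(2023, 1, 18)},
    {"type": "purchase", "ts": datetime(2023, 1, 20), "amount": 35.0},
]

spend_30d = FeatureSpec("spend_30d", "sum", "purchase", "amount", timedelta(days=30))

late = evaluate(spend_30d, events, datetime(2023, 1, 21))   # both purchases visible
early = evaluate(spend_30d, events, datetime(2023, 1, 20))  # second purchase not yet visible
print(late, early)
```

Because the cutoff lives in the evaluator rather than in each feature, a new feature is one more spec and cannot accidentally skip the time filter.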

In-Memory Processing

Written in Rust with a Python interface, FeatureExpress uses in-memory processing to enable:

  • Fast materialization of features.
  • Parallel computations for efficiency.
  • Flexibility to expand to more permanent storage solutions in the future.

Although the current version is limited to datasets that fit in memory, FeatureExpress is intended as a practical tool for data scientists and engineers working with time-based event data.

Installation

You can install FeatureExpress via pip:

pip install fexpress

Features

  1. Event-Driven Design: Utilizes events as core data structures for accurate modeling.
  2. Time-Aware DSL: Introduces a SQL-like DSL for expressive and complex feature declarations.
  3. No Data Leaks: A clear separation between past and future guards against inadvertent data leaks.
  4. Flexible Observation Dates: Allows custom definitions of observation dates including intervals, fixed, conditional, and more.
  5. Time-based Joins: Enables complicated joins in time, like aggregations over specific periods.
  6. Optimized Performance: Implements performance tricks like partial aggregates for efficient calculations.
  7. Rich Value Representation: Accommodates various data types for broad applications.
  8. Indices and In-memory Store: Ensures optimized querying and manipulation of time-based data.
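Several of the items above (interval observation dates, aggregations over a period, and partial aggregates) can be illustrated together with a small sketch in plain Python. It is an assumption about how such a technique might look, not the FeatureExpress implementation, and the names interval_observation_dates and WindowedSum are hypothetical. Prefix sums are computed once, and each trailing-window sum then costs two binary searches and a subtraction instead of a rescan of the events.

```python
from bisect import bisect_left
from datetime import datetime, timedelta
from itertools import accumulate


def interval_observation_dates(start, end, step):
    """Yield observation dates from start up to and including end, spaced by step."""
    current = start
    while current <= end:
        yield current
        current += step


class WindowedSum:
    """Hypothetical helper: trailing-window sums from precomputed prefix sums."""

    def __init__(self, timestamps, values):
        pairs = sorted(zip(timestamps, values), key=lambda p: p[0])
        self._times = [t for t, _ in pairs]
        # prefix[i] is the sum of the first i values, so prefix[0] == 0.
        self._prefix = [0.0, *accumulate(v for _, v in pairs)]

    def total(self, observed_at, window):
        """Sum values with observed_at - window <= timestamp < observed_at."""
        lo = bisect_left(self._times, observed_at - window)
        hi = bisect_left(self._times, observed_at)
        return self._prefix[hi] - self._prefix[lo]


timestamps = [
    datetime(2023, 1, 1),
    datetime(2023, 1, 3),
    datetime(2023, 1, 8),
    datetime(2023, 1, 9),
]
values = [10.0, 5.0, 7.0, 3.0]
spend = WindowedSum(timestamps, values)

# One observation per day from Jan 8 to Jan 10, each with a 7-day trailing sum.
dates = list(interval_observation_dates(
    datetime(2023, 1, 8), datetime(2023, 1, 10), timedelta(days=1)
))
features = [spend.total(d, timedelta(days=7)) for d in dates]
print(features)
```

Each window ends strictly before its observation date, so an event stamped at the observation time never contributes to that observation's feature.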

Documentation

Full documentation, including tutorials and examples, can be found at https://feature.express.

Contributing

Interested in contributing to FeatureExpress? See our CONTRIBUTING.md for guidelines on how to help!

License

FeatureExpress is released under the MIT License. See LICENSE for more details.

Development

env VIRTUAL_ENV=$(python3 -c 'import sys; print(sys.base_prefix)') maturin develop

or

maturin develop

Development (optimized code)

maturin develop --release

Building the Python wheel

maturin build --release -i python

This should create a wheel in target/wheels.

Installing the Python wheel

pip install target/wheels/fexpress_rs-0.1.0-cp38-cp38-linux_x86_64.whl -U

Note that the file name can be different depending on your system.

Docker

docker build -t rust-python-maturin .
docker run --rm -v $(pwd)/artifacts:/app/artifacts rust-python-maturin bash -c "make python_debug_docker && make python_profile_docker"
