
Rethinking Data and Feature Engineering


mloda

Transforming Data and Feature Engineering


mloda rethinks data and feature engineering by offering a flexible, resilient framework that adapts seamlessly to changes. It focuses on defining transformations rather than static states, facilitating smooth transitions between development phases, and reducing redundant work.

Teams can efficiently develop MVPs, scale to production, and adapt systems, all while maintaining high data quality, governance, and scalability. Get started with mloda here.

mloda's plug-in system automatically selects the right plug-ins for each task, enabling efficient querying and processing of complex features. By defining feature dependencies, transformations, and metadata processes, mloda minimizes duplication and fosters reusability. Learn more about the mloda API here.

mloda also allows plug-ins to be shared and reused through a centralized repository. This collaborative approach ensures consistency, reduces operational complexity and redundant work, and promotes best practices.

Key Benefits

The benefits are not limited to the features listed below.

Feature Engineering and Data Processing

  • automated feature engineering
  • data cleaning
  • synthetic data generation
  • time travel

Data Management and Ownership

  • one data source
  • clearly separated roles for users, engineers, and owners, who all speak the same language

Data Quality and Security

  • data quality definitions
  • unit and integration tests
  • secure queries

Scalability

  • switch compute framework without changing feature logic
  • multi-environment support (offline, online, migrations)
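
The first point above, switching compute framework without changing feature logic, can be pictured with a small sketch. It does not use the mloda API: `RowBackend`, `ColumnBackend`, and `add_avg_basket` are hypothetical names, and the two plain-Python classes merely stand in for real compute frameworks. The feature logic is written once against a shared interface and runs unchanged on either storage layout.

```python
# Illustrative sketch only: these classes are hypothetical and are not part of mloda.


class RowBackend:
    """Stores data as a list of row dictionaries."""

    def __init__(self, rows):
        self.rows = rows

    def column(self, name):
        return [row[name] for row in self.rows]

    def add_column(self, name, values):
        return RowBackend([{**row, name: v} for row, v in zip(self.rows, values)])


class ColumnBackend:
    """Stores data as a dictionary of column lists."""

    def __init__(self, columns):
        self.columns = columns

    def column(self, name):
        return list(self.columns[name])

    def add_column(self, name, values):
        return ColumnBackend({**self.columns, name: list(values)})


def add_avg_basket(data):
    """Feature logic written once against the shared column/add_column interface."""
    revenue = data.column("revenue")
    orders = data.column("orders")
    return data.add_column("avg_basket", [r / o for r, o in zip(revenue, orders)])


rows = RowBackend([{"revenue": 100.0, "orders": 4}, {"revenue": 90.0, "orders": 3}])
cols = ColumnBackend({"revenue": [100.0, 90.0], "orders": [4, 3]})

# The same feature function produces the same values on both backends.
print(add_avg_basket(rows).column("avg_basket"))
print(add_avg_basket(cols).column("avg_basket"))
```

Keeping the feature function unaware of the storage layout is the design choice that matters here; which frameworks sit behind the interface is a deployment decision.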

Community Engagement by Design

  • shareable plug-in ecosystem
  • fostering community

Core Components and Architecture

mloda addresses common challenges in data and feature engineering through two key components:

Plugins

  • Feature Groups: Define feature dependencies, such as a composite label built from user activity, purchase history, and support interactions. Once defined, only the label needs to be requested; its dependencies are resolved automatically, which simplifies processing. Learn more here.

  • Compute Frameworks: Define the technology stack, such as Spark or Pandas, along with support for storage engines such as Parquet, Delta Lake, or PostgreSQL, used to execute feature transformations and computations efficiently at scale. Learn more here.

  • Extenders: Automate metadata extraction to strengthen data governance, compliance, and traceability, for example by analyzing how often features are used by models or analysts, or by tracing where the data comes from. Learn more here.
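
The dependency resolution described for Feature Groups can be sketched in plain Python. This is a minimal illustration under stated assumptions, not mloda's implementation or API: the `FEATURES` table, `resolve_order`, `compute`, and the label rule are all hypothetical. It shows how requesting only the label is enough, because the dependencies are discovered and computed first.

```python
from graphlib import TopologicalSorter

# Hypothetical feature table: name -> (dependencies, compute function).
# None of these names come from mloda; the label rule is made up for illustration.
FEATURES = {
    "user_activity": ((), lambda row, deps: row["logins"]),
    "purchase_history": ((), lambda row, deps: row["orders"]),
    "support_interactions": ((), lambda row, deps: row["tickets"]),
    "label": (
        ("user_activity", "purchase_history", "support_interactions"),
        lambda row, deps: int(
            deps["user_activity"] + deps["purchase_history"]
            > deps["support_interactions"]
        ),
    ),
}


def resolve_order(requested):
    """Return the requested feature and its transitive dependencies in compute order."""
    graph = {}
    stack = [requested]
    while stack:
        name = stack.pop()
        if name not in graph:
            deps = FEATURES[name][0]
            graph[name] = set(deps)
            stack.extend(deps)
    # static_order() yields every dependency before the features that need it.
    return list(TopologicalSorter(graph).static_order())


def compute(requested, row):
    """Compute only what the requested feature needs, dependencies first."""
    values = {}
    for name in resolve_order(requested):
        _, func = FEATURES[name]
        values[name] = func(row, values)
    return values[requested]


row = {"logins": 12, "orders": 3, "tickets": 5}
print(compute("label", row))  # 12 + 3 > 5, so this prints 1
```

Requesting `"user_activity"` alone would compute just that one feature, since only the requested feature's dependency subgraph is walked.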

Core

  • Core Engine: Handles dependencies between features and computations by coordinating linking, joining, filtering, and ordering operations to ensure optimized data processing. For example, in customer segmentation, the core engine would link and filter different data sources, such as demographics, purchasing history, and online behavior, to create relevant features.
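
The customer segmentation example can be made concrete with a small sketch of the link, filter, and order steps. It assumes three hypothetical in-memory sources that share a `customer_id` key; the data, the helper functions, and the `spend_per_session` feature are invented for illustration and do not reflect how the core engine is implemented.

```python
# Hypothetical in-memory sources keyed by customer_id; not mloda data structures.
demographics = [
    {"customer_id": 1, "age": 34},
    {"customer_id": 2, "age": 51},
    {"customer_id": 3, "age": 27},
]
purchases = [
    {"customer_id": 1, "total_spend": 250.0},
    {"customer_id": 2, "total_spend": 40.0},
]
online_behavior = [
    {"customer_id": 1, "sessions": 18},
    {"customer_id": 2, "sessions": 2},
    {"customer_id": 3, "sessions": 9},
]


def inner_join(left, right, key):
    """Merge rows from both sides that share the same key value."""
    right_index = {row[key]: row for row in right}
    return [
        {**row, **right_index[row[key]]} for row in left if row[key] in right_index
    ]


def build_segment_features(min_sessions):
    # Link: join the three sources on the shared key.
    linked = inner_join(
        inner_join(demographics, purchases, "customer_id"),
        online_behavior,
        "customer_id",
    )
    # Filter: keep only sufficiently active customers.
    active = [row for row in linked if row["sessions"] >= min_sessions]
    # Order: highest spenders first.
    active.sort(key=lambda row: row["total_spend"], reverse=True)
    # Derive a feature from the combined row.
    return [
        {
            "customer_id": row["customer_id"],
            "spend_per_session": row["total_spend"] / row["sessions"],
        }
        for row in active
    ]


for feature_row in build_segment_features(min_sessions=1):
    print(feature_row)
```

Customer 3 has no purchase record, so the inner join drops that row before any filtering happens.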

Contributing to mloda

  • We welcome contributions from the community to help us improve and expand mloda. Whether you're interested in developing plug-ins or adding new features, your input is invaluable. Learn more here.

Frequently Asked Questions (FAQ)

If you have additional questions about mloda and how it can enhance your data and feature engineering workflow, visit our FAQ section, raise an issue on our GitHub repository, or email us at mloda.info@gmail.com.
