
A comprehensive, modular machine-learning framework that consolidates datasets, model architectures, optimization algorithms, preprocessing utilities, and evaluation tools into a unified toolkit designed for research, experimentation, and education. metaflowx streamlines end-to-end workflows—from data ingestion to model selection, training, tuning, and validation—empowering learners and researchers with a clean, extensible, and production-aware interface.

Project description

metaflowx

A modular, enterprise-grade machine-learning library engineered to streamline data workflows, accelerate model development, and operationalize end-to-end analytics pipelines. The framework is purpose-built to enable research-grade experimentation while maintaining production-level governance, reproducibility, and scale-out capability.


Overview

metaflowx is positioned as a full-stack machine-learning toolkit designed to deliver high operational efficiency across the entire data lifecycle. The package consolidates industry-standard modeling utilities, modern preprocessing pipelines, advanced decomposition operators, robust ensemble systems, and battle-tested evaluation modules.
The architecture is aligned for extensibility, maintainability, and high-performance execution, empowering both academic research and industrial ML deployments.

The solution leverages a structured module layout inspired by modern ML ecosystems, accelerating onboarding and cross-functional collaboration. Its broad coverage lets teams build, benchmark, and operationalize ML assets with minimal technical debt and high throughput.

Note: This README does not cover the functions in the optimiser folder.


Key Value Proposition

  • End-to-end workflow acceleration—curated building blocks covering datasets, feature engineering, model selection, and post-training evaluation.
  • Scalable and modular architecture—clean separation of responsibilities, reusable primitives, and enterprise-friendly structure.
  • High test coverage—extensive test suites ensure deterministic execution and strong governance across ML releases.
  • Research-ready and production-aligned—balances academic flexibility with corporate-grade engineering rigor.
  • Dataset-first philosophy—built-in access to canonical datasets enables rapid prototyping and standardized benchmarking.

Project Structure

Datasets Module (datasets/)

A comprehensive library of dataset loaders, parsers, readers, and test datasets.

Capabilities

  • Native support for canonical datasets including Iris, Breast Cancer, Wine, California Housing, LFW, 20 Newsgroups, RCV1, KDDCup99, and Species Distributions.
  • Local data ingestion layer with CSV/ARFF/SVMLight support.
  • Internal OpenML readers to streamline API-free reproducibility for experimentation.
  • Optimized SVMLight parsers underpinned by Cython for throughput and reliability.

Subcomponents

  • data/: prepackaged datasets (CSV, GZ archives).
  • descr/: structured documentation describing metadata and schema for each dataset.
  • images/: bundled sample images for vision workflows.
  • tests/: dataset-level validation ensuring schema consistency and deterministic outcomes.
  • _svmlight_format_fast.pyx: Cython-accelerated parsing engine for large-scale ingestion.

The dataset hub enables fast onboarding, standardized evaluation cycles, and simulation of production-scale ingestion patterns.
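The README does not spell out metaflowx's parser API, so rather than guess at its signatures, here is a dependency-free sketch of what SVMLight-format parsing does conceptually; the Cython engine (_svmlight_format_fast.pyx) trades this clarity for throughput. All names below are illustrative, not metaflowx's.

```python
# Minimal sketch of SVMLight-format parsing: each line is
# "<label> <index>:<value> <index>:<value> ...".
def parse_svmlight_line(line):
    """Parse one SVMLight line into (label, {index: value})."""
    parts = line.split()
    label = float(parts[0])
    features = {}
    for token in parts[1:]:
        idx, val = token.split(":")
        features[int(idx)] = float(val)
    return label, features

label, feats = parse_svmlight_line("1 3:0.5 7:1.25")
```

A production parser additionally handles comments, query IDs, and zero- vs one-based indexing, which is where the Cython fast path earns its keep.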


Decomposition Module (decomposition/)

A full suite of dimensionality-reduction and matrix-factorization utilities engineered for speed and analytical clarity.

Operators Include:

  • PCA, Incremental PCA, Kernel PCA
  • FastICA
  • Factor Analysis
  • Sparse PCA
  • Truncated SVD
  • Non-Negative Matrix Factorization (NMF)
  • Online LDA (Latent Dirichlet Allocation)
  • Dictionary Learning
  • Coordinate Descent NMF (Cython-accelerated)

Performance Considerations

  • High-throughput Cython kernels (_cdnmf_fast.pyx, _online_lda_fast.pyx)
  • Numerical stability enhancements for large datasets
  • Memory-efficient incremental methods suitable for streaming workloads

Ideal for feature extraction, representation learning, signal separation, and topic modeling across structured and unstructured datasets.
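To make the PCA idea concrete without assuming metaflowx's (undocumented here) call signatures, the following stdlib-only sketch finds the first principal component of 2-D data via the closed-form eigendecomposition of the 2x2 covariance matrix. Function names are illustrative.

```python
import math

def pca_2d_direction(points):
    """First principal component of 2-D points: the top eigenvector
    of the 2x2 covariance matrix, computed in closed form."""
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    a = sum((p[0] - mx) ** 2 for p in points) / n           # var(x)
    c = sum((p[1] - my) ** 2 for p in points) / n           # var(y)
    b = sum((p[0] - mx) * (p[1] - my) for p in points) / n  # cov(x, y)
    theta = 0.5 * math.atan2(2 * b, a - c)  # angle of the top eigenvector
    return (math.cos(theta), math.sin(theta))

# Points along the line y = x: the principal direction is 45 degrees.
direction = pca_2d_direction([(0, 0), (1, 1), (2, 2), (3, 3)])
```

Real implementations generalize this via SVD and, for the incremental variants, update the decomposition batch by batch instead of recomputing it.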


Ensemble Module (ensemble/)

A complete ensemble-learning suite instrumented for operational robustness and model governance.

Supported Frameworks:

  • Random Forests
  • Extra Trees
  • Bagging
  • Boosting (Gradient Boosting, Histogram-Based Gradient Boosting)
  • Stacking
  • Voting
  • Isolation Forest for anomaly detection

Engineering Highlights:

  • Cython-optimized histogram-based boosting stack (_hist_gradient_boosting/)
  • Bitset-based performance enhancements for categorical splits
  • Monotonic constraint support for regulated industries (finance, healthcare)
  • Predictive pipeline fully aligned with high-volume production workloads

The ensemble stack ensures consistent, scalable model performance even in high-cardinality environments.
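As a minimal illustration of the voting strategy listed above (not metaflowx's actual API, which this README does not document), hard voting combines per-model predictions by majority:

```python
from collections import Counter

def hard_vote(predictions_per_model):
    """Majority vote across models: predictions_per_model[i][j] is
    model i's prediction for sample j."""
    n_samples = len(predictions_per_model[0])
    combined = []
    for j in range(n_samples):
        votes = Counter(p[j] for p in predictions_per_model)
        combined.append(votes.most_common(1)[0][0])
    return combined

# Three models vote on four samples; ties break toward the most common
# label encountered first.
result = hard_vote([
    ["cat", "dog", "dog", "cat"],
    ["cat", "cat", "dog", "dog"],
    ["dog", "dog", "dog", "cat"],
])
```

Soft voting instead averages predicted probabilities, which is usually preferable when the base models are well calibrated.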


Feature Selection Module (feature_selection/)

A research-grade toolkit for filter, wrapper, and embedded feature-selection strategies.

Tooling Includes:

  • Variance Thresholding
  • Univariate Statistical Tests
  • Mutual Information
  • RFE / RFECV
  • Sequential Feature Selectors
  • Model-based selectors
  • Cython-driven mutual information estimators

Built for teams optimizing feature pipelines, reducing computational overhead, and enhancing interpretability.
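The simplest filter method above, variance thresholding, can be sketched in a few lines of stdlib Python (illustrative only; metaflowx's own selector interface is not shown in this README):

```python
from statistics import pvariance

def variance_threshold(columns, threshold=0.0):
    """Keep indices of feature columns whose population variance
    exceeds the threshold; constant columns carry no information."""
    return [i for i, col in enumerate(columns) if pvariance(col) > threshold]

# Column 1 is constant, so it is dropped.
kept = variance_threshold([[1, 2, 3], [5, 5, 5], [0, 1, 0]])
```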


Frozen Models Module (frozen/)

A controlled environment for immutable model artefacts used in audit-compliant ML pipelines.

This module supports freezing model states to enforce reproducibility during validation or regulatory review.
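One common way to implement freezing for audit trails (a generic sketch under stated assumptions, not metaflowx's mechanism) is to serialize the model state and record a content hash, so any later drift is detectable:

```python
import hashlib
import pickle

def freeze(model_params):
    """Serialize model state and record a SHA-256 content hash."""
    blob = pickle.dumps(model_params)
    return blob, hashlib.sha256(blob).hexdigest()

def verify(blob, expected_digest):
    """True if the frozen artefact still matches its recorded hash."""
    return hashlib.sha256(blob).hexdigest() == expected_digest

blob, digest = freeze({"coef": [0.5, -1.2], "intercept": 0.1})
```

In regulated settings the digest would be stored alongside the model registry entry, so validation can prove the reviewed artefact is byte-identical to the deployed one.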


Linear Models (linear_model/)

An industrial-strength suite of regression and classification algorithms.

Coverage:

  • Ridge, Lasso, ElasticNet
  • Logistic Regression
  • Bayesian Regression
  • SGD-based solvers
  • Huber Regression
  • Quantile Regression
  • Coordinate Descent engines
  • Passive-Aggressive models
  • Least Angle Regression (LARS)
  • RANSAC Robust Regression
  • Theil–Sen Estimator

Cython integrations (_cd_fast.pyx, _sag_fast.pyx.tp, _sgd_fast.pyx.tp) provide scale-up capability for enterprise workloads.
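To show what ridge regularization actually does to a coefficient (a toy closed-form sketch, not metaflowx's solver), consider one feature with no intercept, where the ridge solution is w = Σxy / (Σx² + α):

```python
def ridge_1d(xs, ys, alpha=1.0):
    """Closed-form ridge coefficient for a single feature, no intercept:
    w = sum(x*y) / (sum(x*x) + alpha). Larger alpha shrinks w toward 0."""
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    return sxy / (sxx + alpha)

# With alpha=0 this reduces to ordinary least squares through the origin.
w = ridge_1d([1, 2, 3], [2, 4, 6], alpha=0.0)
```

The multi-feature case replaces the scalar division with solving (XᵀX + αI)w = Xᵀy, which is where the Cython coordinate-descent and SAG/SGD engines come in.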


Model Selection (model_selection/)

A holistic module for model tuning, split strategies, and performance validation.

Functional Areas:

  • Train/Validation/Test split orchestration
  • K-Fold, Stratified K-Fold, Shuffle Splits
  • Grid Search, Random Search, Successive Halving
  • Classification threshold optimization
  • Visualization utilities for diagnostic analysis (_plot.py)
  • Enhanced validation logic with deterministic behaviors

The module is designed for repeatable experimentation and strong audit trails.
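The split orchestration above can be illustrated with a minimal k-fold index generator (a sketch of the standard convention, not metaflowx's documented interface):

```python
def kfold_indices(n_samples, n_splits):
    """Yield (train_idx, test_idx) pairs. The first n_samples % n_splits
    folds get one extra sample, mirroring the usual k-fold convention."""
    fold_sizes = [n_samples // n_splits + (1 if i < n_samples % n_splits else 0)
                  for i in range(n_splits)]
    start = 0
    for size in fold_sizes:
        test = list(range(start, start + size))
        train = [i for i in range(n_samples) if i < start or i >= start + size]
        yield train, test
        start += size

splits = list(kfold_indices(5, 2))
```

Stratified variants additionally balance class proportions within each fold, and shuffle splits randomize sample order before partitioning.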


Neural Network Module (neural_network/)

A lightweight neural network stack built around core feedforward architectures.

Included:

  • MLP classifiers and regressors
  • RBM (Restricted Boltzmann Machine)
  • Stochastic optimization utilities
  • Gradient-based solvers tuned for controlled training regimes

Positioned as a research accelerator rather than a deep learning framework.
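As a taste of the gradient-free end of this stack (illustrative stdlib code, not metaflowx's MLP API), the classic perceptron rule learns any linearly separable function, such as logical AND:

```python
def train_perceptron(samples, labels, lr=0.1, epochs=20):
    """Classic perceptron rule on 2-D inputs with a bias term:
    nudge weights by lr * (target - prediction) * input."""
    w = [0.0, 0.0]
    b = 0.0
    for _ in range(epochs):
        for (x1, x2), y in zip(samples, labels):
            pred = 1 if w[0] * x1 + w[1] * x2 + b > 0 else 0
            err = y - pred
            w[0] += lr * err * x1
            w[1] += lr * err * x2
            b += lr * err
    return w, b

# Learn the linearly separable AND function.
X = [(0, 0), (0, 1), (1, 0), (1, 1)]
y = [0, 0, 0, 1]
w, b = train_perceptron(X, y)
```

An MLP stacks layers of such units with nonlinear activations and trains them by backpropagation, which is what the module's gradient-based solvers do.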


Preprocessing (preprocessing/)

A scalable data-preprocessing library that minimizes friction in ETL and feature engineering pipelines.

Assets Include:

  • Label Encoding, One-Hot Encoding, Ordinal Encoding
  • Target Encoding (with Cython-accelerated fast path)
  • Polynomial Feature Expansion
  • Binning and Discretization
  • Scalable sparse matrix transformations
  • Function transformers for custom data logic

This module anchors the data engineering pipeline, enabling clean, repeatable transformations.
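One-hot encoding, the most common transform listed above, reduces to mapping each distinct category to an indicator vector (a conceptual sketch; metaflowx's encoder classes are not documented in this README):

```python
def one_hot_encode(values):
    """Map each distinct value to a 0/1 indicator vector, with
    categories ordered by first appearance."""
    categories = []
    for v in values:
        if v not in categories:
            categories.append(v)
    rows = [[1 if v == c else 0 for c in categories] for v in values]
    return categories, rows

cats, encoded = one_hot_encode(["red", "green", "red", "blue"])
```

Production encoders add sparse output, handling of unseen categories, and stable category ordering across fit/transform calls.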


Support Vector Machines (svm/)

A feature-rich SVM implementation built on top of optimized C++ backends.

  • LibSVM and LibLinear integrations
  • Sparse SVM routines
  • Cython bridges for accelerated inference
  • Deterministic linear SVM solvers
  • C++ template architecture for compute-efficient training

This subsystem enables enterprise teams to deploy classical ML with predictable performance and full reproducibility.
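The objective those C++ backends optimize is built around the hinge loss; a small stdlib sketch of the loss itself (illustrative, not metaflowx's solver code):

```python
def hinge_loss(w, b, samples, labels):
    """Average hinge loss max(0, 1 - y*(w.x + b)), labels in {-1, +1}.
    Zero loss means every point sits outside the margin."""
    total = 0.0
    for x, y in zip(samples, labels):
        margin = y * (sum(wi * xi for wi, xi in zip(w, x)) + b)
        total += max(0.0, 1.0 - margin)
    return total / len(samples)

# A separator with functional margin >= 1 on every point has zero loss.
loss = hinge_loss([1.0, 0.0], 0.0, [(2.0, 0.0), (-2.0, 1.0)], [1, -1])
```

LibSVM solves the dual of this objective (with kernels) via SMO, while LibLinear attacks the primal or dual directly for the linear case.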


Tests

The repository includes an extensive automated testing framework across all modules, ensuring:

  • Regression safety
  • Deterministic output behavior
  • Conformance to expected data and model interfaces
  • Compliance with enterprise release processes

CI-friendly test structure enables frictionless integration into DevOps pipelines.


Installation

Standard installation via pip:

pip install metaflowx

Project details


Download files


Source Distribution

metaflowx-1.1.4.tar.gz (124.0 kB)

Uploaded Source

Built Distribution


metaflowx-1.1.4-py3-none-any.whl (165.3 kB)

Uploaded Python 3

File details

Details for the file metaflowx-1.1.4.tar.gz.

File metadata

  • Download URL: metaflowx-1.1.4.tar.gz
  • Upload date:
  • Size: 124.0 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.13.5

File hashes

Hashes for metaflowx-1.1.4.tar.gz

  • SHA256: 44fc17212f688e1fc4386e4b00afd6ad7461707ba705969c8b78b31ea40bc140
  • MD5: 12acce1325b1e88315305adbe83c9fcf
  • BLAKE2b-256: 322b63e616e8daed50c33d376e9b0ec3b25e6aabf36c10145fc745af1ed26320


File details

Details for the file metaflowx-1.1.4-py3-none-any.whl.

File metadata

  • Download URL: metaflowx-1.1.4-py3-none-any.whl
  • Upload date:
  • Size: 165.3 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.13.5

File hashes

Hashes for metaflowx-1.1.4-py3-none-any.whl

  • SHA256: c9105a40ce119e7c2f13accb1431dbe93acd933c0f6d48a2ef360268ea21932f
  • MD5: 681fffbe16a8208c27e3b616e2772f89
  • BLAKE2b-256: a02ca57b66deb00e4ebc164bdc439a21c501549a8d109442a49c32effe3a9e29

