A comprehensive, modular machine-learning framework that consolidates datasets, model architectures, optimization algorithms, preprocessing utilities, and evaluation tools into a unified toolkit designed for research, experimentation, and education. metaflowx streamlines end-to-end workflows—from data ingestion to model selection, training, tuning, and validation—empowering learners and researchers with a clean, extensible, and production-aware interface.
Project description
metaflowx
A modular, enterprise-grade machine-learning library engineered to streamline data workflows, accelerate model development, and operationalize end-to-end analytics pipelines. The framework is purpose-built to enable research-grade experimentation while maintaining production-level governance, reproducibility, and scale-out capability.
Overview
metaflowx is positioned as a full-stack machine-learning toolkit designed to deliver high operational efficiency across the entire data lifecycle. The package consolidates industry-standard modeling utilities, modern preprocessing pipelines, advanced decomposition operators, robust ensemble systems, and battle-tested evaluation modules.
The architecture is designed for extensibility, maintainability, and high-performance execution, supporting both academic research and industrial ML deployments.
The package uses a structured module layout inspired by modern ML ecosystems, accelerating onboarding and cross-functional collaboration. Its broad coverage allows teams to build, benchmark, and operationalize ML assets with minimal technical debt and maximum throughput.
Note: This README intentionally omits detailed documentation for the functions in the optimiser folder.
Key Value Proposition
- End-to-end workflow acceleration—curated building blocks covering datasets, feature engineering, model selection, and post-training evaluation.
- Scalable and modular architecture—clean separation of responsibilities, reusable primitives, and enterprise-friendly structure.
- High test coverage—extensive test suites ensure deterministic execution and strong governance across ML releases.
- Research-ready and production-aligned—balances academic flexibility with corporate-grade engineering rigor.
- Dataset-first philosophy—built-in access to canonical datasets enables rapid prototyping and standardized benchmarking.
Project Structure
Datasets Module (datasets/)
A comprehensive library of dataset loaders, parsers, readers, and test datasets.
Capabilities
- Native support for canonical datasets including Iris, Breast Cancer, Wine, California Housing, LFW, 20 Newsgroups, RCV1, KDDCup99, and Species Distributions.
- Local data ingestion layer with CSV/ARFF/SVMLight support.
- Internal OpenML readers that enable reproducible experimentation without depending on live API calls.
- Optimized SVMLight parsers underpinned by Cython for throughput and reliability.
Subcomponents
- data/: prepackaged datasets (CSV, GZ archives).
- descr/: structured documentation describing metadata and schema for each dataset.
- images/: bundled sample images for vision workflows.
- tests/: dataset-level validation ensuring schema consistency and deterministic outcomes.
- _svmlight_format_fast.pyx: Cython-accelerated parsing engine for large-scale ingestion.
The dataset hub enables fast onboarding, standardized evaluation cycles, and simulation of production-scale ingestion patterns.
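As a quick illustration, the snippet below sketches how a canonical dataset might be loaded. It assumes the loaders follow the familiar `load_*` convention implied by the module layout; `load_iris` and the bunch-style return value are assumptions, not confirmed API.

```python
# Hypothetical usage sketch: assumes metaflowx.datasets exposes
# scikit-learn-style load_* functions returning bunch-like objects.
from metaflowx.datasets import load_iris  # assumed loader name

data = load_iris()
X, y = data.data, data.target   # feature matrix and class labels (assumed attributes)

print(X.shape, y.shape)         # e.g. (150, 4) (150,)
```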
Decomposition Module (decomposition/)
A full suite of dimensionality-reduction and matrix-factorization utilities engineered for speed and analytical clarity.
Operators Include:
- PCA, Incremental PCA, Kernel PCA
- FastICA
- Factor Analysis
- Sparse PCA
- Truncated SVD
- Non-Negative Matrix Factorization (NMF)
- Online LDA (Latent Dirichlet Allocation)
- Dictionary Learning
- Coordinate Descent NMF (Cython-accelerated)
Performance Considerations
- High-throughput Cython kernels (_cdnmf_fast.pyx, _online_lda_fast.pyx)
- Numerical stability enhancements for large datasets
- Memory-efficient incremental methods suitable for streaming workloads
Ideal for feature extraction, representation learning, signal separation, and topic modeling across structured and unstructured datasets.
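A minimal dimensionality-reduction sketch follows, assuming a scikit-learn-compatible `PCA` estimator is exposed by the module; the class name and signature are assumptions.

```python
# Hypothetical sketch: assumes a fit/transform PCA estimator under
# metaflowx.decomposition; the import path and parameters are assumptions.
import numpy as np
from metaflowx.decomposition import PCA  # assumed estimator name

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))   # 200 samples, 10 features

pca = PCA(n_components=3)        # keep the top three principal components
X_reduced = pca.fit_transform(X)
print(X_reduced.shape)           # (200, 3)
```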
Ensemble Module (ensemble/)
A complete ensemble-learning suite instrumented for operational robustness and model governance.
Supported Frameworks:
- Random Forests
- Extra Trees
- Bagging
- Boosting (Gradient Boosting, Histogram-Based Gradient Boosting)
- Stacking
- Voting
- Isolation Forest for anomaly detection
Engineering Highlights:
- Cython-optimized histogram-based boosting stack (_hist_gradient_boosting/)
- Bitset-based performance enhancements for categorical splits
- Monotonic constraint support for regulated industries (finance, healthcare)
- Predictive pipeline fully aligned with high-volume production workloads
The ensemble stack ensures consistent, scalable model performance even in high-cardinality environments.
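The sketch below shows a typical ensemble workflow, assuming scikit-learn-style `RandomForestClassifier`, `train_test_split`, and dataset loaders; these names are assumptions rather than confirmed API.

```python
# Hypothetical sketch: assumed scikit-learn-style names throughout.
from metaflowx.datasets import load_breast_cancer       # assumed loader name
from metaflowx.ensemble import RandomForestClassifier   # assumed estimator name
from metaflowx.model_selection import train_test_split  # assumed helper name

data = load_breast_cancer()
X_train, X_test, y_train, y_test = train_test_split(
    data.data, data.target, test_size=0.2, random_state=42
)

clf = RandomForestClassifier(n_estimators=200, random_state=42)
clf.fit(X_train, y_train)
print("test accuracy:", clf.score(X_test, y_test))
```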
Feature Selection Module (feature_selection/)
A research-grade toolkit for filter, wrapper, and embedded feature-selection strategies.
Tooling Includes:
- Variance Thresholding
- Univariate Statistical Tests
- Mutual Information
- RFE / RFECV
- Sequential Feature Selectors
- Model-based selectors
- Cython-driven mutual information estimators
Built for teams optimizing feature pipelines, reducing computational overhead, and enhancing interpretability.
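For illustration, a minimal variance-based filtering sketch, assuming a `VarianceThreshold`-style selector is available; the name and behaviour are assumptions.

```python
# Hypothetical sketch: assumes a VarianceThreshold selector that drops
# features whose variance does not exceed the given threshold.
import numpy as np
from metaflowx.feature_selection import VarianceThreshold  # assumed name

X = np.array([
    [0.0, 1.2, 3.4],
    [0.0, 0.9, 2.1],
    [0.0, 1.1, 5.6],   # first column is constant and should be dropped
])

selector = VarianceThreshold(threshold=0.0)
X_selected = selector.fit_transform(X)
print(X_selected.shape)  # (3, 2): the zero-variance column is removed
```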
Frozen Models Module (frozen/)
A controlled environment for immutable model artefacts used in audit-compliant ML pipelines.
This module supports freezing model states to enforce reproducibility during validation or regulatory review.
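A minimal sketch of how freezing might look in practice, assuming a `FrozenEstimator`-style wrapper whose `fit` leaves an already-trained model untouched; the class name and behaviour are assumptions based on the description above.

```python
# Hypothetical sketch: assumed FrozenEstimator wrapper and assumed
# scikit-learn-style loaders/estimators elsewhere in the package.
from metaflowx.datasets import load_iris               # assumed loader name
from metaflowx.frozen import FrozenEstimator           # assumed wrapper name
from metaflowx.linear_model import LogisticRegression  # assumed estimator name

data = load_iris()
model = LogisticRegression(max_iter=1000).fit(data.data, data.target)

frozen = FrozenEstimator(model)      # treat the fitted model as an immutable artefact
frozen.fit(data.data, data.target)   # assumed to be a no-op that preserves the original fit
print(frozen.predict(data.data[:5]))
```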
Linear Models (linear_model/)
An industrial-strength suite of regression and classification algorithms.
Coverage:
- Ridge, Lasso, ElasticNet
- Logistic Regression
- Bayesian Regression
- SGD-based solvers
- Huber Regression
- Quantile Regression
- Coordinate Descent engines
- Passive-Aggressive models
- Least Angle Regression (LARS)
- RANSAC Robust Regression
- Theil–Sen Estimator
Cython integrations (_cd_fast.pyx, _sag_fast.pyx.tp, _sgd_fast.pyx.tp) provide scale-up capability for enterprise workloads.
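A minimal regression sketch, assuming `Ridge` follows the standard fit/predict estimator contract; the class name and parameters are assumptions.

```python
# Hypothetical sketch: assumed Ridge estimator with an alpha regularisation term.
import numpy as np
from metaflowx.linear_model import Ridge  # assumed estimator name

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
true_w = np.array([1.5, -2.0, 0.0, 0.5, 3.0])
y = X @ true_w + rng.normal(scale=0.1, size=100)

model = Ridge(alpha=1.0)   # L2-regularised least squares
model.fit(X, y)
print(model.coef_)         # should approximate true_w
```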
Model Selection (model_selection/)
A holistic module for model tuning, split strategies, and performance validation.
Functional Areas:
- Train/Validation/Test split orchestration
- K-Fold, Stratified K-Fold, Shuffle Splits
- Grid Search, Random Search, Successive Halving
- Classification threshold optimization
- Visualization utilities for diagnostic analysis (_plot.py)
- Enhanced validation logic with deterministic behaviors
The module is designed for repeatable experimentation and strong audit trails.
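The sketch below illustrates hyperparameter tuning with cross-validation, assuming a `GridSearchCV`-style utility; class, parameter, and loader names are assumptions.

```python
# Hypothetical sketch: assumed GridSearchCV plus assumed ensemble/dataset names.
from metaflowx.datasets import load_wine               # assumed loader name
from metaflowx.ensemble import RandomForestClassifier  # assumed estimator name
from metaflowx.model_selection import GridSearchCV     # assumed tuner name

data = load_wine()
param_grid = {"n_estimators": [50, 100], "max_depth": [3, None]}

search = GridSearchCV(
    RandomForestClassifier(random_state=0),
    param_grid,
    cv=5,                  # 5-fold cross-validation
    scoring="accuracy",
)
search.fit(data.data, data.target)
print(search.best_params_, search.best_score_)
```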
Neural Network Module (neural_network/)
A lightweight neural network stack built around core feedforward architectures.
Included:
- MLP classifiers and regressors
- RBM (Restricted Boltzmann Machine)
- Stochastic optimization utilities
- Gradient-based solvers tuned for controlled training regimes
Positioned as a research accelerator rather than a deep learning framework.
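A small training sketch, assuming an `MLPClassifier` with a scikit-learn-style interface; the name and parameters are assumptions.

```python
# Hypothetical sketch: assumed MLPClassifier and assumed dataset loader.
from metaflowx.datasets import load_iris            # assumed loader name
from metaflowx.neural_network import MLPClassifier  # assumed estimator name

data = load_iris()
clf = MLPClassifier(hidden_layer_sizes=(32, 16), max_iter=500, random_state=0)
clf.fit(data.data, data.target)
print("training accuracy:", clf.score(data.data, data.target))
```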
Preprocessing (preprocessing/)
A scalable data-preprocessing library that minimizes friction in ETL and feature engineering pipelines.
Assets Include:
- Label Encoding, One-Hot Encoding, Ordinal Encoding
- Target Encoding (with Cython-accelerated fast path)
- Polynomial Feature Expansion
- Binning and Discretization
- Scalable sparse matrix transformations
- Function transformers for custom data logic
This module anchors the data engineering pipeline, enabling clean, repeatable transformations.
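A minimal encoding sketch, assuming a `OneHotEncoder`-style transformer; the class name and parameters are assumptions.

```python
# Hypothetical sketch: assumed OneHotEncoder with a dense-output option.
import numpy as np
from metaflowx.preprocessing import OneHotEncoder  # assumed transformer name

colours = np.array([["red"], ["green"], ["blue"], ["green"]])

encoder = OneHotEncoder(sparse_output=False)  # dense array for readability (assumed flag)
encoded = encoder.fit_transform(colours)
print(encoder.categories_)   # learned category levels
print(encoded)               # one indicator column per category
```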
Support Vector Machines (svm/)
A feature-rich SVM implementation built on top of optimized C++ backends.
- LibSVM and LibLinear integrations
- Sparse SVM routines
- Cython bridges for accelerated inference
- Deterministic linear SVM solvers
- C++ template architecture for compute-efficient training
This subsystem enables enterprise teams to deploy classical ML with predictable performance and full reproducibility.
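A minimal classification sketch, assuming an `SVC` estimator backed by the LibSVM integration described above; the name and parameters are assumptions.

```python
# Hypothetical sketch: assumed SVC estimator and assumed dataset loader.
from metaflowx.datasets import load_breast_cancer  # assumed loader name
from metaflowx.svm import SVC                      # assumed estimator name

data = load_breast_cancer()
clf = SVC(kernel="rbf", C=1.0, gamma="scale")
clf.fit(data.data, data.target)
print("training accuracy:", clf.score(data.data, data.target))
```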
Tests
The repository includes an extensive automated testing framework across all modules, ensuring:
- Regression safety
- Deterministic output behavior
- Conformance to expected data and model interfaces
- Compliance with enterprise release processes
CI-friendly test structure enables frictionless integration into DevOps pipelines.
Installation
Standard installation via pip:
```
pip install metaflowx
```
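After installation, a quick import check can confirm the package is available; the submodule names below follow the layout described above and are assumptions, not guaranteed entry points.

```python
# Post-install smoke test: submodule names are assumptions based on the
# project structure documented above.
import metaflowx
from metaflowx import datasets, ensemble, preprocessing  # assumed submodules

print("metaflowx imported from:", metaflowx.__file__)
```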
Download files
File details
Details for the file metaflowx-1.1.1.tar.gz.
File metadata
- Download URL: metaflowx-1.1.1.tar.gz
- Upload date:
- Size: 117.7 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.12.2
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 88da2d3620fc17bf57248fb4decf853f90d713881d34a3ab877bee55ff7cd2f3 |
| MD5 | 0a9e43d764ac4c26404a25cc98f3d66e |
| BLAKE2b-256 | 08cea1b98ed4df1c06fc8f615e46f66a94ac71f84fd865fc488356920e1397b1 |
File details
Details for the file metaflowx-1.1.1-py3-none-any.whl.
File metadata
- Download URL: metaflowx-1.1.1-py3-none-any.whl
- Upload date:
- Size: 157.8 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.12.2
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 5a4b061215ae7216c6f3c89abf34350a220cfc87503023bfaacf430972544653 |
| MD5 | 788b6e7ae5711095fce4260771231179 |
| BLAKE2b-256 | 23f4c419c83070e72eeae5dd2137565c3a957bbad7c58ef6666a7e2c42efdf08 |