retrain-pipelines lowers the barrier to entry for the creation and management of professional machine learning retraining pipelines.

Project description

This README is nowhere near ready yet.

retrain-pipelines simplifies the creation and management of machine learning retraining pipelines. The package is designed to remove the complexity of building end-to-end ML retraining pipelines, so that users can focus on their data and model architecture. With pre-built, highly adaptable pipeline examples that work out of the box, users can plug in their own data and start retraining models with minimal to no setup.

Key features of retrain-pipelines include:

  • Model version blessing: Automatically compare the performance of retrained models against previous best versions to ensure only superior models are deployed.
  • Infrastructure validation: Each retraining pipeline includes inference pipeline packaging, local Docker container deployment, and request/response validation to ensure that models are production-ready.
  • Comprehensive documentation: Every retraining pipeline is fully documented with sections covering Exploratory Data Analysis (EDA), hyperparameter tuning, retraining steps, model performance metrics, and key commands for retrieving training artifacts. Additionally, DAG information for the retraining process is readily available for pipeline transparency and debugging.
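To make the blessing idea concrete, here is a minimal sketch of the comparison step. It assumes a single scalar validation metric where higher is better; `ModelVersion`, `bless` and `select_best` are hypothetical names for illustration, not part of the retrain-pipelines API, and the package's own blessing logic and metrics may differ.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ModelVersion:
    """Hypothetical record of one trained model version and its metric."""
    version: str
    metric: float  # assumption: a validation score where higher is better


def bless(candidate: ModelVersion,
          current_best: Optional[ModelVersion],
          min_improvement: float = 0.0) -> bool:
    """Return True if the candidate should replace the current best version.

    The very first model (no previous best) is blessed by default.
    Otherwise the candidate must beat the current best by more than
    `min_improvement`.
    """
    if current_best is None:
        return True
    return candidate.metric - current_best.metric > min_improvement


def select_best(history: list[ModelVersion]) -> Optional[ModelVersion]:
    """Replay a sequence of retraining runs, keeping only blessed versions."""
    best: Optional[ModelVersion] = None
    for candidate in history:
        if bless(candidate, best):
            best = candidate
    return best


if __name__ == "__main__":
    runs = [ModelVersion("v1", 0.81), ModelVersion("v2", 0.79),
            ModelVersion("v3", 0.84)]
    print(select_best(runs).version)  # v2 is rejected, v3 is blessed
```

A `min_improvement` margin above zero is one way to avoid redeploying on noise-level gains; whether and how retrain-pipelines applies such a margin is not stated here.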

In essence, retrain-pipelines aims for "come with your data, and it works", while leaving more advanced users the flexibility to adjust and extend pipelines as needed.

Customizability & Adaptability

retrain-pipelines offers a high degree of flexibility, allowing users to tailor the pre-shipped pipelines to their specific needs:

  • Custom Preprocessing Functions: Users can provide their own Python functions for custom data preprocessing. For example, some built-in pipelines for tabular data allow optional bucketization of numerical features by name, but you can easily modify or extend these preprocessing steps to suit your dataset and feature requirements.
  • Custom Pipeline Card Generation: You can specify custom Python functions to generate pipeline cards, such as including specific performance charts or metrics relevant to your use case.
  • Custom HTML Templates: For further personalization, retrain-pipelines supports customizable HTML templates, enabling you to adjust formatting, insert specific charts, change page colors, or even add your company's logo to documentation pages.
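As an illustration of the kind of custom preprocessing function described above, here is a minimal sketch that bucketizes numerical features by name. It assumes records arrive as a list of dicts and that bin edges are supplied per feature; the function name and signature are hypothetical, and the hook signature retrain-pipelines actually expects for custom preprocessing may differ.

```python
from bisect import bisect_right
from typing import Any


def bucketize_features(
    records: list[dict[str, Any]],
    buckets: dict[str, list[float]],
) -> list[dict[str, Any]]:
    """Replace selected numerical features by their bucket index.

    `buckets` maps a feature name to sorted bin edges. A value v becomes the
    index i such that edges[i-1] <= v < edges[i]: 0 below the first edge,
    len(edges) at or above the last one. Unlisted features are left unchanged.
    """
    out = []
    for record in records:
        new_record = dict(record)  # copy so the caller's data is not mutated
        for name, edges in buckets.items():
            if new_record.get(name) is not None:
                new_record[name] = bisect_right(edges, new_record[name])
        out.append(new_record)
    return out


if __name__ == "__main__":
    rows = [{"age": 17, "income": 1200.0, "city": "Lyon"},
            {"age": 42, "income": 5300.0, "city": "Nice"}]
    # Only "age" is bucketized; "income" and "city" pass through untouched.
    print(bucketize_features(rows, {"age": [18, 35, 60]}))
```

Selecting features by name keeps the function reusable across datasets: adding or removing a bucketized feature is a change to the `buckets` mapping, not to the code.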

retrain-pipelines doesn't just streamline the retraining process; it helps teams iterate faster and deploy more robust models with confidence. Whether you're looking for an out-of-the-box solution or a highly customizable pipeline, retrain-pipelines is built to support continuous model improvement.

-- DRAFT --

- Say you use a custom "preprocessing.py", "pipeline_card.py" and/or "template.html".
  If you chose to log the run to WandB, you can afterwards retrieve the versioned artifacts there via the "name_here" WandB inspector that retrain-pipelines offers.

- Include a link to PyPI here: https://pypi.org/project/retrain-pipelines/

- Tracking your draft pipelines as you iterate on developing them is fine, but keeping track of the artifacts generated during those dry runs has no value. To address that, and all the "..." that comes with it, we propose sandboxing.
  Stateful yet ephemeral. Once you're happy with how a given ML retraining pipeline has progressed, you're free to drop all the draft artifacts.
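The "stateful yet ephemeral" sandboxing idea can be sketched as follows, assuming draft artifacts are plain files on local disk. `DraftSandbox` and its methods are hypothetical names for illustration only; this is not how retrain-pipelines implements sandboxing.

```python
import shutil
import tempfile
from pathlib import Path


class DraftSandbox:
    """Hypothetical sandbox for draft-run artifacts.

    Stateful: artifacts written under `root` persist across dry runs,
    so later runs can see what earlier ones produced.
    Ephemeral: `drop()` deletes everything in one call once the
    pipeline is ready, leaving no draft artifacts behind.
    """

    def __init__(self, root: Path):
        self.root = Path(root)
        self.root.mkdir(parents=True, exist_ok=True)

    def save(self, run_id: str, name: str, content: str) -> Path:
        """Write one artifact for a given dry run and return its path."""
        path = self.root / run_id / name
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_text(content)
        return path

    def runs(self) -> list[str]:
        """List the dry runs recorded so far, sorted by name."""
        return sorted(p.name for p in self.root.iterdir() if p.is_dir())

    def drop(self) -> None:
        """Discard all draft artifacts at once."""
        shutil.rmtree(self.root, ignore_errors=True)


if __name__ == "__main__":
    sandbox = DraftSandbox(Path(tempfile.mkdtemp()) / "sandbox")
    sandbox.save("run_1", "metrics.json", '{"acc": 0.80}')
    sandbox.save("run_2", "metrics.json", '{"acc": 0.82}')
    print(sandbox.runs())  # ['run_1', 'run_2']
    sandbox.drop()         # all draft artifacts are gone
```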

launch tests

pytest -s tests

build from source

cd pkg_src && python -m build --verbose

install from source (dev mode) via:

pip install -e pkg_src


Download files

Source Distribution

retrain_pipelines-0.1.0.tar.gz (143.4 kB)

Uploaded Source

Built Distribution

retrain_pipelines-0.1.0-py3-none-any.whl (154.1 kB)

Uploaded Python 3

File details

Details for the file retrain_pipelines-0.1.0.tar.gz.

File metadata

  • Download URL: retrain_pipelines-0.1.0.tar.gz
  • Size: 143.4 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/5.1.1 CPython/3.10.14

File hashes

Hashes for retrain_pipelines-0.1.0.tar.gz

  • SHA256: f0c90e108c8c22ca6992cfc37f18c59a5eb877ae8833e96baa720fc67b6e94f8
  • MD5: 7aba12b525360fc28757d29a5b22118b
  • BLAKE2b-256: 5c94860d2f2a069c9f73cb896c4251c3cb9309631d574dcd464c3b510ba59502

File details

Details for the file retrain_pipelines-0.1.0-py3-none-any.whl.

File hashes

Hashes for retrain_pipelines-0.1.0-py3-none-any.whl

  • SHA256: 7bc7cfdef10816c66599a0a578c1d1ae84e25b07e6e1780e39d1100ea2264104
  • MD5: ad01e1cb68d82651edf93acf3b03cba7
  • BLAKE2b-256: 286ba8feda8dafa98876fdebc5a49fd46a8ba25375d70662a3fcb2efde9a5ad5