cascade-ml

Small ML-Engineering framework

Cascade is a small ML-engineering framework that aims to standardize work with data and models, make experiments more reproducible, and speed up ML development.

This project is an attempt to build a bundle of tools for ML engineers: standards and guides for the workflow, and a set of templates for typical tasks.

Installation

Install the latest version using pip:

pip install git+https://github.com/oxid15/cascade.git@main

More information on installation can be found in the docs.

Usage

The simplest use-case is pipeline building.

import torch
from torch.utils.data import DataLoader
import cv2

from cascade.data import Modifier, FolderDataset

# Define Dataset - an entity responsible for fetching data from source
class SpecificImageDataset(FolderDataset):
    # Since everything else is handled by the FolderDataset and Dataset classes,
    # we only need to define __getitem__
    def __getitem__(self, index):
        name = self.names[index]
        img = cv2.imread(name)
        return img


class PreprocessModifier(Modifier):
    # Same with Modifier - only __getitem__
    def __getitem__(self, index):
        img = super().__getitem__(index)
        img = torch.Tensor(img)
        return img


ds = SpecificImageDataset('./images')
ds = PreprocessModifier(ds)

# Pass images further to train your model
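
The wrapping idea used above can be illustrated without Cascade, OpenCV or PyTorch installed. The following is a minimal sketch, not Cascade's implementation: ToyDataset, ToyModifier and ScaleModifier are hypothetical stand-ins that only assume a modifier keeps a reference to the dataset it wraps and delegates __getitem__ and __len__ to it.

```python
class ToyDataset:
    """Hypothetical stand-in for a dataset: fetches items from a source."""

    def __init__(self, items):
        self._items = list(items)

    def __getitem__(self, index):
        return self._items[index]

    def __len__(self):
        return len(self._items)


class ToyModifier(ToyDataset):
    """Hypothetical stand-in for a modifier: wraps another dataset."""

    def __init__(self, dataset):
        # Keep a reference to the wrapped dataset instead of copying its data
        self._dataset = dataset

    def __getitem__(self, index):
        # By default, pass items through unchanged
        return self._dataset[index]

    def __len__(self):
        return len(self._dataset)


class ScaleModifier(ToyModifier):
    """Scales each item lazily, at access time."""

    def __init__(self, dataset, factor):
        super().__init__(dataset)
        self._factor = factor

    def __getitem__(self, index):
        return super().__getitem__(index) * self._factor


ds = ToyDataset([1, 2, 3])
ds = ScaleModifier(ds, factor=10)
ds = ScaleModifier(ds, factor=2)  # modifiers stack like pipeline stages

items = [ds[i] for i in range(len(ds))]
print(items)  # [20, 40, 60]
```

Because each stage only overrides __getitem__, a new preprocessing step can be added or removed by changing a single line where the pipeline is assembled.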

Why Cascade

Cascade emerged as an attempt to bring order to a messy, fast-paced ML-engineering workflow.

As part of a small AI team, I encountered problems typical for those who run many fast experiments on datasets and models without a strict system:

Growing number of different versions of data pipeline
Growing number of different versions of models
Folders with hundreds of models as binary artifacts with no info about what is inside
No history of model metrics
Data pipelines and model training loops are difficult to reuse
New data reaches the training stage without verification

This project aims to address these kinds of issues by:

Making data pipelines modular, traceable and verifiable with little or no additional code
Making models more than black-box binary artifacts
Introducing tools for storing and accessing metadata, parameters and metrics
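
One way to picture the traceability goal listed above is a pipeline in which every stage can describe itself, so that the description of the last stage contains the history of the whole pipeline. This is a sketch under assumptions: TracedDataset, TracedModifier, Shift and the get_meta method are hypothetical names defined only for this example, and the actual Cascade interface may differ.

```python
import json


class TracedDataset:
    """Hypothetical dataset that can describe itself."""

    def __init__(self, items):
        self._items = list(items)

    def __getitem__(self, index):
        return self._items[index]

    def __len__(self):
        return len(self._items)

    def get_meta(self):
        # One dict per pipeline stage, source first
        return [{"name": type(self).__name__, "len": len(self)}]


class TracedModifier(TracedDataset):
    """Hypothetical modifier that appends its own record to the wrapped one."""

    def __init__(self, dataset):
        self._dataset = dataset

    def __getitem__(self, index):
        return self._dataset[index]

    def __len__(self):
        return len(self._dataset)

    def get_meta(self):
        # History of the wrapped stages followed by this stage's record
        return self._dataset.get_meta() + super().get_meta()


class Shift(TracedModifier):
    """Adds a constant offset to every item and records it in the meta."""

    def __init__(self, dataset, offset):
        super().__init__(dataset)
        self._offset = offset

    def __getitem__(self, index):
        return super().__getitem__(index) + self._offset

    def get_meta(self):
        meta = super().get_meta()
        meta[-1]["offset"] = self._offset  # record this stage's parameter
        return meta


pipeline = Shift(TracedDataset([1, 2, 3]), offset=5)
meta = pipeline.get_meta()
print(json.dumps(meta, indent=2))
```

The pipeline code itself stays free of logging calls: the record is assembled from the objects that already make up the pipeline.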

Why not other solutions

A number of tools are already available for ML-engineering teams. They are great tools for their own purposes, but they have their own weaknesses:

A lot of imperative meta-code
The need to restructure your pipelines to fit in the system
No support for tracing data-pipelines
No focus on what is inside data processing scripts, only on MLOps meta-code
Difficult to manage quick experiments and prototypes

Who could find Cascade useful

Small, fast-prototyping AI teams could use it as a middle ground between having no ML-engineering framework at all and adopting demanding enterprise solutions.

Principles

The key principles of Cascade are:

Elegance - ML-pipeline code should be about ML, with minimal meta-code
Agility - it should be easy to build new prototypes and wrap old ones into the framework
Reusability - code should be reusable in similar projects with little or no effort
Traceability - everything should have metadata

The logo of the project is a depiction of these principles: it symbolizes modularity, standardization, information flow and is cascade-like :)

Contributing

Pull requests and issues are welcome! For major changes, please open an issue first to discuss what you would like to change.

Please make sure to update tests and docs as appropriate.

Documentation

Docs are available online in the Cascade documentation.

Structure

Cascade is divided into three main modules: data, models and meta.

data provides an OOP solution to the problem of building complex data pipelines
models provides a standardized way of dealing with ML models: training, evaluating, saving, loading, etc.
meta ensures that all relevant meta information about data and models is stored and can be easily viewed

There is also utils, a collection of useful Datasets and Models that are too specific to add to the core.
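
As an illustration of what a standardized model interface with attached metadata could look like, here is a small sketch. MeanModel and its fit, predict, evaluate, save and load methods are hypothetical names chosen for this example, not Cascade's API. The point is only that parameters and metrics are saved together with the model, so the stored artifact is no longer an opaque binary.

```python
import json
import os
import tempfile


class MeanModel:
    """Hypothetical model: predicts the mean of the training targets."""

    def __init__(self):
        self.params = {}
        self.metrics = {}

    def fit(self, y):
        self.params["mean"] = sum(y) / len(y)

    def predict(self, n):
        return [self.params["mean"]] * n

    def evaluate(self, y):
        # Mean absolute error, stored so that it travels with the model
        preds = self.predict(len(y))
        self.metrics["mae"] = sum(abs(p - t) for p, t in zip(preds, y)) / len(y)

    def save(self, path):
        # Human-readable JSON keeps parameters and metrics inspectable
        with open(path, "w") as f:
            json.dump({"params": self.params, "metrics": self.metrics}, f)

    @classmethod
    def load(cls, path):
        with open(path) as f:
            state = json.load(f)
        model = cls()
        model.params = state["params"]
        model.metrics = state["metrics"]
        return model


model = MeanModel()
model.fit([1.0, 2.0, 3.0])
model.evaluate([2.0, 4.0])

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "model.json")
    model.save(path)
    restored = MeanModel.load(path)

print(restored.params, restored.metrics)
```

With every model exposing the same small set of methods, training and evaluation scripts can be reused across experiments without knowing which concrete model they receive.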

License

Apache License 2.0

Versions

This project uses Semantic Versioning - https://semver.org/

Release history

0.14.2 - Aug 28, 2024
0.14.1 - Aug 27, 2024
0.14.0 - Aug 26, 2024
0.13.1 - Dec 4, 2023
0.13.0 - Nov 26, 2023
0.12.1 - Nov 6, 2023
0.12.0 - Jul 14, 2023
0.11.1 - Apr 23, 2023
0.11.0 - Mar 30, 2023
0.10.0 - Mar 1, 2023
0.9.0 - Dec 16, 2022
0.8.0 - Nov 15, 2022
0.7.3 - Oct 6, 2022
0.7.2 - Sep 29, 2022
0.7.1 - Sep 23, 2022
0.7.0 - Sep 5, 2022
0.6.2 - Sep 3, 2022
0.6.1 - Sep 3, 2022
0.6.0 - Sep 3, 2022
0.5.2 - Sep 3, 2022
0.5.1 - Sep 3, 2022 (this version)
0.5.0 - Sep 3, 2022
0.4.2 - Sep 3, 2022
0.4.1 - Sep 3, 2022
0.4.0 - Sep 3, 2022
0.3.3 - May 23, 2022

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

cascade-ml-0.5.1.tar.gz (34.4 kB)

Uploaded Sep 3, 2022 Source

Built Distribution

cascade_ml-0.5.1-py3-none-any.whl (68.7 kB)

Uploaded Sep 3, 2022 Python 3

Hashes for cascade-ml-0.5.1.tar.gz

Algorithm	Hash digest
SHA256	`faac471f42b7dd6a6943ebbb3f42392b1e07cc9d8c2304544d9f293834781dd8`
MD5	`6e88fa0e3d4ab686c4667e5198630cca`
BLAKE2b-256	`4d8bdcb0d643d25735c86708ead4c86f68b082309dca158285cc05296badc85f`

Hashes for cascade_ml-0.5.1-py3-none-any.whl

Algorithm	Hash digest
SHA256	`be0013379655d1ca06f02ca0d7f689ff60d0bfe4ff9146496abc380d84e0d55a`
MD5	`5e037c4ab1d3b787921154ac31d169d5`
BLAKE2b-256	`ab422ee8c26d9eb497ff1e3576c5eb2114ed7dfc7a6b8d48d78c93abb873a10f`