ML-Engineering library

Cascade is a lightweight, modular MLOps library that aims to make ML development more efficient.

It was built especially for individuals and small teams who need MLOps but don't have the time or resources to integrate with large platforms.

Included in the Model Lifecycle section of the Awesome MLOps list

Installation

pip install cascade-ml

More information on installation can be found in the documentation

Docs

Go to Cascade documentation

Usage Examples

This section is divided into blocks based on what problem you can solve using Cascade. These are the simplest examples of what the library is capable of. See more in documentation.

ETL pipeline tracking

Data processing pipelines need to be versioned and tracked as part of model experiments.
To track changes and version everything about the data, Cascade provides Datasets: special wrappers that encapsulate operations on data.

from pprint import pprint
from cascade import data as cdd
from sklearn.datasets import load_digits
import numpy as np


X, y = load_digits(return_X_y=True)
pairs = list(zip(X, y))

ds = cdd.Wrapper(pairs)
ds = cdd.RandomSampler(ds)

train_ds, test_ds = cdd.split(ds)
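# Add a small random offset to the features only; keep the label as-is
# (each item is an (x, y) tuple, so adding a float to the whole pair would fail)
train_ds = cdd.ApplyModifier(
    train_ds,
    lambda pair: (pair[0] + np.random.random() * 0.1 - 0.05, pair[1])
)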

pprint(train_ds.get_meta())

The metadata lists every stage of the pipeline, from the outermost wrapper down to the original data.

Full pipeline metadata:
[{"comments": [],
  "description": null,
  "len": 898,
  "links": [],
  "name": "cascade.data.apply_modifier.ApplyModifier",
  "tags": [],
  "type": "dataset"},
 {"comments": [],
  "description": null,
  "len": 898,
  "links": [],
  "name": "cascade.data.range_sampler.RangeSampler",
  "tags": [],
  "type": "dataset"},
 {"comments": [],
  "description": null,
  "len": 1797,
  "links": [],
  "name": "cascade.data.random_sampler.RandomSampler",
  "tags": [],
  "type": "dataset"},
 {"comments": [],
  "description": null,
  "len": 1797,
  "links": [],
  "name": "cascade.data.dataset.Wrapper",
  "obj_type": "<class 'list'>",
  "tags": [],
  "type": "dataset"}]
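
To illustrate why the metadata above comes out as a list with one entry per stage, here is a minimal plain-Python sketch of the wrapper pattern. It is not Cascade's implementation: `SketchDataset` and `SketchModifier` are hypothetical names invented for this example, and the metadata fields are reduced to three. The idea is that each wrapper reports its own entry and then appends whatever the wrapped dataset reports.

```python
from typing import Any, Callable, Dict, List


class SketchDataset:
    """Hypothetical base dataset: holds a list and describes itself."""

    def __init__(self, items: List[Any]) -> None:
        self._items = items

    def __len__(self) -> int:
        return len(self._items)

    def __getitem__(self, index: int) -> Any:
        return self._items[index]

    def get_meta(self) -> List[Dict[str, Any]]:
        return [{"name": type(self).__name__, "len": len(self), "type": "dataset"}]


class SketchModifier(SketchDataset):
    """Hypothetical wrapper: applies a function lazily on item access."""

    def __init__(self, dataset: SketchDataset, func: Callable[[Any], Any]) -> None:
        self._dataset = dataset
        self._func = func

    def __len__(self) -> int:
        return len(self._dataset)

    def __getitem__(self, index: int) -> Any:
        return self._func(self._dataset[index])

    def get_meta(self) -> List[Dict[str, Any]]:
        # Own entry first, then the entries of everything wrapped underneath
        own = {"name": type(self).__name__, "len": len(self), "type": "dataset"}
        return [own] + self._dataset.get_meta()


base = SketchDataset([1, 2, 3])
doubled = SketchModifier(base, lambda x: x * 2)
meta = doubled.get_meta()
```

In this sketch the function is only called when an item is requested, so collecting metadata does not touch the data itself.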

See all datasets in zoo
See tutorial in documentation

Experiment tracking

Cascade provides a rich set of ML experiment tracking tools. You can track the history of model changes and save and restore models, along with their metadata, in a structured manner.

import random
from cascade.models import Model
from cascade.repos import Repo

model = Model()
model.add_metric('acc', random.random())

repo = Repo('./repo')

line = repo.add_line('baseline')
line.save(model, only_meta=True)

A Repo is a collection of lines, and a Line can hold a series of experiments on one model type. Lines can also store data pipelines.

Full model metadata:
[
    {
        "name": "cascade.models.model.Model",
        "description": null,
        "tags": [],
        "comments": [],
        "links": [],
        "type": "model",
        "created_at": "2024-08-25T19:15:24.658259+00:00",
        "metrics": [
            {
                "name": "acc",
                "value": 0.4323295098641783,
                "created_at": "2024-08-25T19:15:24.658356+00:00"
            }
        ],
        "params": {},
        "path": "/home/user/repo/baseline/00000",
        "slug": "rustling_finicky_hoatzin",
        "saved_at": "2024-08-25T19:15:25.548339+00:00",
        "python_version": "3.10.12 (main, Jul 29 2024, 16:56:48) [GCC 11.4.0]",
        "user": "user",
        "host": "hostname"
    }
]
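
The `path` field above suggests a layout of repo folder, line folder, and a zero-padded model index. The following plain-Python sketch shows one way such a layout could be produced. It is an assumption made for illustration, not Cascade's storage code: `save_meta` and the file name `meta.json` are hypothetical, and the metric values are made up.

```python
import json
import tempfile
from pathlib import Path
from typing import Any, Dict


def save_meta(repo_root: Path, line_name: str, meta: Dict[str, Any]) -> Path:
    """Hypothetical helper: store meta in the next numbered folder of a line."""
    line_dir = repo_root / line_name
    line_dir.mkdir(parents=True, exist_ok=True)
    # The next index equals the number of model folders already in the line
    index = sum(1 for p in line_dir.iterdir() if p.is_dir())
    model_dir = line_dir / f"{index:05d}"
    model_dir.mkdir()
    (model_dir / "meta.json").write_text(json.dumps(meta))
    return model_dir


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp) / "repo"
    first = save_meta(root, "baseline", {"metrics": [{"name": "acc", "value": 0.43}]})
    second = save_meta(root, "baseline", {"metrics": [{"name": "acc", "value": 0.51}]})
    saved_names = [first.name, second.name]
    restored = json.loads((second / "meta.json").read_text())
```

Keeping one folder per saved model means the history of a line can be read back in order just by sorting folder names.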

See tutorial in documentation

Metadata analysis

During experiments, Cascade produces a lot of metadata that can be analyzed later. MetricViewer is a tool that shows the relationship between the parameters and metrics of all models in a repository.

from cascade.meta import MetricViewer
from cascade.repos import Repo

repo = Repo("repo")

# This runs a web server that relies on an optional dependency
MetricViewer(repo).serve()
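
To make the idea of relating parameters to metrics concrete, here is a small plain-Python sketch that flattens metadata records shaped like the model metadata shown earlier (`params`, `metrics`, `slug`) into one row per model. This is not how MetricViewer is implemented: `to_rows` is a hypothetical helper, and the slugs, parameters, and metric values are made up for the example.

```python
from typing import Any, Dict, List


def to_rows(metas: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Hypothetical helper: merge each model's params and metrics into one flat row."""
    rows = []
    for meta in metas:
        row = {"slug": meta.get("slug")}
        row.update(meta.get("params", {}))
        for metric in meta.get("metrics", []):
            row[metric["name"]] = metric["value"]
        rows.append(row)
    return rows


# Made-up records that follow the field names of the metadata shown earlier
metas = [
    {"slug": "model_a", "params": {"lr": 0.1}, "metrics": [{"name": "acc", "value": 0.80}]},
    {"slug": "model_b", "params": {"lr": 0.01}, "metrics": [{"name": "acc", "value": 0.85}]},
]
rows = to_rows(metas)
best = max(rows, key=lambda r: r["acc"])
```

Once every model is a flat row, comparing runs reduces to sorting or filtering a table.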

HistoryViewer shows a model's lineage: which parameters resulted in which metrics.

from cascade import meta as cme
from cascade.repos import Repo


repo = Repo("repo")

# This returns a Plotly figure
cme.HistoryViewer(repo).plot()

# This runs a Dash server that shows changes in real time (for example, while models are training)
cme.HistoryViewer(repo).serve()

See tutorial in documentation

Who could find Cascade useful

ML engineers and researchers who work in small teams or individually. The cost of integrating with large-scale MLOps solutions can be too high, and Cascade aims to bridge this gap.

Principles

The key principles of Cascade are:

  • Elegance - ML code should be about ML, with minimal meta-code
  • Flexibility - easily build prototypes and integrate existing projects with Cascade (don't pay for what you don't use)
  • Reusability - code should be reusable in similar projects with minimal effort
  • Traceability - everything should have metadata

Contributing

Pull requests and issues are welcome! For major changes, please open an issue first to discuss what you would like to change.

Please make sure to update tests and docs as appropriate.

License

Apache License 2.0

Versions

This project uses Semantic Versioning - https://semver.org/

Cite the code

If you used the code in your research, please cite it with:

@software{ilia_moiseev_2023_8006995,
  author       = {Ilia Moiseev},
  title        = {Oxid15/cascade: Lightweight ML Engineering library},
  month        = jun,
  year         = 2023,
  publisher    = {Zenodo},
  doi          = {10.5281/zenodo.8006995},
  url          = {https://doi.org/10.5281/zenodo.8006995}
}
