starwhale

An MLOps Platform for Model Evaluation

These details have not been verified by PyPI

Project links

Homepage

GitHub Statistics

View statistics for this project via Libraries.io, or by using our public dataset on Google BigQuery

Project description

What is Starwhale

Starwhale is an MLOps platform built around five core concepts: Instance, Project, Runtime, Model, and Dataset.

Instance: Each installation of Starwhale is called an instance.
- 👻 Standalone Instance: The simplest form, which requires only the Starwhale Client (swcli). swcli is written in pure Python 3.
- 🎍 On-Premises Instance: A cloud form, also called a private cloud instance. Both Kubernetes and bare-metal environments meet the basic requirements.
- ☁️ Cloud Hosted Instance: A cloud form, also called a public cloud instance. The Starwhale team maintains the web service.
Starwhale keeps concepts consistent across the different instance types, so users can easily exchange data and migrate between them.
Project: The basic unit for organizing different resources.
ML Basic Elements: The running environments and artifacts of Machine Learning/Deep Learning. Starwhale provides packaging, versioning, reproducibility, and shareability for these essential ML/DL elements.
- 🐌 Runtime: A description of the software dependencies needed to run a model, including Python libraries, native libraries, native binaries, etc.
- 🐇 Model: The standard model format used in model delivery.
- 🐫 Dataset: A unified description of how the data and labels are stored and organized. Starwhale datasets can be loaded efficiently.
Running Fundamentals: Starwhale uses Job, Step, and Task to execute ML/DL actions like model training, evaluation, and serving. Starwhale's Controller-Agents structure scales out easily.
- 🥕 Job: A set of programs to do specific work. Each job consists of one or more steps.
- 🌵 Step: Represents distinct stages of the work. Each step consists of one or more tasks.
- 🥑 Task: The unit of execution. Each task belongs to a specific step.
Scenarios: Starwhale provides best practices and out-of-the-box support for different ML/DL scenarios.
- 🚝 Model Training (TBD): Use the Starwhale Python SDK to record experiment metadata, metrics, logs, and artifacts.
- 🛥️ Model Evaluation: PipelineHandler and some report decorators can give you complete, helpful, and user-friendly evaluation reports with only a few lines of code.
- 🛫 Model Serving (TBD): A Starwhale Model can be deployed as a web service or stream service in production, with deployment capability, observability, and scalability. Data scientists do not need to write code unrelated to ML/DL.
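
The Job, Step, and Task relationship described above can be pictured as a simple containment hierarchy. The sketch below is only an illustration in plain Python: the class names mirror the concepts, but the fields, the `task_count` helper, and the sample job are hypothetical and are not part of the Starwhale SDK.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Task:
    """The unit of execution; each task belongs to one step."""
    name: str


@dataclass
class Step:
    """A distinct stage of the work, made of one or more tasks."""
    name: str
    tasks: List[Task] = field(default_factory=list)


@dataclass
class Job:
    """A set of programs doing specific work, made of one or more steps."""
    name: str
    steps: List[Step] = field(default_factory=list)

    def task_count(self) -> int:
        # Total number of tasks across all steps of this job.
        return sum(len(step.tasks) for step in self.steps)


# Hypothetical evaluation job: a "ppl" step split into two tasks,
# followed by a "cmp" step with a single task.
job = Job(
    name="mnist-eval",
    steps=[
        Step("ppl", [Task("ppl-0"), Task("ppl-1")]),
        Step("cmp", [Task("cmp-0")]),
    ],
)
print(job.task_count())  # prints 3
```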

MNIST Quick Tour for the standalone instance

Use Notebook

You can try it in Google Colab,
or run example/mnist/notebook.ipynb locally using VS Code or JupyterLab.

Use your own python env

Core Job Workflow

🍰 STEP1: Installing Starwhale
```
python3 -m pip install starwhale
```

🍵 STEP2: Downloading the MNIST example

```
git clone https://github.com/star-whale/starwhale.git
cd starwhale
```

If git-lfs was not installed in the local environment before cloning (the command is git lfs install), the trained model file is not fetched and you need to download it yourself:

```
wget https://media.githubusercontent.com/media/star-whale/starwhale/main/example/mnist/models/mnist_cnn.pt -O example/mnist/models/mnist_cnn.pt
```

☕ STEP3: Building a runtime

```
cd example/runtime/pytorch
swcli runtime build .
swcli runtime list
swcli runtime info pytorch/version/latest
```

🍞 STEP4: Building a model

Enter the example/mnist directory:

```
cd ../../mnist
```

Write the evaluation code with the Starwhale Python SDK. The complete code is here.

```
import typing as t

import torch
from starwhale import Image, PipelineHandler, PPLResultIterator, multi_classification


class MNISTInference(PipelineHandler):
    def __init__(self) -> None:
        super().__init__()
        self.device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
        self.model = self._load_model(self.device)

    def ppl(self, img: Image, **kw: t.Any) -> t.Tuple[t.List[int], t.List[float]]:
        # Inference for one image: preprocess, forward pass, post-process.
        data_tensor = self._pre(img)
        output = self.model(data_tensor)
        return self._post(output)

    @multi_classification(
        confusion_matrix_normalize="all",
        show_hamming_loss=True,
        show_cohen_kappa_score=True,
        show_roc_auc=True,
        all_labels=list(range(10)),
    )
    def cmp(
        self, ppl_result: PPLResultIterator
    ) -> t.Tuple[t.List[int], t.List[int], t.List[t.List[float]]]:
        # Collect ground-truth labels, predicted labels, and class probabilities.
        result, label, pr = [], [], []
        for _data in ppl_result:
            label.append(_data["annotations"]["label"])
            result.extend(_data["result"][0])
            pr.extend(_data["result"][1])
        return label, result, pr

    def _pre(self, img: Image) -> torch.Tensor:
        """write some mnist preprocessing code"""

    def _post(self, output: torch.Tensor) -> t.Tuple[t.List[int], t.List[float]]:
        """write some mnist post-processing code"""

    def _load_model(self, device: torch.device) -> torch.nn.Module:
        """load your pre trained model"""
```

Define model.yaml.

```
name: mnist
model:
  - models/mnist_cnn.pt
config:
  - config/hyperparam.json
run:
  handler: mnist.evaluator:MNISTInference
```
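
The handler value follows the common module:attribute convention (module mnist.evaluator, class MNISTInference). The sketch below shows one way such a string can be resolved with the standard library. It is not Starwhale's actual loader: `resolve_handler` is a hypothetical helper, and a standard-library class stands in for the handler because mnist.evaluator only exists inside the example directory.

```python
import importlib
from typing import Any


def resolve_handler(spec: str) -> Any:
    """Resolve a 'package.module:attribute' string to the object it names."""
    module_name, sep, attr_name = spec.partition(":")
    if not sep or not module_name or not attr_name:
        raise ValueError(f"expected 'module:attribute', got {spec!r}")
    module = importlib.import_module(module_name)
    return getattr(module, attr_name)


# Stand-in target from the standard library; inside example/mnist the
# spec would be "mnist.evaluator:MNISTInference" instead.
handler_cls = resolve_handler("collections:OrderedDict")
print(handler_cls.__name__)
```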

Run one command to build the model.

```
swcli model build .
swcli model info mnist/version/latest
```
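
To make the data flow of the cmp method above more concrete, here is a small standalone sketch that performs the same aggregation on made-up records and then derives an accuracy and a confusion matrix normalized over all samples. The records, the three-class setup, and the helper names are illustrative assumptions; the real reports are produced by the multi_classification decorator, whose internals are not shown here.

```python
from collections import Counter
from typing import Dict, List, Tuple

# Made-up records shaped like the items iterated over in cmp():
# "result" holds ([predicted labels], [per-class probabilities]).
records = [
    {"annotations": {"label": 0}, "result": ([0], [[0.8, 0.1, 0.1]])},
    {"annotations": {"label": 1}, "result": ([1], [[0.2, 0.7, 0.1]])},
    {"annotations": {"label": 2}, "result": ([1], [[0.1, 0.5, 0.4]])},
    {"annotations": {"label": 2}, "result": ([2], [[0.1, 0.2, 0.7]])},
]


def collect(items) -> Tuple[List[int], List[int], List[List[float]]]:
    # Same aggregation as cmp(): labels, predictions, probabilities.
    labels, predictions, probabilities = [], [], []
    for item in items:
        labels.append(item["annotations"]["label"])
        predictions.extend(item["result"][0])
        probabilities.extend(item["result"][1])
    return labels, predictions, probabilities


def normalized_confusion(labels, predictions) -> Dict[Tuple[int, int], float]:
    # Fraction of all samples per (true label, predicted label) pair.
    total = len(labels)
    counts = Counter(zip(labels, predictions))
    return {pair: count / total for pair, count in counts.items()}


labels, predictions, probabilities = collect(records)
accuracy = sum(l == p for l, p in zip(labels, predictions)) / len(labels)
matrix = normalized_confusion(labels, predictions)
print(accuracy)        # prints 0.75
print(matrix[(2, 1)])  # prints 0.25
```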

🍺 STEP5: Building a dataset

Download the raw MNIST data files.

```
mkdir -p data && cd data
wget http://yann.lecun.com/exdb/mnist/t10k-images-idx3-ubyte.gz
wget http://yann.lecun.com/exdb/mnist/t10k-labels-idx1-ubyte.gz
gzip -d *.gz
cd ..
ls -lah data/*
```

Write some code with Starwhale Python SDK. Full code is here.

```
import struct
import typing as t
from pathlib import Path

from starwhale import GrayscaleImage, SWDSBinBuildExecutor


class DatasetProcessExecutor(SWDSBinBuildExecutor):
    def iter_item(self) -> t.Generator[t.Tuple[t.Any, t.Any], None, None]:
        root_dir = Path(__file__).parent.parent / "data"

        with (root_dir / "t10k-images-idx3-ubyte").open("rb") as data_file, (
            root_dir / "t10k-labels-idx1-ubyte"
        ).open("rb") as label_file:
            # IDX headers are big-endian: magic number, item count, then dimensions.
            _, data_number, height, width = struct.unpack(">IIII", data_file.read(16))
            _, label_number = struct.unpack(">II", label_file.read(8))
            print(
                f">data({data_file.name}) split data:{data_number}, label:{label_number} group"
            )
            image_size = height * width

            # Yield one (image, annotations) pair per sample.
            for i in range(0, min(data_number, label_number)):
                _data = data_file.read(image_size)
                _label = struct.unpack(">B", label_file.read(1))[0]
                yield GrayscaleImage(
                    _data,
                    display_name=f"{i}",
                    shape=(height, width, 1),
                ), {"label": _label}
```
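
The struct.unpack(">IIII", ...) call above reads the 16-byte header of an IDX image file: four big-endian unsigned 32-bit integers holding the magic number, image count, height, and width. The sketch below replays that parsing on a tiny synthetic file built in memory, so it runs without downloading MNIST. The `read_idx_images` helper and the 2x2 sample images are made up for illustration; 2051 is the magic number of IDX unsigned-byte image files.

```python
import io
import struct
from typing import BinaryIO, List, Tuple

IMAGE_MAGIC = 2051  # 0x00000803: unsigned-byte data with three dimensions


def read_idx_images(stream: BinaryIO) -> Tuple[List[bytes], Tuple[int, int]]:
    """Parse an IDX image stream into raw per-image byte strings."""
    magic, count, height, width = struct.unpack(">IIII", stream.read(16))
    if magic != IMAGE_MAGIC:
        raise ValueError(f"unexpected magic number: {magic}")
    image_size = height * width
    images = [stream.read(image_size) for _ in range(count)]
    return images, (height, width)


# Synthetic file: header for two 2x2 images, followed by 8 pixel bytes.
header = struct.pack(">IIII", IMAGE_MAGIC, 2, 2, 2)
pixels = bytes([0, 1, 2, 3]) + bytes([4, 5, 6, 7])
images, shape = read_idx_images(io.BytesIO(header + pixels))
print(len(images), shape)  # prints 2 (2, 2)
```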

Define dataset.yaml.

```
name: mnist
handler: mnist.dataset:DatasetProcessExecutor
attr:
  alignment_size: 1k
  volume_size: 4M
  data_mime_type: "x/grayscale"
```

Run one command to build the dataset.

```
swcli dataset build .
swcli dataset info mnist/version/latest
```

Starwhale also supports building datasets with the pure Python SDK. You can try it in Google Colab.

🍖 STEP6: Running an evaluation job

```
swcli -vvv eval run --model mnist/version/latest --dataset mnist/version/latest --runtime pytorch/version/latest
swcli eval list
swcli eval info ${version}
```

👏 Now, you have completed the fundamental steps for Starwhale standalone.

Let's go ahead and finish the tutorial on the on-premises instance.

MNIST Quick Tour for on-premises instance

Create Job Workflow

🍰 STEP1: Install minikube and helm
- Minikube 1.25+
- Helm 3.2.0+
🍵 STEP2: Start minikube
```
minikube start
```
For users in mainland China, please add these startup parameters: --image-mirror-country=cn --image-repository=registry.cn-hangzhou.aliyuncs.com/google_containers. If kubectl is not installed on your machine, you can use minikube kubectl, or define the alias kubectl="minikube kubectl --".

🍵 STEP3: Installing Starwhale

```
helm repo add starwhale https://star-whale.github.io/charts
helm repo update
helm install --devel my-starwhale starwhale/starwhale -n starwhale --create-namespace --set minikube.enabled=true
```

After the installation is successful, the following prompt message appears:

```
NAME: my-starwhale
LAST DEPLOYED: Thu Jun 23 14:48:02 2022
NAMESPACE: starwhale
STATUS: deployed
REVISION: 1
NOTES:
******************************************
Chart Name: starwhale
Chart Version: 0.3.0
App Version: 0.3.0
...

Port Forward Visit:
- starwhale controller:
    - run: kubectl port-forward --namespace starwhale svc/my-starwhale-controller 8082:8082
    - visit: http://localhost:8082
- minio admin:
    - run: kubectl port-forward --namespace starwhale svc/my-starwhale-minio 9001:9001
    - visit: http://localhost:9001
- mysql:
    - run: kubectl port-forward --namespace starwhale svc/my-starwhale-mysql 3306:3306
    - visit: mysql -h 127.0.0.1 -P 3306 -ustarwhale -pstarwhale

******************************************
Login Info:
- starwhale: u:starwhale, p:abcd1234
- minio admin: u:minioadmin, p:minioadmin

*_* Enjoy using Starwhale. *_*
```

Then keep checking the minikube service status until all pods are running.

```
kubectl get pods -n starwhale

NAME                                       READY   STATUS    AGE
my-starwhale-controller-7d864558bc-vxvb8   1/1     Running   1m
my-starwhale-minio-7d45db75f6-7wq9b        1/1     Running   2m
my-starwhale-mysql-0                       1/1     Running   2m
```
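
If you would rather check the pod status from a script than by eye, the output above is easy to parse. The sketch below works on a captured copy of the table and only assumes that the NAME and STATUS columns are present; `pod_statuses` and `all_running` are hypothetical helpers, not part of swcli or kubectl.

```python
from typing import Dict

SAMPLE_OUTPUT = """\
NAME                                       READY   STATUS    AGE
my-starwhale-controller-7d864558bc-vxvb8   1/1     Running   1m
my-starwhale-minio-7d45db75f6-7wq9b        1/1     Running   2m
my-starwhale-mysql-0                       1/1     Running   2m
"""


def pod_statuses(kubectl_output: str) -> Dict[str, str]:
    """Map pod name to status from whitespace-separated kubectl output."""
    rows = [line.split() for line in kubectl_output.splitlines() if line.strip()]
    if not rows or not {"NAME", "STATUS"} <= set(rows[0]):
        return {}
    header, body = rows[0], rows[1:]
    name_col, status_col = header.index("NAME"), header.index("STATUS")
    return {row[name_col]: row[status_col] for row in body}


def all_running(kubectl_output: str) -> bool:
    """True only if at least one pod is listed and every pod is Running."""
    statuses = pod_statuses(kubectl_output)
    return bool(statuses) and all(s == "Running" for s in statuses.values())


print(all_running(SAMPLE_OUTPUT))  # prints True
```

In practice the text could come from running the kubectl command with subprocess.run(..., capture_output=True, text=True) and polling until `all_running` returns True.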

Make the Starwhale controller accessible locally with the following command:

```
kubectl port-forward --namespace starwhale svc/my-starwhale-controller 8082:8082
```

☕ STEP4: Upload the artifacts to the cloud instance
Pre-prepared artifacts: Before starting this step, the following three artifacts should already exist on your machine:
- a starwhale model named mnist
- a starwhale dataset named mnist
- a starwhale runtime named pytorch
These are the three artifacts that were built on the local machine in the standalone tour.
1. Use swcli to operate the remote server. First, log in to the server:
```
swcli instance login --username starwhale --password abcd1234 --alias dev http://localhost:8082
```
2. Start copying the model, dataset, and runtime that we constructed earlier:
```
swcli model copy mnist/version/latest dev/project/starwhale
swcli dataset copy mnist/version/latest dev/project/starwhale
swcli runtime copy pytorch/version/latest dev/project/starwhale
```
🍞 STEP5: Use the web UI to run an evaluation
1. Log in to the Starwhale instance: use the username (starwhale) and password (abcd1234) to open the server web UI (http://localhost:8082/).
2. Then, you will see the project that the artifacts were copied to with swcli in the previous step (starwhale, as in the target dev/project/starwhale). Click the project name to see the model, runtime, and dataset uploaded in the previous step.
  
3. Create and view an evaluation job
  

Congratulations! You have completed the evaluation process for a model.

Documentation, Community, and Support

Visit the Starwhale homepage.
More information is available in the official documentation.
For general questions and support, join the Slack.
For bug reports and feature requests, please use GitHub Issues.
To get community updates, follow @starwhaleai on Twitter.
For Starwhale artifacts, please visit:
- Python Package on PyPI.
- Helm Charts on Artifact Hub.
- Docker Images on Docker Hub and ghcr.io.
Additionally, you can always find us at developer@starwhale.ai.

Contributing

🌼👏 PRs are always welcome 👍🍺. See Contribution to Starwhale for more details.

License

Starwhale is licensed under the Apache License 2.0.

Project details

Release history

- 0.6.13: Feb 4, 2024
- 0.6.12: Jan 31, 2024
- 0.6.11: Jan 19, 2024
- 0.6.10: Jan 5, 2024
- 0.6.9: Dec 24, 2023
- 0.6.8: Dec 7, 2023
- 0.6.7: Dec 4, 2023
- 0.6.6: Nov 30, 2023
- 0.6.5: Nov 17, 2023
- 0.6.4: Nov 2, 2023
- 0.6.3: Oct 31, 2023
- 0.6.2: Oct 27, 2023
- 0.6.1: Oct 16, 2023
- 0.6.0: Oct 16, 2023
- 0.5.12: Sep 5, 2023
- 0.5.11: Aug 16, 2023
- 0.5.10: Aug 5, 2023
- 0.5.9: Jul 31, 2023
- 0.5.8: Jul 26, 2023
- 0.5.7: Jul 24, 2023
- 0.5.6: Jul 19, 2023
- 0.5.5: Jul 14, 2023
- 0.5.4: Jul 6, 2023
- 0.5.3: Jul 4, 2023
- 0.5.2: Jul 3, 2023
- 0.5.1: Jun 27, 2023
- 0.5.0: Jun 25, 2023
- 0.4.9: Jun 15, 2023
- 0.4.8: Jun 13, 2023
- 0.4.7: Jun 6, 2023
- 0.4.6: May 27, 2023
- 0.4.5: May 16, 2023
- 0.4.4: May 4, 2023
- 0.4.3: Apr 23, 2023
- 0.4.2: Apr 15, 2023
- 0.4.1: Mar 14, 2023
- 0.4.0: Feb 16, 2023
- 0.3.6: Jan 30, 2023 (this version)
- 0.3.5: Jan 3, 2023
- 0.3.4: Dec 19, 2022
- 0.3.3: Dec 5, 2022
- 0.3.2: Nov 21, 2022
- 0.3.1: Nov 7, 2022
- 0.3.0: Oct 18, 2022
- 0.2.2: Aug 8, 2022
- 0.2.1: Jul 4, 2022
- 0.2.0: Jun 30, 2022
- 0.1.0.dev0 (pre-release): Apr 7, 2022

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

starwhale-0.3.6.tar.gz (171.9 kB)

Uploaded Jan 30, 2023 Source

Built Distribution

starwhale-0.3.6-py3-none-any.whl (210.2 kB)

Uploaded Jan 30, 2023 Python 3

Hashes for starwhale-0.3.6.tar.gz

Algorithm	Hash digest
SHA256	`601a01a883bbb422aace96bf542e74bc4f42377368075044afef965de351edad`
MD5	`b44be3d0ec822c42416857c9a306f457`
BLAKE2b-256	`58870407d2d0b36cead4f728240553c9ebe4aaef4b8e97a3f4b9a982026d1ff4`

Hashes for starwhale-0.3.6-py3-none-any.whl

Algorithm	Hash digest
SHA256	`298bc3cafc738439be3d14ecf74c86c50cdeae731595118c90c24572947c4c20`
MD5	`4cfec53a879dfdae4ed62fbe7e9cb370`
BLAKE2b-256	`205b578aa3904b9fd612219b9ec8f7479037daaa97fd1228150277faff897199`
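
The digests above can be used to verify a downloaded file before installing it. The sketch below computes a SHA256 digest with Python's hashlib and compares it with an expected value. `sha256_of_file` and `digest_matches` are hypothetical helper names, and the demo hashes a throwaway temporary file rather than a real Starwhale download.

```python
import hashlib
import hmac
import os
import tempfile


def sha256_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def digest_matches(path: str, expected_hex: str) -> bool:
    """Compare a file's SHA256 digest with an expected hex string."""
    return hmac.compare_digest(sha256_of_file(path), expected_hex.strip().lower())


# Demo on a throwaway file, so the sketch runs without any download.
with tempfile.NamedTemporaryFile(delete=False) as handle:
    handle.write(b"starwhale")
    demo_path = handle.name

expected = hashlib.sha256(b"starwhale").hexdigest()
print(digest_matches(demo_path, expected))  # prints True
os.remove(demo_path)
```

For a real check, pass the path of the downloaded archive and the SHA256 value from the matching table above.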