Version and deploy your models following GitOps principles

GTO

Great Tool Ops. Turn your Git repo into an artifact registry:

  • Index files in the repo as artifacts to make them visible to others
  • Register new versions of artifacts, marking significant changes to them
  • Promote versions to signal downstream systems to act
  • Act on new versions and promotions in CI
  • [WIP] Add enrichments that will add more information about the artifacts

To turn your repo into an artifact registry, you only need to pip install this package. Indexing, versioning and promoting are done with Git, using files, commits, tags and branches. Using the artifact registry also requires nothing beyond this package.

The tool can be used both from the CLI and from Python. This README covers the CLI, but every command has a Python API counterpart in the gto.api module.

Artifacts

To add a new artifact or remove an existing one, run gto add or gto rm:

$ gto add model simple-nn models/neural-network.pkl --virtual

$ gto add --help
Usage: gto add [OPTIONS] TYPE NAME PATH

  Register new artifact (add it to the Index)

Options:
  -v, --verbose
  -r, --repo TEXT  Repository to use  [default: .]
  --virtual        Virtual artifact that wasn't committed to Git
  --help           Show this message and exit.

You can also modify the artifacts.yaml file directly.

There are two types of artifacts in GTO:

  1. Files/folders committed to the repo. When you register a new version or promote it to a stage, Git guarantees that it's immutable. You can return to your repo a year later and get exactly the same artifact by providing the same version.
  2. Virtual artifacts. This could be an external path, e.g. s3://mybucket/myfile, or a local path if the file wasn't committed (as is the case with DVC). In this case GTO can't pin the current physical state of the artifact and guarantee its immutability. If s3://mybucket/myfile changes, you will have no way to retrieve the original or even to tell that it differs from what it was when you registered that artifact version.

In future versions, we will add enrichments (useful information that other tools like DVC and MLEM can provide about the artifacts). This will allow treating files versioned with DVC and DVC pipeline outputs as usual artifacts instead of virtual ones.

Versioning

After adding an artifact and committing the modified artifacts.yaml, you can start creating new versions of it. Versions usually mark significant changes to the artifact.

$ gto register simple-nn HEAD --version v1.0.0

$ gto register --help
Usage: gto register [OPTIONS] NAME REF

  Tag the object with a version (git tags)

Options:
  -v, --verbose
  -r, --repo TEXT        Repository to use  [default: .]
  --version, --ver TEXT  Version to promote
  -b, --bump TEXT        The exact part to use when bumping a version
  --help                 Show this message and exit.
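
The --bump option names the part of the version to increase. As a rough illustration of what bumping a SemVer version means, here is a minimal Python sketch. The helper bump_version and the part names major, minor and patch are assumptions made for this example; they are not taken from GTO's code or its accepted --bump values.

```python
import re

# Hypothetical helper, not part of GTO: bumps one part of a "vX.Y.Z" version.
SEMVER = re.compile(r"v(\d+)\.(\d+)\.(\d+)")


def bump_version(version: str, part: str) -> str:
    """Return the next version after increasing the given part."""
    match = SEMVER.fullmatch(version)
    if match is None:
        raise ValueError(f"not a SemVer version: {version!r}")
    major, minor, patch = (int(x) for x in match.groups())
    if part == "major":
        return f"v{major + 1}.0.0"  # lower parts reset to zero
    if part == "minor":
        return f"v{major}.{minor + 1}.0"
    if part == "patch":
        return f"v{major}.{minor}.{patch + 1}"
    raise ValueError(f"unknown part: {part!r}")


print(bump_version("v1.0.0", "patch"))  # v1.0.1
```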

Promoting

You can also promote a specific artifact version to a stage. Use this to signal downstream systems to act - for example, to redeploy an ML model (your artifact) or update a config file (your artifact).

$ gto promote simple-nn prod

$ gto promote --help
Usage: gto promote [OPTIONS] NAME STAGE

  Assign label to specific artifact version

Options:
  -v, --verbose
  -r, --repo TEXT  Repository to use  [default: .]
  --version TEXT   If you provide --ref, this will be used to name new version
  --ref TEXT
  --help           Show this message and exit.

Using the registry

Let's look at the commands that help us use the registry. First, clone the example repo:

$ git clone git@github.com:iterative/gto-example.git
$ cd gto-example

Show the actual state

This is the actual state of the registry: all artifacts, their latest versions, and what is promoted to stages right now.

$ gto show
╒══════════════╤═══════════╤══════════════════╤═══════════════╕
│ name         │ version   │ env/production   │ env/staging   │
╞══════════════╪═══════════╪══════════════════╪═══════════════╡
│ nn           │ v0.0.1    │ -                │ v0.0.1        │
│ rf           │ v1.0.1    │ v1.0.0           │ v1.0.1        │
│ features-dvc │ -         │ -                │ -             │
╘══════════════╧═══════════╧══════════════════╧═══════════════╛

Audit the registration and promotion

gto audit prints all registered versions of the artifact and all versions promoted to environments. This helps you understand what happened to the artifact over time.

$ gto audit --name rf

=== Registration audit trail ===
╒═════════════════════╤════════╤═══════════╤══════════════╤══════════╤═══════════════════╕
│ timestamp           │ name   │ version   │ deprecated   │ commit   │ author            │
╞═════════════════════╪════════╪═══════════╪══════════════╪══════════╪═══════════════════╡
│ 2022-03-18 12:10:15 │ rf     │ v1.0.0    │ -            │ 5eaf15a  │ Alexander Guschin │
│ 2022-03-18 12:11:21 │ rf     │ v1.0.1    │ -            │ 9fbb866  │ Alexander Guschin │
╘═════════════════════╧════════╧═══════════╧══════════════╧══════════╧═══════════════════╛

=== Promotion audit trail ===
╒═════════════════════╤════════╤════════════╤═══════════╤══════════════╤══════════╤═══════════════════╕
│ timestamp           │ name   │ label      │ version   │ deprecated   │ commit   │ author            │
╞═════════════════════╪════════╪════════════╪═══════════╪══════════════╪══════════╪═══════════════════╡
│ 2022-03-18 12:12:27 │ rf     │ production │ v1.0.0    │ -            │ 5eaf15a  │ Alexander Guschin │
│ 2022-03-18 12:13:30 │ rf     │ staging    │ v1.0.1    │ -            │ 9fbb866  │ Alexander Guschin │
│ 2022-03-18 12:14:33 │ rf     │ production │ v1.0.1    │ -            │ 9fbb866  │ Alexander Guschin │
│ 2022-03-18 12:15:37 │ rf     │ production │ v1.0.0    │ -            │ 5eaf15a  │ Alexander Guschin │
╘═════════════════════╧════════╧════════════╧═══════════╧══════════════╧══════════╧═══════════════════╛

See the history of an artifact

Another way to get the same information is the gto history command:

$ gto history --name rf
╒═════════════════════╤════════╤══════════════╤═══════════╤════════════╤══════════════╤══════════╤═══════════════════╕
│ timestamp           │ name   │ event        │ version   │ label      │ deprecated   │ commit   │ author            │
╞═════════════════════╪════════╪══════════════╪═══════════╪════════════╪══════════════╪══════════╪═══════════════════╡
│ 2022-03-18 12:10:12 │ rf     │ commit       │ -         │ -          │ -            │ 5eaf15a  │ Alexander Guschin │
│ 2022-03-18 12:10:15 │ rf     │ registration │ v1.0.0    │ -          │ -            │ 5eaf15a  │ Alexander Guschin │
│ 2022-03-18 12:11:18 │ rf     │ commit       │ -         │ -          │ -            │ 9fbb866  │ Alexander Guschin │
│ 2022-03-18 12:11:21 │ rf     │ registration │ v1.0.1    │ -          │ -            │ 9fbb866  │ Alexander Guschin │
│ 2022-03-18 12:12:27 │ rf     │ promotion    │ v1.0.0    │ production │ -            │ 5eaf15a  │ Alexander Guschin │
│ 2022-03-18 12:13:30 │ rf     │ promotion    │ v1.0.1    │ staging    │ -            │ 9fbb866  │ Alexander Guschin │
│ 2022-03-18 12:14:33 │ rf     │ promotion    │ v1.0.1    │ production │ -            │ 9fbb866  │ Alexander Guschin │
│ 2022-03-18 12:15:37 │ rf     │ promotion    │ v1.0.0    │ production │ -            │ 5eaf15a  │ Alexander Guschin │
╘═════════════════════╧════════╧══════════════╧═══════════╧════════════╧══════════════╧══════════╧═══════════════════╛
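
The promotion events above are consistent with the state printed by gto show for rf: for each label, the most recent promotion wins. The following Python sketch replays the events to derive that state. It only illustrates the idea and is not how GTO is implemented; the function name current_state is made up for this example.

```python
from typing import Dict, List, Tuple

# (timestamp, label, version) rows copied from the promotion events above.
PROMOTIONS: List[Tuple[str, str, str]] = [
    ("2022-03-18 12:12:27", "production", "v1.0.0"),
    ("2022-03-18 12:13:30", "staging", "v1.0.1"),
    ("2022-03-18 12:14:33", "production", "v1.0.1"),
    ("2022-03-18 12:15:37", "production", "v1.0.0"),
]


def current_state(promotions: List[Tuple[str, str, str]]) -> Dict[str, str]:
    """Replay promotions in chronological order; the latest one per label wins."""
    state: Dict[str, str] = {}
    # Timestamps in this format sort chronologically as plain strings.
    for _, label, version in sorted(promotions):
        state[label] = version
    return state


print(current_state(PROMOTIONS))
# {'production': 'v1.0.0', 'staging': 'v1.0.1'}
```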

Act on new versions and promotions in CI

When CI is triggered, you can use the triggering git reference to determine the version of the artifact that was registered or promoted. In GitHub Actions you can use the GITHUB_REF environment variable to determine the version (check out the GitHub Actions workflow in the example repo). You can also do that locally:

$ gto check-ref rf@v1.0.1
WARNING:root:Provided ref doesn't exist or it is not a tag that promotes to an environment
env: {}
version:
  rf:
    artifact: rf
    author: Alexander Guschin
    commit_hexsha: 9fbb8664a4a48575ee5d422e177174f20e460b94
    creation_date: '2022-03-18T12:11:21'
    deprecated_date: null
    name: v1.0.1

Getting right versions in downstream systems

To get the latest artifact version, its path and git reference, run:

$ gto latest rf
v1.0.1
$ gto latest rf --path
models/random-forest.pkl
$ gto latest rf --ref
9fbb8664a4a48575ee5d422e177174f20e460b94

To get the version that is currently promoted to an environment, run:

$ gto which rf production
v1.0.0
$ gto which rf production --path
models/random-forest.pkl
$ gto which rf production --ref
5eaf15a9fbb8664a4a48575ee5d422e177174f20e460b94
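
A downstream Python script can call these commands through subprocess. The sketch below uses only the commands and flags shown above (gto latest, gto which, --path, --ref). The helper names gto_command and run_gto are made up for this example, and the script assumes that gto is installed and on PATH; the gto.api module may offer a more direct route.

```python
import subprocess
from typing import List, Optional


def gto_command(action: str, name: str, stage: Optional[str] = None,
                flag: Optional[str] = None) -> List[str]:
    """Build the argument list for 'gto latest' or 'gto which'."""
    if action not in ("latest", "which"):
        raise ValueError(f"unsupported action: {action!r}")
    cmd = ["gto", action, name]
    if action == "which":
        if stage is None:
            raise ValueError("'which' needs a stage, e.g. 'production'")
        cmd.append(stage)
    if flag is not None:
        if flag not in ("--path", "--ref"):
            raise ValueError(f"unsupported flag: {flag!r}")
        cmd.append(flag)
    return cmd


def run_gto(cmd: List[str]) -> str:
    """Run the command in the current repo and return its trimmed output."""
    result = subprocess.run(cmd, check=True, capture_output=True, text=True)
    return result.stdout.strip()


# Example (requires gto and a registry repo as the working directory):
# path = run_gto(gto_command("which", "rf", "production", "--path"))
```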

To download artifacts that are stored with DVC or outside of the repo, e.g. in s3:// or in the DVC cache, you'll need DVC or the AWS CLI.

Configuration

You can write configuration in a .gto file in the root of your repo or use environment variables like this (note the GTO_ prefix):

GTO_VERSION_BASE=tag gto show
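
The naming pattern in this example suggests that a setting such as version_base corresponds to the variable GTO_VERSION_BASE. The Python sketch below shows one way such a mapping could work. It is an assumption for illustration only: the function env_overrides is hypothetical, and this README does not state how GTO parses environment variables.

```python
from typing import Dict, Mapping

PREFIX = "GTO_"


def env_overrides(environ: Mapping[str, str]) -> Dict[str, str]:
    """Collect GTO_-prefixed variables as lower-case setting names."""
    return {
        key[len(PREFIX):].lower(): value
        for key, value in environ.items()
        if key.startswith(PREFIX)
    }


print(env_overrides({"GTO_VERSION_BASE": "tag", "HOME": "/root"}))
# {'version_base': 'tag'}
```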

The default config written to the .gto file looks like this (the comments clarify the meaning of each setting and its valid values):

index: artifacts.yaml
type_allowed: []  # list of allowed types
version_base: tag  # or commit
version_convention: numbers  # or semver
version_required_for_env: true  # if false, registering a version isn't required to promote to an environment
env_base: tag  # or branch
env_allowed: []  # list of allowed environments to promote to. Makes sense for env_base=tag only.
env_branch_mapping: {}  # map of branch names to environment names. Makes sense for env_base=branch only.

If a list or dict that restricts allowed values is empty, all values are allowed.
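
The rule can be written as a one-line check. The function is_allowed below is a hypothetical illustration of the rule as stated, not GTO's actual validation code.

```python
from typing import Sequence


def is_allowed(value: str, allowed: Sequence[str]) -> bool:
    """An empty allow-list places no restriction; otherwise value must be listed."""
    return not allowed or value in allowed


print(is_allowed("model", []))                    # True
print(is_allowed("model", ["model", "dataset"]))  # True
print(is_allowed("image", ["model", "dataset"]))  # False
```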

Some example configs (skipping default values):

type_allowed: [model, dataset]
version_convention: semver
env_allowed: [dev, test, prod]

In this setup you create versions and promote them with git tags (those are the defaults). This is a typical setup when you need both to register versions and to promote them to envs, and you require that a version exists before an artifact is promoted from a specific commit to an env (gto promote will automatically create a version for you in that case). It limits the allowed types and envs and requires you to version your models with SemVer (v1.2.3, as opposed to v1, which is called Numbers in the settings).
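
To make the difference between the two conventions concrete, here is a small Python sketch that checks a version string against each of them. The patterns are inferred from the v1 and v1.2.3 examples in this README and are assumptions; GTO's own validation may accept other forms.

```python
import re

# Assumed patterns, based only on the examples "v1" and "v1.2.3".
PATTERNS = {
    "numbers": re.compile(r"v\d+"),
    "semver": re.compile(r"v\d+\.\d+\.\d+"),
}


def matches_convention(version: str, convention: str) -> bool:
    """Check whether a version string fits the named convention."""
    return PATTERNS[convention].fullmatch(version) is not None


print(matches_convention("v1.2.3", "semver"))   # True
print(matches_convention("v1", "semver"))       # False
print(matches_convention("v1", "numbers"))      # True
```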

type_allowed: [model, dataset]
version_convention: semver
version_required_for_env: false
env_allowed: [dev, test, prod]

This setup differs from the previous one in a single way: promoting a model to an environment doesn't require you to create a SemVer version. To indicate which version was promoted, GTO will use a commit hexsha. That effectively means that registering and promoting are decoupled - you can do them independently. When showing promotions, gto show, gto audit and gto history will display the SemVer version when it's available, and the commit hexsha when it's not.

version_base: commit
env_allowed: [dev, test, prod]

In this setup each commit counts as a version of the artifact (the only requirement is that the artifact exists in artifacts.yaml in those commits). You cannot create versions explicitly with gto register right now, because that would require actually creating a PR or making a commit to the selected branch, and this is not implemented yet. As for environments, env_allowed gives you a whitelist of allowed values. Because each commit is a version, you don't need to create a version before promoting. In effect, this is similar to specifying version_required_for_env: false.

env_base: branch
env_branch_mapping:
    master: prod
    develop: dev

In this setup an artifact version is assumed to be promoted to prod if it's committed in master and is the latest version in that branch. Because the default is version_base: tag, running gto promote will register a new artifact version - and this will at the same time promote the artifact to the environment from env_branch_mapping. If you register a version in a branch that isn't listed in env_branch_mapping, the promotion won't happen.
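
The branch-to-environment lookup for this config can be pictured as a plain dictionary lookup. The sketch below uses the mapping from the config above; the function env_for_branch is hypothetical, and the case of an empty mapping is deliberately left out.

```python
from typing import Mapping, Optional

# Mapping copied from the env_branch_mapping example above.
ENV_BRANCH_MAPPING = {"master": "prod", "develop": "dev"}


def env_for_branch(branch: str, mapping: Mapping[str, str]) -> Optional[str]:
    """Return the environment for a branch, or None if the branch is not mapped.

    None corresponds to the case where no promotion happens.
    """
    return mapping.get(branch)


print(env_for_branch("master", ENV_BRANCH_MAPPING))     # prod
print(env_for_branch("feature-x", ENV_BRANCH_MAPPING))  # None
```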

version_base: commit
env_base: branch

In this setup you cannot create versions explicitly with gto register, because each commit counts as a version of the artifact (the only requirement is that the artifact exists in artifacts.yaml in those commits) and you would need to actually create a PR or make a commit to the selected branch. Likewise, you cannot promote to envs with gto promote, because it's not implemented yet and the exact way to do that is unclear - e.g. it would require creating a PR or a direct commit that updates the artifact. All of this may be implemented in the future. For now this setup allows you to manage artifacts with gto add / gto rm and see the state of your repo with gto show, gto audit and gto history. Finally, because env_branch_mapping is not specified, GTO will take into account all branches that have artifacts.yaml in them.

Trying it out

See example repo

Check out the example repo at https://github.com/iterative/gto-example, read its README and try it out.

To try out the latest version

1. Clone this repository

git clone git@github.com:iterative/gto.git
cd gto

2. Create virtual environment named venv

python3 -m venv venv
source venv/bin/activate

3. Install Python libraries

pip install --upgrade pip setuptools wheel ".[tests]"

4. Run

pytest --basetemp=pytest-cache

This will create a pytest-cache folder with some fixtures that can serve as examples.

Notably, check out this folder:

cd pytest-cache/test_api0/
gto show -v

The code that generates this folder can be found in this fixture.

To continue experimenting, call

gto --help
