A simple CLI tool to download DVC artifacts from large monorepos
Getting artifacts from monorepo projects without cloning the entire monorepo
dvcartifacts is a Python CLI tool that relies on GitPython and SSH to authenticate with a remote Git repository.
To speed up cloning, it uses a sparse checkout that downloads nothing outside the specified project directory. (Cloning the project is required to obtain the URL of the versioned artifact.)
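The sparse checkout described above can be sketched with plain git commands. This is a minimal illustration, not the tool's actual implementation: dvcartifacts uses GitPython, while this sketch shells out to git through subprocess, and the function names `sparse_clone_commands` and `sparse_clone` are hypothetical.

```python
import subprocess


def sparse_clone_commands(repourl, projectdir, dest):
    """Return the git commands for a sparse, blob-less clone of one project directory."""
    return [
        # --sparse initially checks out only top-level files; --filter=blob:none
        # defers downloading file contents until they are actually needed.
        ["git", "clone", "--filter=blob:none", "--sparse", repourl, dest],
        # Restrict the working tree to the requested project subdirectory.
        ["git", "-C", dest, "sparse-checkout", "set", projectdir],
    ]


def sparse_clone(repourl, projectdir, dest):
    """Run the commands above (assumes git >= 2.25 and working SSH credentials)."""
    for cmd in sparse_clone_commands(repourl, projectdir, dest):
        subprocess.run(cmd, check=True)
```

Separating command construction from execution keeps the sketch easy to inspect without touching the network.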
Important notes/limitations
Artifacts are assumed to be created in project subdirectories of a monorepo, with each DVC project initialized in its own subdirectory, as in the following structure:
monorepo
├── project_1
│   ├── .dvc
│   │   ├── config
│   │   └── .gitignore
│   ├── .dvcignore
│   ├── dvc.lock
│   ├── dvc.yaml
│   ├── mymodel.pkl
│   ├── my_script.py
│   └── requirements.txt
└── project_2
    ├── .dvc
    │   ├── config
    │   └── .gitignore
    └── .dvcignore
It is also assumed that artifacts are registered by GTO using the naming convention project_name:artifact_name, for example gto register project_1:mymodel.
(This naming convention is automatically followed when artifacts are registered from the DVC Studio UI).
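As a small illustration of the project_name:artifact_name convention, the sketch below builds and splits such a name. Both helper names are hypothetical and are not part of dvcartifacts or GTO.

```python
def full_artifact_name(project_name, artifact_name):
    """Join a project directory and an artifact name as project_name:artifact_name."""
    return f"{project_name}:{artifact_name}"


def split_artifact_name(full_name):
    """Split project_name:artifact_name into its two parts, rejecting malformed input."""
    project_name, sep, artifact_name = full_name.partition(":")
    if not sep or not project_name or not artifact_name:
        raise ValueError(f"expected 'project_name:artifact_name', got {full_name!r}")
    return project_name, artifact_name
```

With the example from the text, `split_artifact_name("project_1:mymodel")` yields the project directory and the artifact name that the CLI takes as separate arguments.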
The tool relies on boto3 or google-cloud-storage to access the bucket, depending on the cloud storage used as the DVC remote.
Currently, only a DVC remote located at the root of a bucket (i.e. with no subdirectory prefix) is properly supported.
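One way to picture the two points above is a helper that inspects the remote URL, picks the client library from its scheme, and reports whether the remote sits at the bucket root. This is only a sketch under assumptions: the function name `describe_remote`, the returned dictionary, and the mapping of `s3://` to boto3 and `gs://` to google-cloud-storage are illustrative, not taken from the tool's source.

```python
from urllib.parse import urlparse

# Assumed mapping from remote URL scheme to the client library used.
BACKENDS = {"s3": "boto3", "gs": "google-cloud-storage"}


def describe_remote(url):
    """Describe a DVC remote URL such as s3://bucket or gs://bucket/prefix."""
    parsed = urlparse(url)
    if parsed.scheme not in BACKENDS:
        raise ValueError(f"unsupported remote scheme: {parsed.scheme!r}")
    prefix = parsed.path.strip("/")
    return {
        "backend": BACKENDS[parsed.scheme],
        "bucket": parsed.netloc,
        "prefix": prefix,
        # Only remotes without a subdirectory prefix are properly supported.
        "at_bucket_root": prefix == "",
    }
```

A caller could check `at_bucket_root` up front and warn the user before attempting a download from a remote with a prefix.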
Usage
usage: dvcartifacts [-h] [-r REV] repourl projectdir artifact_name

Download an artifact from the remote bucket

positional arguments:
  repourl            url of the GitHub repository associated with the artifact
  projectdir         project subdirectory in the monorepo where the artifact was created
  artifact_name      Name of the artifact to find

options:
  -h, --help         show this help message and exit
  -r REV, --rev REV  semantic version of the artifact (optional), latest version is used if this is not specified
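The interface above can be reproduced with argparse. The following is a reconstruction from the help text only, not the tool's actual source; the function name `build_parser` and the example repository URL and version are hypothetical.

```python
import argparse


def build_parser():
    """Build a parser that mirrors the documented dvcartifacts interface."""
    parser = argparse.ArgumentParser(
        prog="dvcartifacts",
        description="Download an artifact from the remote bucket",
    )
    parser.add_argument(
        "repourl",
        help="url of the GitHub repository associated with the artifact",
    )
    parser.add_argument(
        "projectdir",
        help="project subdirectory in the monorepo where the artifact was created",
    )
    parser.add_argument("artifact_name", help="Name of the artifact to find")
    parser.add_argument(
        "-r",
        "--rev",
        default=None,  # None stands for "use the latest version"
        help="semantic version of the artifact (optional), "
        "latest version is used if this is not specified",
    )
    return parser


# Example invocation with a hypothetical repository URL and version.
args = build_parser().parse_args(
    ["git@github.com:org/monorepo.git", "project_1", "mymodel", "--rev", "1.0.0"]
)
```

On the command line, the equivalent call would be `dvcartifacts git@github.com:org/monorepo.git project_1 mymodel --rev 1.0.0`, where the URL and version are placeholders.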