A simple CLI tool to download DVC artifacts from large monorepos

Getting artifacts from monorepo projects without cloning the entire monorepo

dvcartifacts is a Python CLI tool that uses GitPython and SSH to authenticate with a remote Git repository.

It uses a sparse checkout, downloading nothing outside the specified project directory, to speed up cloning. (Cloning the project is required to obtain the URL of the versioned artifact.)
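The sparse checkout step can be sketched as follows. dvcartifacts itself uses GitPython. This sketch instead calls the git CLI through subprocess, and the exact flags (a shallow, blob-filtered clone with `--sparse`, then `git sparse-checkout set`) are an assumption and are not taken from the tool's source. `--sparse` requires Git 2.25 or newer, and `sparse_clone_commands` / `sparse_clone` are hypothetical helper names.

```python
import subprocess
from typing import List, Optional


def sparse_clone_commands(
    repo_url: str, project_dir: str, dest: str, rev: Optional[str] = None
) -> List[List[str]]:
    """Build git commands that check out only `project_dir` from `repo_url` (hypothetical helper)."""
    clone = [
        "git", "clone",
        "--depth", "1",          # only the tip commit of the chosen ref
        "--filter=blob:none",    # partial clone: file contents are fetched lazily
        "--sparse",              # start with only top-level files in the working tree
    ]
    if rev is not None:
        clone += ["--branch", rev]  # --branch accepts a branch or a tag name
    clone += [repo_url, dest]
    # Restrict the working tree to the project subdirectory.
    sparse = ["git", "-C", dest, "sparse-checkout", "set", project_dir]
    return [clone, sparse]


def sparse_clone(repo_url: str, project_dir: str, dest: str, rev: Optional[str] = None) -> None:
    """Run the commands above, stopping at the first failure."""
    for cmd in sparse_clone_commands(repo_url, project_dir, dest, rev):
        subprocess.run(cmd, check=True)
```

Because of the blob filter, only the files under `project_dir` (plus top-level files) are actually downloaded when the working tree is populated.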

Important notes/limitations

Artifacts are assumed to be created in project subdirectories of a monorepo with the following structure, i.e. each DVC project is initialized in its own subdirectory:

monorepo
    ├── project_1
    │   ├── .dvc
    │   │   ├── config
    │   │   └── .gitignore
    │   ├── .dvcignore
    │   ├── dvc.lock
    │   ├── dvc.yaml
    │   ├── mymodel.pkl    
    │   ├── my_script.py
    │   └── requirements.txt
    └── project_2
        ├── .dvc
        │   ├── config
        │   └── .gitignore
        └── .dvcignore
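In a layout like this, DVC records the checksum of each pipeline output (such as mymodel.pkl) in the project's dvc.lock, and that checksum is what locates the file in the remote. The sketch below looks up an output's md5 in an already-parsed dvc.lock. The function name is hypothetical, and it assumes the `stages -> outs -> path/md5` shape of the lock file. Outputs tracked with `dvc add` are recorded in `.dvc` files instead, and this sketch ignores them.

```python
from typing import Any, Dict, Optional


def find_output_md5(lock: Dict[str, Any], out_path: str) -> Optional[str]:
    """Return the md5 recorded for `out_path` in a parsed dvc.lock, or None (hypothetical helper)."""
    for stage in lock.get("stages", {}).values():
        # Each stage lists its outputs with their path and content hash.
        for out in stage.get("outs", []):
            if out.get("path") == out_path:
                return out.get("md5")
    return None
```

To use it on a real checkout, the lock file would first be loaded with a YAML parser, for example PyYAML's `yaml.safe_load` on `project_1/dvc.lock`.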

It is also assumed that artifacts are registered with GTO using the naming convention project_name:artifact_name, for example gto register project_1:mymodel. (This naming convention is followed automatically when artifacts are registered from the DVC Studio UI.)
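Splitting a name that follows this convention is straightforward. Here is a minimal sketch with a hypothetical helper name. It assumes exactly one `:` separates the project directory from the artifact name, and it rejects anything else.

```python
from typing import Tuple


def split_artifact_name(full_name: str) -> Tuple[str, str]:
    """Split 'project_name:artifact_name' into its two parts (hypothetical helper)."""
    project, sep, artifact = full_name.partition(":")
    # Require a separator, two non-empty parts, and no second ':'.
    if not sep or not project or not artifact or ":" in artifact:
        raise ValueError(f"expected 'project_name:artifact_name', got {full_name!r}")
    return project, artifact
```

For example, `split_artifact_name("project_1:mymodel")` returns `("project_1", "mymodel")`.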

The tool uses boto3 or google-cloud-storage to access the bucket, depending on whether the DVC remote is on Amazon S3 or Google Cloud Storage.
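Choosing the client library can be driven by the scheme of the remote URL. The sketch below assumes remotes of the form `s3://bucket/...` or `gs://bucket/...`. The function names and dispatch table are hypothetical. The download calls are the standard boto3 `download_file` and google-cloud-storage `download_to_filename` methods, imported lazily so that only the library for the remote in use needs to be installed.

```python
from typing import Tuple
from urllib.parse import urlparse

# Hypothetical mapping from remote URL scheme to the client library used.
BACKENDS = {"s3": "boto3", "gs": "google-cloud-storage"}


def parse_remote_url(remote_url: str) -> Tuple[str, str, str]:
    """Split a DVC remote URL into (backend, bucket, prefix)."""
    parsed = urlparse(remote_url)
    if parsed.scheme not in BACKENDS:
        raise ValueError(f"unsupported remote scheme: {parsed.scheme!r}")
    # For s3:// and gs:// URLs the bucket name is the network location.
    return BACKENDS[parsed.scheme], parsed.netloc, parsed.path.strip("/")


def download(remote_url: str, key: str, filename: str) -> None:
    """Download one object from the remote's bucket to a local file."""
    backend, bucket, _ = parse_remote_url(remote_url)
    if backend == "boto3":
        import boto3  # imported only when an S3 remote is used
        boto3.client("s3").download_file(bucket, key, filename)
    else:
        from google.cloud import storage  # imported only for GCS remotes
        storage.Client().bucket(bucket).blob(key).download_to_filename(filename)
```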

Currently, only a DVC remote located at the root of a bucket is fully supported (i.e. the remote URL must not point to a subdirectory of the bucket).
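For context, here is where a subdirectory enters the object key. DVC stores a file in the remote under a path derived from its md5: `files/md5/<first 2 hex chars>/<rest>` in DVC 3.x, and `<first 2 hex chars>/<rest>` in DVC 2.x. A remote in a subdirectory adds its path as a prefix. The sketch below builds such a key for a single file. It is an illustration with a hypothetical function name, not the tool's code.

```python
def object_key(md5: str, prefix: str = "", dvc3_layout: bool = True) -> str:
    """Build the bucket key under which a DVC remote stores a file (hypothetical helper)."""
    parts = []
    if prefix.strip("/"):
        parts.append(prefix.strip("/"))  # remote located in a bucket subdirectory
    if dvc3_layout:
        parts += ["files", "md5"]        # DVC 3.x remote layout
    parts += [md5[:2], md5[2:]]          # first two hex chars form a directory
    return "/".join(parts)
```

With a remote at the bucket root, the prefix is empty and the key depends only on the checksum.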

Usage

usage: dvcartifacts [-h] [-r REV] repourl projectdir artifact_name

Download an artifact from the remote bucket

positional arguments:
  repourl            url of the GitHub repository associated with the artifact
  projectdir         project subdirectory in the monorepo where the artifact was created
  artifact_name      Name of the artifact to find

options:
  -h, --help         show this help message and exit
  -r REV, --rev REV  GTO tag of the artifact (optional)
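The help text above corresponds to a standard argparse interface. The sketch below reconstructs a parser that produces the same usage line. It is inferred from the help output only, and the `build_parser` name is hypothetical.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Reconstruct a parser matching the documented usage (hypothetical)."""
    parser = argparse.ArgumentParser(
        prog="dvcartifacts",
        description="Download an artifact from the remote bucket",
    )
    # Positional arguments, in the order shown in the usage line.
    parser.add_argument("repourl", help="url of the GitHub repository associated with the artifact")
    parser.add_argument("projectdir", help="project subdirectory in the monorepo where the artifact was created")
    parser.add_argument("artifact_name", help="Name of the artifact to find")
    # Optional GTO tag; argparse derives the REV metavar from the dest name.
    parser.add_argument("-r", "--rev", help="GTO tag of the artifact (optional)")
    return parser
```

Parsing `["git@github.com:org/monorepo.git", "project_1", "mymodel"]` with this parser leaves `rev` as `None`, since the tag is optional.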

Download files

Source distribution: dvcartifacts-0.0.1.tar.gz (4.5 kB)

Built distribution: dvcartifacts-0.0.1-py3-none-any.whl (5.6 kB, Python 3)
