A simple CLI tool to download DVC artifacts from large monorepos

Getting artifacts from monorepo projects without cloning the entire monorepo

dvcartifacts is a Python CLI tool that relies on GitPython and SSH to authenticate with a remote Git repository.

To speed up cloning, it uses a sparse checkout that downloads nothing outside the specified project directory. (Cloning the project is required to obtain the URL of the versioned artifact.)
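As an illustration of the sparse checkout idea, here is a minimal sketch that builds the equivalent plain git commands. It is not the tool's own code (which goes through GitPython); the function names, the example URL, and the choice of a blobless partial clone (`--filter=blob:none`) are assumptions made for this example, and `--sparse` needs a reasonably recent Git.

```python
import subprocess
from typing import List


def sparse_clone_commands(repo_url: str, project_dir: str, dest: str) -> List[List[str]]:
    """Build git commands that check out only `project_dir` from `repo_url`.

    Hypothetical helper for illustration; dvcartifacts itself uses GitPython.
    """
    return [
        # Partial clone: file contents are fetched only when a checkout needs
        # them, and the working tree starts out sparse.
        ["git", "clone", "--filter=blob:none", "--sparse", repo_url, dest],
        # Restrict the working tree to the requested project subdirectory.
        ["git", "-C", dest, "sparse-checkout", "set", project_dir],
    ]


def sparse_clone(repo_url: str, project_dir: str, dest: str) -> None:
    """Run the commands above; requires git and SSH access to the repository."""
    for cmd in sparse_clone_commands(repo_url, project_dir, dest):
        subprocess.run(cmd, check=True)


# Example with a made-up repository URL (nothing is executed here):
for cmd in sparse_clone_commands("git@github.com:my-org/monorepo.git", "project_1", "monorepo"):
    print(" ".join(cmd))
```

Keeping the command construction separate from execution makes it easy to inspect what would run before touching the network.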

Important notes/limitations

Artifacts are assumed to be created in project-specific subdirectories of a monorepo, with each DVC project initialized in its own subdirectory, as in the following structure:

monorepo
    ├── project_1
    │   ├── .dvc
    │   │   ├── config
    │   │   └── .gitignore
    │   ├── .dvcignore
    │   ├── dvc.lock
    │   ├── dvc.yaml
    │   ├── mymodel.pkl
    │   ├── my_script.py
    │   └── requirements.txt
    └── project_2
        ├── .dvc
        │   ├── config
        │   └── .gitignore
        └── .dvcignore

It is also assumed that artifacts are registered with GTO using the naming convention project_name:artifact_name, for example gto register project_1:mymodel. (This naming convention is followed automatically when artifacts are registered from the DVC Studio UI.)
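The naming convention can be sketched with two small helpers that build and split a project_name:artifact_name identifier. The function names are hypothetical and are not part of dvcartifacts or GTO; they only show how the projectdir and artifact_name arguments would map onto the registered name under this convention.

```python
from typing import Tuple


def make_artifact_id(project_dir: str, artifact_name: str) -> str:
    """Join a project subdirectory and an artifact name as project:artifact."""
    return f"{project_dir.strip('/')}:{artifact_name}"


def split_artifact_id(artifact_id: str) -> Tuple[str, str]:
    """Split project:artifact into its two parts, rejecting malformed input."""
    project, sep, name = artifact_id.partition(":")
    if not sep or not project or not name:
        raise ValueError(f"expected 'project_name:artifact_name', got {artifact_id!r}")
    return project, name


print(make_artifact_id("project_1", "mymodel"))
print(split_artifact_id("project_1:mymodel"))
```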

The tool relies on boto3 or google-cloud-storage to access the bucket (depending on the cloud storage used as a remote).
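One way to picture the choice between the two client libraries is to dispatch on the scheme of the DVC remote URL. The sketch below is an assumption about how such a dispatch could look, not the tool's actual code: `parse_remote` and `download` are hypothetical names, and only `s3://` and `gs://` URLs are handled. The cloud libraries are imported lazily so that only the one matching the remote needs to be installed.

```python
from typing import Tuple
from urllib.parse import urlparse


def parse_remote(url: str) -> Tuple[str, str, str]:
    """Return (scheme, bucket, prefix) for an s3:// or gs:// remote URL."""
    parsed = urlparse(url)
    if parsed.scheme not in ("s3", "gs"):
        raise ValueError(f"unsupported remote scheme: {parsed.scheme!r}")
    return parsed.scheme, parsed.netloc, parsed.path.strip("/")


def download(remote_url: str, key: str, dest: str) -> None:
    """Download object `key` from the remote bucket to local path `dest`."""
    scheme, bucket, prefix = parse_remote(remote_url)
    full_key = f"{prefix}/{key}" if prefix else key
    if scheme == "s3":
        import boto3  # needs AWS credentials in the environment

        boto3.client("s3").download_file(bucket, full_key, dest)
    else:
        from google.cloud import storage  # needs Google Cloud credentials

        storage.Client().bucket(bucket).blob(full_key).download_to_filename(dest)


# Parsing alone needs neither library nor credentials (bucket name is made up):
print(parse_remote("s3://my-bucket"))
```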

Currently, only a DVC remote located at the root of a bucket is properly supported (i.e. no subdirectories within the bucket).
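To see why a subdirectory in the remote matters, it helps to look at how an object key is derived from a file's MD5 hash. The sketch below assumes DVC's content-addressed layout (files/md5/<first two hex digits>/<rest> for DVC 3.x, without the files/md5 part for older versions); the function name and the `prefix` parameter are hypothetical, and how dvcartifacts itself builds the key is not shown in this README. With a remote at the bucket root the prefix is empty, which is the supported case.

```python
def object_key(md5: str, prefix: str = "", dvc3_layout: bool = True) -> str:
    """Build the bucket key for a DVC-tracked file from its MD5 hash.

    Assumes DVC's layout: the first two hex digits form a directory and the
    remaining digits the file name, under files/md5/ for DVC 3.x remotes.
    """
    if len(md5) < 3:
        raise ValueError("md5 hash is too short")
    parts = []
    if prefix.strip("/"):
        parts.append(prefix.strip("/"))  # remote configured below the bucket root
    if dvc3_layout:
        parts += ["files", "md5"]
    parts += [md5[:2], md5[2:]]
    return "/".join(parts)


# Remote at the bucket root (the supported case), using the MD5 of an empty file:
print(object_key("d41d8cd98f00b204e9800998ecf8427e"))
# → files/md5/d4/1d8cd98f00b204e9800998ecf8427e
```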

Usage

usage: dvcartifacts [-h] [-r REV] repourl projectdir artifact_name

Download an artifact from the remote bucket

positional arguments:
  repourl            url of the GitHub repository associated with the artifact
  projectdir         project subdirectory in the monorepo where the artifact was created
  artifact_name      Name of the artifact to find

options:
  -h, --help         show this help message and exit
  -r REV, --rev REV  semantic version of the artifact (optional), latest version is used if this is not specified
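The command-line interface above can be mirrored with a small argparse parser, which also shows how the arguments fit together in a call. This is a reconstruction from the help text, not the tool's source; the repository URL, project name, artifact name, and version in the example are made up.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Recreate the documented interface: three positionals and optional --rev."""
    parser = argparse.ArgumentParser(
        prog="dvcartifacts",
        description="Download an artifact from the remote bucket",
    )
    parser.add_argument("repourl", help="url of the GitHub repository associated with the artifact")
    parser.add_argument("projectdir", help="project subdirectory in the monorepo where the artifact was created")
    parser.add_argument("artifact_name", help="Name of the artifact to find")
    # When --rev is omitted it stays None, standing in for "use the latest version".
    parser.add_argument("-r", "--rev", default=None, help="semantic version of the artifact (optional)")
    return parser


# Equivalent to: dvcartifacts git@github.com:my-org/monorepo.git project_1 mymodel --rev 1.0.0
args = build_parser().parse_args(
    ["git@github.com:my-org/monorepo.git", "project_1", "mymodel", "--rev", "1.0.0"]
)
print(args.repourl, args.projectdir, args.artifact_name, args.rev)
```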

