A sample dvc project

DVC DOWNLOAD

Use-Case

Suppose we have a central DVC-tracked repository that contains various datasets and a few other saved artifacts stored on Amazon S3. Many developers work on the same datasets, or want to use particular items from the centralized repo in their own projects.

For this task, DVC provides the dvc get command, which downloads a file or directory tracked by DVC or by Git into the current working directory.

# Template dvc get code which can be used to pull one file.
dvc get {git_url_for_centralized_repo} {file_path_to_pull} -o {output_path_in_current_working_directory}
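The template above can be filled in programmatically. The following is a minimal sketch, not the library's own code: the helper name `build_dvc_get_command` is hypothetical, and the example values are taken from the sample CSV later in this README. It only builds and prints the command; it does not run it.

```python
import shlex


def build_dvc_get_command(git_url, file_path, output_path):
    """Return the argument list for a single `dvc get` call (hypothetical helper)."""
    return ["dvc", "get", git_url, file_path, "-o", output_path]


cmd = build_dvc_get_command(
    "https://aman5319@bitbucket.org/aman5319/datasets",
    "saved_outputs/Hindi/Hindi_SentencePiece_Tokenizer.model",
    ".",
)

# Show the command as it would be typed in a shell.
print(" ".join(shlex.quote(part) for part in cmd))
```

Keeping the command as a list of arguments avoids shell quoting problems if it is later passed to `subprocess.run`.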

This library helps to automate the task of pulling multiple files from the centralized repo.

Working

All of the logic lives in a single Python script, download.py.

The user lists the data to be pulled from the centralized repo in a CSV file stored in the current project directory. From that CSV file, the Python script generates an Expect script, which is then executed.
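To make the idea of generating an Expect script concrete, here is a rough sketch of how such a script could be produced from CSV rows. It is not the actual content of download.py. The function names `tcl_escape` and `expect_script` are hypothetical, and the assumption that the credential prompt contains the word "password" may not hold for every Git host.

```python
def tcl_escape(value):
    """Escape characters that are special inside a double-quoted Tcl word."""
    for ch in ("\\", '"', "$", "[", "]"):
        value = value.replace(ch, "\\" + ch)
    return value


def expect_script(rows):
    """Build Expect script text with one `dvc get` per row (hypothetical helper)."""
    lines = ["#!/usr/bin/expect -f", "set timeout -1"]
    for row in rows:
        lines.append(
            'spawn dvc get "{url}" "{file}" -o "{out}"'.format(
                url=tcl_escape(row["git_url"]),
                file=tcl_escape(row["file"]),
                out=tcl_escape(row["output"]),
            )
        )
        # Assumption: the prompt text contains "password" (matched case-insensitively).
        lines.append("expect {")
        lines.append(
            '    -nocase "password" {{ send -- "{pw}\\r"; exp_continue }}'.format(
                pw=tcl_escape(row["password"])
            )
        )
        lines.append("    eof")
        lines.append("}")
    return "\n".join(lines) + "\n"


example_rows = [
    {
        "file": "processed_data/Hindi/Hindi_Processed_15K_Sentiment.csv",
        "output": "./data/processed_data",
        "git_url": "https://aman5319@bitbucket.org/aman5319/datasets",
        "password": "xyz",
    }
]
print(expect_script(example_rows))
```

The generated text could be written to a file and run with the `expect` program. Note that the password ends up in plain text in that file, so it should be handled with care.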

The CSV file should contain four columns

file,output,git_url,password

file :- The full path of the file in the centralized repo.
output :- Where to store the file in the current directory.
git_url :- The Git URL of the DVC-tracked repo that contains the file.
password :- The password, which is needed for a private repo.

sample.csv file

file,output,git_url,password
processed_data/Hindi/Hindi_Processed_15K_Sentiment.csv,./data/processed_data,https://aman5319@bitbucket.org/aman5319/datasets,xyz
saved_outputs/Hindi/Hindi_SentencePiece_Tokenizer.model,.,https://aman5319@bitbucket.org/aman5319/datasets,xyz
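A CSV file in this layout can be read with Python's standard `csv` module. The sketch below is illustrative only: `read_rows` is a hypothetical helper, and treating `file` and `output` as the only mandatory columns is an assumption based on the `[Optional]` markers in the CLI help text.

```python
import csv
import io

REQUIRED_COLUMNS = ("file", "output")  # assumption: git_url and password may be omitted

SAMPLE_CSV = """\
file,output,git_url,password
processed_data/Hindi/Hindi_Processed_15K_Sentiment.csv,./data/processed_data,https://aman5319@bitbucket.org/aman5319/datasets,xyz
saved_outputs/Hindi/Hindi_SentencePiece_Tokenizer.model,.,https://aman5319@bitbucket.org/aman5319/datasets,xyz
"""


def read_rows(handle):
    """Parse CSV rows into dicts and check that the mandatory columns exist."""
    reader = csv.DictReader(handle)
    header = reader.fieldnames or []
    missing = [name for name in REQUIRED_COLUMNS if name not in header]
    if missing:
        raise ValueError("missing CSV columns: " + ", ".join(missing))
    return [dict(row) for row in reader]


rows = read_rows(io.StringIO(SAMPLE_CSV))
for row in rows:
    print(row["file"], "->", row["output"])
```

For a file on disk, the same helper can be called with `open("file_info.csv", newline="")` in place of the `io.StringIO` object.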

Dependencies

Install Expect on your system.

On Ubuntu: sudo apt-get install expect

Installation

pip install dvcdownload

Usage

# Help
$ dvcdownload --help
Usage: dvcdownload [OPTIONS] COMMAND [ARGS]...

Options:
  -f, --filename TEXT  The CSV file path which should have columns as:- file,
                       output, git_url[Optional], password[Optional]
  --help               Show this message and exit.

Commands:
  different  Use if the csv file contains all the url and password for each...
  same       Use if same git url and password to be used for all objects in...

By default the CSV filename is file_info.csv. If your file uses that name, the --filename option is not required.

There are two sub commands

  1. same

    # Sub command same help
    $ dvcdownload same --help
    Usage: dvcdownload same [OPTIONS]
    
      Use if same git url and password to be used for all objects in the csv
      file.
    
    Options:
      -u, --git_url TEXT   Enter the git url  [required]
      -p, --password TEXT  Enter the password  [required]
      --help               Show this message and exit.
    

    If all the data is to be pulled from the same repo, the file and output values are taken from the CSV file, while git_url and password must be provided explicitly by the user.

    # same (By default filename is file_info.csv) 
    $ dvcdownload same --git_url=https://aman5319@bitbucket.org/aman5319/datasets --password=xyz
    # same with another filename option
    $ dvcdownload --filename=mycsv.csv same --git_url=https://aman5319@bitbucket.org/aman5319/datasets --password=xyz
    
  2. different

    # sub command different help
    $ dvcdownload different --help
    Usage: dvcdownload different [OPTIONS]
    
      Use if the csv file contains all the url and password for each objects.
    
    Options:
      --help  Show this message and exit.
    

    Use this subcommand if you want the script to take git_url and password from the CSV file. Different Git URLs and their passwords can then be used to fetch files from different repos into the current directory.

    # different (By default filename is file_info.csv)
    $ dvcdownload different
    # different with another filename option
    $ dvcdownload --filename=mycsv.csv different
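
The difference between the two subcommands comes down to where git_url and password are taken from. The sketch below illustrates that rule under stated assumptions. `resolve_credentials` is a hypothetical function, not part of the dvcdownload API, and allowing an empty password for public repos is an assumption.

```python
def resolve_credentials(rows, git_url=None, password=None):
    """Return rows with git_url and password filled in.

    "same" behaviour: pass git_url and password, which override every row.
    "different" behaviour: pass neither, so each row supplies its own values.
    """
    resolved = []
    for row in rows:
        url = git_url if git_url is not None else row.get("git_url")
        pw = password if password is not None else row.get("password")
        if not url:
            raise ValueError("no git_url available for " + row["file"])
        resolved.append(
            {"file": row["file"], "output": row["output"], "git_url": url, "password": pw}
        )
    return resolved


csv_rows = [
    {"file": "a.csv", "output": ".", "git_url": "https://example.com/repo_a", "password": "pw_a"},
    {"file": "b.csv", "output": "./data", "git_url": "https://example.com/repo_b", "password": "pw_b"},
]

# Like `dvcdownload same`: one URL and password for all rows.
same_mode = resolve_credentials(csv_rows, git_url="https://example.com/shared", password="xyz")

# Like `dvcdownload different`: per-row values from the CSV.
different_mode = resolve_credentials(csv_rows)

print(same_mode[1]["git_url"])
print(different_mode[1]["git_url"])
```

The example.com URLs and passwords are placeholders chosen for illustration.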
    
