Skip to main content

Command line wrapper for git and dvc

Project description

Fast Data Science aka fds

Discord Tests PyPI MIT license


fds is a tool for Data Scientists made by DAGsHub to version control data and code at once.

At a high level, fds is a command line wrapper around Git and DVC, meant to minimize the chances of human error, automate repetitive tasks, and provide a smoother landing for new users.

See the launch blog for more information about the motivation behind this project.

Installation

  • Install fds using PIP pip install fastds
  • Once installed successfully, you can start using fds
  • eg: fds init should trigger the init command
  • You can also use sdf instead of fds - it's identical, but might be more fun to type 🤓

Commands Supported

$ fds -h
usage: fds [-h] [-v] {init,status,add,commit,push,save} ...

One command for all your git and dvc needs

positional arguments:
  {init,status,add,commit,push,save}
                        command (refer commands section in documentation)
    init                initialize a git and dvc repository
    status              get status of your git and dvc repository
    add                 add files/folders to git and dvc repository
    commit              commits added changes to git and dvc repository
    clone               Clones from git repository and pulls from dvc remote
    push                push commits to remote git and dvc repository
    save                saves all project files to a new version and pushes
                        them to your remote

Examples

fds status = dvc status + git status

fds status lets us quickly check the full status of the repo - both DVC and git at the same time, to make sure we don't forget anything.

image

Here, we can see that we have a small, normal text file - .gitignore, plus a bigfile.txt and data folder which we would want to add to DVC and not to git. fds add makes that easy!

fds add = dvc add + git add wizard 🧙‍♂️

You're probably used to the convenience of using git add . to just track everything. Unfortunately, you have to be careful doing this when working with large files - one wrong move, and you might fry your hard drive by accidentally telling git to track a huge dataset!
We wanted to retain the convenience of just typing one command which means "just track all changes, I'll do a git commit in one second", which will be smart enough to avoid the pitfalls of large data files.
fds add does exactly that, while interactively asking the user how to handle files. You can add to DVC, or git, recursively step into large folders, skip or ignore files, etc.

image

Here's the file tree of the repo I used above, with file sizes included. Note how bigfile.txt and data/ were automatically added to DVC and not git:

image

fds commit = dvc commit + git commit

Finally, to close the loop of a real workflow, what happens when I change existing DVC tracked files? Without FDS, you'd have to remember to separately run dvc repro or dvc commit, then git add tracked_file.dvc, and only then git commit.
fds commit does all that for you - commits changes to DVC first, then adds the .dvc files with the updated hashes to git, then immediately commits these changes (plus any other staged changes) to a new git commit. Voila!

image

Contributing

We would love for you to try out FDS yourself, and to give us feedback. It would really help us to prioritize future features, so please vote on or create issues!
If you'd like to take a more active part, we have some good first issues that you can start with. We'll be happy to provide guidance on the best way to do so.

And of course, we're always happy to have you on the DAGsHub discord, where you can ask questions or give feedback on FDS: Discord


Made with ❤️   by dagshub-logo

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

fastds-0.4.0.tar.gz (19.2 kB view details)

Uploaded Source

Built Distribution

fastds-0.4.0-py3-none-any.whl (24.2 kB view details)

Uploaded Python 3

File details

Details for the file fastds-0.4.0.tar.gz.

File metadata

  • Download URL: fastds-0.4.0.tar.gz
  • Upload date:
  • Size: 19.2 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.4.2 importlib_metadata/4.8.0 pkginfo/1.7.1 requests/2.26.0 requests-toolbelt/0.9.1 tqdm/4.62.2 CPython/3.9.6

File hashes

Hashes for fastds-0.4.0.tar.gz
Algorithm Hash digest
SHA256 e45129254fe40afb958f9710ddab0c50c88cc743f81d1948811c362fc5ffc312
MD5 23b6a35b5047ca85e04c8c190d742a47
BLAKE2b-256 cdaa4dbc0258d9083ed6f7255c77db57aa032ef3ca164080d22ee930bea9155a

See more details on using hashes here.

File details

Details for the file fastds-0.4.0-py3-none-any.whl.

File metadata

  • Download URL: fastds-0.4.0-py3-none-any.whl
  • Upload date:
  • Size: 24.2 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.4.2 importlib_metadata/4.8.0 pkginfo/1.7.1 requests/2.26.0 requests-toolbelt/0.9.1 tqdm/4.62.2 CPython/3.9.6

File hashes

Hashes for fastds-0.4.0-py3-none-any.whl
Algorithm Hash digest
SHA256 9fc7e34f73ff2efb101e59a0c35a9eb088c141cc3a3b935dd63f01c5cee4b7b7
MD5 adbd615ade90bc5cad05469b96e8eb61
BLAKE2b-256 818e4a6e81d8c4be594dd58813c546f55b5a108a6808a665ae3ecd8f8d79f6c9

See more details on using hashes here.

Supported by

AWS AWS Cloud computing and Security Sponsor Datadog Datadog Monitoring Fastly Fastly CDN Google Google Download Analytics Microsoft Microsoft PSF Sponsor Pingdom Pingdom Monitoring Sentry Sentry Error logging StatusPage StatusPage Status page