ML-Git

ML-Git is a tool that provides a distributed version control system for efficient dataset management. As its name suggests, it is inspired by Git concepts and workflows. ML-Git enables the following operations:

  • Manage a repository of different datasets, labels and models.
  • Distribute these ML artifacts between members of a team or across organizations.
  • Apply the right data governance and security models to their artifacts.

How to install

Prerequisites:

With pip:

pip install ml-git

Source code:

Download ML-Git from the repository and run the commands below:

  • Windows:

    cd ml-git/
    python3.7 setup.py install
    
  • Linux:

    cd ml-git/
    sudo python3.7 setup.py install
    

How to configure

1 - As ML-Git leverages Git to manage the metadata of ML entities, you need to configure a user name and email address:

$ git config --global user.name "Your User"
$ git config --global user.email "your_email@example.com"
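
To confirm that both values are set before using ML-Git, a small check like the one below may help. It is a minimal sketch that relies only on standard `git config` options; the function name `check_git_identity` is hypothetical and not part of ML-Git.

```shell
# Sketch: report whether the global Git identity is configured.
# check_git_identity is a hypothetical helper, not an ML-Git command.
check_git_identity() {
  for key in user.name user.email; do
    # `git config --global --get` exits non-zero when the key is unset.
    if ! git config --global --get "$key" >/dev/null 2>&1; then
      echo "missing: $key"
      return 1
    fi
  done
  echo "git identity configured"
}
```

Calling `check_git_identity` after the two `git config` commands above should print `git identity configured`.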

2 - Storage:

ML-Git needs configured storage to hold the data of managed artifacts. Please take a look at the ML-Git architecture and internals documentation to better understand how ML-Git works internally with data.

3 - ML-Git project:

  • An ML-Git project is an initialized directory containing a configuration file that ML-Git uses to manage entities. To configure it, follow the basic steps described in the first project documentation.

Usage

$ ml-git --help
Usage: ml-git [OPTIONS] COMMAND [ARGS]...

Options:
  --version  Show the version and exit.

Commands:
  clone       Clone a ml-git repository ML_GIT_REPOSITORY_URL
  datasets    Management of datasets within this ml-git repository.
  labels      Management of labels sets within this ml-git repository.
  models      Management of models within this ml-git repository.
  repository  Management of this ml-git repository.

Basic commands

ml-git clone <repository-url>
$ mkdir my-project
$ cd my-project
$ ml-git clone https://github.com/user/ml_git_configuration_file_example.git

If you prefer not to create the directory:

$ ml-git clone https://github.com/user/ml_git_configuration_file_example.git --folder=my-project

If you prefer to keep Git tracking files in the project:

$ mkdir my-project
$ cd my-project
$ ml-git clone https://github.com/user/ml_git_configuration_file_example.git --track

ml-git <ml-entity> create

This command helps you start a new project by creating your project artifact metadata:

$ ml-git datasets create --category=computer-vision --category=images --bucket-name=your_bucket --import=../import-path --mutability=strict dataset-ex
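
When the same command is needed for several datasets, it can be assembled from variables. The sketch below only prints the command line and uses only the flags shown in the example above; the function name `create_dataset_cmd` and its argument order are assumptions made for illustration.

```shell
# Sketch: print an `ml-git datasets create` command built from arguments.
# create_dataset_cmd is a hypothetical helper; it does not run ML-Git.
create_dataset_cmd() {
  name=$1; bucket=$2; import_path=$3
  shift 3
  cmd="ml-git datasets create"
  # Every remaining argument is a category and gets its own --category flag.
  for category in "$@"; do
    cmd="$cmd --category=$category"
  done
  echo "$cmd --bucket-name=$bucket --import=$import_path --mutability=strict $name"
}

create_dataset_cmd dataset-ex your_bucket ../import-path computer-vision images
```

The printed line matches the example command above and can be reviewed before it is run.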

ml-git <ml-entity> status

Show changes in the project workspace:

$ ml-git datasets status dataset-ex

ml-git <ml-entity> add

Add new files to the index:

$ ml-git datasets add dataset-ex

To increment the version:

$ ml-git datasets add dataset-ex --bumpversion

Add a specific file:

$ ml-git datasets add dataset-ex data/file_name.ex

ml-git <ml-entity> commit

Consolidate the files added to the index into the repository:

$ ml-git datasets commit dataset-ex

ml-git <ml-entity> push

Upload metadata to the remote repository and send chunks (see docs/mlgit_internals.md) to storage:

$ ml-git datasets push dataset-ex
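
The status, add, commit and push commands above are typically run in sequence. The following is a minimal sketch of that cycle, assuming a version bump is wanted on every run; the helper names `run` and `publish_dataset` and the `DRY_RUN` variable are hypothetical and not part of ML-Git.

```shell
# run: execute a command, or only print it when DRY_RUN=1.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# publish_dataset: hypothetical wrapper that chains the commands shown above.
# Each step runs only if the previous one succeeded.
publish_dataset() {
  name=$1
  run ml-git datasets status "$name" &&
  run ml-git datasets add "$name" --bumpversion &&
  run ml-git datasets commit "$name" &&
  run ml-git datasets push "$name"
}

# Preview the commands without touching the repository or storage.
DRY_RUN=1 publish_dataset dataset-ex
```

With `DRY_RUN` unset, `publish_dataset dataset-ex` runs the four commands for real, so it is worth previewing first.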

ml-git <ml-entity> checkout

Change the workspace and metadata to a versioned ml-entity tag:

$ ml-git datasets checkout computer-vision__images__dataset-ex__1
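
The tag in this example appears to join the categories, the entity name and the version with double underscores. That layout is inferred from the example alone and is an assumption, not ML-Git's documented format. Under that assumption, a tag could be composed as sketched below; `entity_tag` is a hypothetical helper.

```shell
# Sketch: compose a tag as <category>__...__<name>__<version>.
# The layout is inferred from the example tag above, not from ML-Git itself.
entity_tag() {
  name=$1; version=$2
  shift 2
  tag=""
  # Remaining arguments are categories, in the order given at creation time.
  for category in "$@"; do
    tag="${tag}${category}__"
  done
  echo "${tag}${name}__${version}"
}

entity_tag dataset-ex 1 computer-vision images
# prints: computer-vision__images__dataset-ex__1
```

The result can then be passed to the checkout command, for example `ml-git datasets checkout "$(entity_tag dataset-ex 1 computer-vision images)"`.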

See the documentation for more about the commands.

How to contribute

Your contributions are always welcome!

  1. Clone the repository and create a new branch
  2. Make changes and test them
  3. Submit a pull request with a comprehensive description of the changes

Another way to contribute to the community is to create an issue to track your ideas, questions, enhancements, tasks, or bugs found. If an issue on the same topic already exists, join the discussion there.
