A tool to create project structures for ML and data projects

mlcookiecutter

Overview

mlcookiecutter is a command-line tool that helps you quickly set up a structured project directory for data science and machine learning projects. It creates a predefined folder structure, initializes necessary files, and provides a template for your project, making it easier to start new projects with best practices in mind.

Features

  • Create a well-organized project structure
  • Generate essential files such as README.md, LICENSE, and requirements.txt
  • Support various license types
  • Specify CODEOWNERS for collaborative projects

Installation

You can install the package directly from PyPI using pip:

pip install mlcookiecutter

Or clone this repository and install it locally:

git clone https://github.com/sarag5/mlcookiecutter.git
cd mlcookiecutter
pip install -e .

Usage

To create a new project structure, use the command-line interface:

mlcookiecutter --project_name <your_project_name> --license_type <license_type> --codeowners <comma_separated_owners>
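
As a rough illustration of how the three flags above could be parsed, here is a minimal sketch using Python's standard argparse module. It is not taken from the mlcookiecutter source: the function names build_parser and parse_codeowners, the required/default settings, and the help texts are all assumptions made for this example.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser that accepts the three flags shown in the usage line."""
    parser = argparse.ArgumentParser(prog="mlcookiecutter")
    # Assumption: the project name is mandatory; the real tool may differ.
    parser.add_argument("--project_name", required=True,
                        help="name of the directory to create")
    # Assumption: "mit" as the default license is chosen only for this sketch.
    parser.add_argument("--license_type", default="mit",
                        help="license to write into the LICENSE file")
    parser.add_argument("--codeowners", default="",
                        help="comma-separated list of owners")
    return parser


def parse_codeowners(raw: str) -> list[str]:
    """Split a comma-separated owner string, dropping blanks and whitespace."""
    return [owner.strip() for owner in raw.split(",") if owner.strip()]


if __name__ == "__main__":
    args = build_parser().parse_args()
    print(args.project_name, args.license_type,
          parse_codeowners(args.codeowners))
```

Splitting the owners into a list early keeps the rest of such a tool independent of how the value was typed on the command line.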

Example

mlcookiecutter --project_name my_data_project --license_type mit --codeowners user1@example.com,user2@example.com

This command will create a new directory called my_data_project with the following structure:

my_data_project/
├── data/
│   ├── raw/
│   └── processed/
├── src/
│   ├── data_engineering/
│   ├── feature_engineering/
│   ├── model/
│   └── utils/
├── notebooks/
├── tests/
│   ├── data_engineering/
│   └── model/
├── models/
├── scripts/
├── deployment/
├── .github/
│   └── workflows/
├── .gitignore
├── LICENSE
├── README.md
├── requirements.txt
├── CODEOWNERS
└── tasks.py
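
To make the layout concrete, the tree above can be reproduced with a few lines of pathlib. This is only a sketch of the idea, not the tool's actual implementation: the names DIRECTORIES, FILES, and create_structure are hypothetical, and the files are created empty, whereas mlcookiecutter fills them with template content.

```python
import tempfile
from pathlib import Path

# Directory and file names copied from the tree shown above.
DIRECTORIES = [
    "data/raw",
    "data/processed",
    "src/data_engineering",
    "src/feature_engineering",
    "src/model",
    "src/utils",
    "notebooks",
    "tests/data_engineering",
    "tests/model",
    "models",
    "scripts",
    "deployment",
    ".github/workflows",
]
FILES = [
    ".gitignore",
    "LICENSE",
    "README.md",
    "requirements.txt",
    "CODEOWNERS",
    "tasks.py",
]


def create_structure(root: Path) -> None:
    """Create the directories and empty placeholder files under root."""
    for relative in DIRECTORIES:
        # parents=True also creates root itself on the first iteration.
        (root / relative).mkdir(parents=True, exist_ok=True)
    for name in FILES:
        (root / name).touch(exist_ok=True)


if __name__ == "__main__":
    # Build the tree in a throwaway directory so the demo leaves nothing behind.
    with tempfile.TemporaryDirectory() as tmp:
        project_root = Path(tmp) / "my_data_project"
        create_structure(project_root)
        for path in sorted(project_root.rglob("*")):
            print(path.relative_to(project_root))
```

Because both mkdir and touch are called with exist_ok=True, running the function twice on the same directory does no harm.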

Available Tasks

This project uses Invoke to manage various tasks. Here are the available tasks:

  • run_locally: Run the project locally.
  • run_tests: Run the test suite.
  • run_lint: Run the linter (flake8).
  • run_build: Run the build process.
  • check_format: Check the code formatting.
  • correct_format: Correct the code formatting.

To see all available tasks, run:

invoke --list
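
For readers new to Invoke, a task is an ordinary Python function in tasks.py decorated with @task, whose first argument is a context object used to run shell commands. The sketch below shows what two of the tasks listed above might look like. The commands are assumptions based on the tools this README names (pytest and flake8), not the generated file's actual contents, and the try/except fallback exists only so the sketch can be imported where Invoke is not installed.

```python
try:
    from invoke import task
except ImportError:
    # Fallback for this sketch only; a real tasks.py would simply import task.
    def task(func):
        return func


@task
def run_tests(c):
    """Run the test suite (assumed here to be a plain pytest call)."""
    c.run("pytest")


@task
def run_lint(c):
    """Run the linter (assumed here to be flake8 over the whole project)."""
    c.run("flake8 .")
```

By default Invoke turns underscores in function names into dashes on the command line, so a task defined this way would normally be listed as run-tests rather than run_tests unless the project configures it otherwise.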

GitHub Workflows

This project includes several GitHub workflows to ensure code quality and proper ML practices:


Test and Lint

  • Triggered on every push and pull request
  • Runs the test suite using pytest
  • Checks code style using flake8

ML Model Validation

  • Triggered when changes are made to the model code or data
  • Runs model training and validation scripts

Data Quality Check

  • Triggered when changes are made to the data files
  • Runs data quality checks to ensure data integrity

These workflows help maintain code quality, ensure proper testing, and validate both the model and data throughout the development process.

Contributing

Contributions are welcome! Please feel free to submit issues or pull requests.

  1. Fork the repository.
  2. Create your feature branch: git checkout -b feature/YourFeature
  3. Commit your changes: git commit -m 'Add some feature'
  4. Push to the branch: git push origin feature/YourFeature
  5. Open a pull request.

License

This project is licensed under the MIT License - see the LICENSE file for details.


Contributors

See the CODEOWNERS file for the list of contributors.
