Bluprint

Bluprint is a command-line utility for streamlined exploratory data science projects. Bluprint projects give Jupyter and RMarkdown notebooks seamless access to configuration, data and shared code, following best coding practices, in a project structure like this:

my_project
├── conf
│   └── data.yaml
├── data
│   ├── emailed
│   │   └── messy.xlsx
│   └── user_processed.csv
├── notebooks
│   └── process.ipynb
└── my_project
    └── shared_code.py

Features

  • Configuration, data and shared code (Python/R scripts) separated from notebooks.

  • Mixing of any or all Python/R scripts and Jupyter/RMarkdown notebooks.

  • Consistent access to configuration and data, e.g. data.emailed.messy (Python) automatically resolves to /path/to/my_project/data/emailed/messy.xlsx.

  • Consistent access to project modules, e.g. from my_project import shared_code in any notebook in any sub-directory.

  • Share projects easily: copy the project directory and run pdm install.

  • Reproducibility: Python and R dependencies are version locked.

  • Works with tools for notebook linting, testing, CI/CD and workflows.

  • Bluprint projects are Python packages; use pip install /path/to/my_project to reuse shared code across projects.
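
Consistent access to configuration and data from any sub-directory implies locating the project directory regardless of where a notebook runs. How Bluprint does this is not described here. The sketch below shows one common approach, walking up the parent directories until a marker file is found. The helper find_project_root and the choice of pyproject.toml as the marker are assumptions made for illustration, not part of Bluprint's API:

```python
import tempfile
from pathlib import Path


def find_project_root(start, marker='pyproject.toml'):
    """Walk up from start until a directory containing marker is found.

    Hypothetical helper; the marker file name is an assumption.
    """
    start = Path(start).resolve()
    for directory in (start, *start.parents):
        if (directory / marker).is_file():
            return directory
    raise FileNotFoundError(f'No {marker} found above {start}')


# Demonstration in a temporary directory shaped like the example project.
with tempfile.TemporaryDirectory() as tmp:
    project = Path(tmp).resolve() / 'my_project'
    notebooks = project / 'notebooks' / 'drafts'
    notebooks.mkdir(parents=True)
    (project / 'pyproject.toml').touch()

    # Start from a nested notebook directory and search upwards.
    root = find_project_root(notebooks)
    found_name = root.name
    relative_conf = (root / 'conf' / 'data.yaml').relative_to(root).as_posix()

print(found_name)
print(relative_conf)
```

With the root known, a file such as conf/data.yaml can be addressed the same way from a notebook at any depth.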

Usage

bluprint create my_project creates a project skeleton similar to the example shown above. Once the project is created, we can add data files and store their paths, relative to the my_project/data directory, in conf/data.yaml:

emailed:
    messy: 'emailed/messy.xlsx'
user:
    processed: 'user_processed.csv'

Then retrieve the automatically parsed full paths, for example in process.ipynb above:

# bluprint_conf is a helper package for loading configs
from bluprint_conf import load_data_yaml
from my_project.shared_code import process_data
import pandas as pd

data = load_data_yaml() # default arg: conf/data.yaml
print(data)
#> {
#>   'emailed': {
#>     'messy': '/path/to/my_project/data/emailed/messy.xlsx'
#>   },
#>   'user': {
#>     'processed': '/path/to/my_project/data/user_processed.csv'
#>   }
#> }

messy_df = pd.read_excel(data.emailed.messy)

processed_df = process_data(messy_df)

processed_df.to_csv(data.user.processed)
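
The path resolution visible in the printed output above can be illustrated with a minimal sketch. This is not Bluprint's implementation: resolve_data_paths and its data_dir argument are hypothetical names, and the configuration is written as a plain dictionary rather than loaded from conf/data.yaml. It shows only the idea of joining each relative path onto the project's data directory:

```python
from pathlib import PurePosixPath


def resolve_data_paths(config, data_dir):
    """Return a copy of a nested config with every leaf path joined onto data_dir.

    Hypothetical helper for illustration; not part of bluprint or bluprint_conf.
    """
    resolved = {}
    for key, value in config.items():
        if isinstance(value, dict):
            # Recurse into nested sections such as 'emailed' or 'user'.
            resolved[key] = resolve_data_paths(value, data_dir)
        else:
            # Leaf values are paths relative to the data directory.
            resolved[key] = str(PurePosixPath(data_dir) / value)
    return resolved


# Same structure as the data.yaml example, written as a dictionary.
config = {
    'emailed': {'messy': 'emailed/messy.xlsx'},
    'user': {'processed': 'user_processed.csv'},
}

paths = resolve_data_paths(config, '/path/to/my_project/data')
print(paths['emailed']['messy'])
#> /path/to/my_project/data/emailed/messy.xlsx
```

Storing only relative paths in the YAML file keeps the configuration portable: the absolute prefix is computed on each machine, so it does not need to be edited when the project directory is copied elsewhere.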

For a working demonstration of a shareable project, see https://github.com/igor-sb/bluprint-demo/.

Documentation

Full documentation is available at https://igor-sb.github.io/bluprint/.

Installation

Install pipx and PDM. Then run:

pipx install bluprint

License

Bluprint is released under the MIT license.
