Bluprint
Bluprint is a command line utility for streamlined exploratory data science projects. Bluprint projects allow Jupyter and RMarkdown notebooks seamless access to configuration, data and shared code, using best coding practices, in this type of project structure:
my_project
├── conf
│   └── data.yaml
├── data
│   ├── emailed
│   │   └── messy.xlsx
│   └── user_processed.csv
├── notebooks
│   └── process.ipynb
└── my_project
    └── shared_code.py
Features
Configuration, data and shared code (Python/R scripts) separated from notebooks.
Mixing of any or all Python/R scripts and Jupyter/RMarkdown notebooks.
Consistent access to configuration and data, e.g. data.emailed.messy (Python) automatically resolves to /path/to/my_project/data/emailed/messy.xlsx.
Consistent access to project modules, e.g. from my_project import shared_code in any notebook in any sub-directory.
Share a project easily: copy the project directory and run pdm install.
Reproducibility: Python and R dependencies are version locked.
Works with tools for notebook linting, testing, CI/CD and workflows.
Bluprint projects are Python packages; use pip install /path/to/my_project to reuse shared code across projects.
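The dotted access mentioned in the features above (data.emailed.messy) can be pictured with a small stand-alone sketch. This is not Bluprint's implementation: to_namespace is a hypothetical helper, and the dictionary is an assumed stand-in for a parsed configuration whose paths have already been resolved.

```python
from types import SimpleNamespace


def to_namespace(node):
    """Recursively convert nested dicts into attribute-accessible objects.

    Hypothetical helper for illustration only; not part of Bluprint.
    """
    if isinstance(node, dict):
        return SimpleNamespace(
            **{key: to_namespace(value) for key, value in node.items()}
        )
    return node


# Assumed stand-in for an already-resolved configuration.
resolved = {
    "emailed": {"messy": "/path/to/my_project/data/emailed/messy.xlsx"},
    "user": {"processed": "/path/to/my_project/data/user_processed.csv"},
}

data = to_namespace(resolved)
print(data.emailed.messy)
```

Nested keys become attributes, so a notebook can refer to a file by its logical name instead of hard-coding a path.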
Usage
bluprint create my_project creates a project skeleton similar to the example shown above. Once created, we can add data files and store their paths, relative to the my_project/data directory, in conf/data.yaml:
emailed:
  messy: 'emailed/messy.xlsx'
user:
  processed: 'user_processed.csv'
Then retrieve the automatically parsed full paths, for example in process.ipynb above:
# bluprint_conf is a helper package for loading configs
from bluprint_conf import load_data_yaml
from my_project.shared_code import process_data
import pandas as pd
data = load_data_yaml() # default arg: conf/data.yaml
print(data)
#> {
#>   'emailed': {
#>     'messy': '/path/to/my_project/data/emailed/messy.xlsx'
#>   },
#>   'user': {
#>     'processed': '/path/to/my_project/data/user_processed.csv'
#>   }
#> }
messy_df = pd.read_excel(data.emailed.messy)  # pandas reads .xlsx files with read_excel
processed_df = process_data(messy_df)
processed_df.to_csv(data.user.processed)
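The path resolution visible in the printed output above can be sketched as a recursive walk that joins every relative entry onto the data directory. This is a minimal illustration under stated assumptions, not how bluprint_conf is implemented: resolve_paths is a hypothetical function, the nested dictionary stands in for the parsed YAML, and PurePosixPath is used only to keep the result identical on every platform.

```python
from pathlib import PurePosixPath


def resolve_paths(node, data_dir):
    """Return a copy of a nested config with each leaf joined onto data_dir.

    Hypothetical helper for illustration only; not part of bluprint_conf.
    """
    if isinstance(node, dict):
        return {
            key: resolve_paths(value, data_dir) for key, value in node.items()
        }
    # Leaf values are assumed to be paths relative to the data directory.
    return str(PurePosixPath(data_dir) / node)


# Assumed stand-in for the parsed contents of conf/data.yaml.
relative = {
    "emailed": {"messy": "emailed/messy.xlsx"},
    "user": {"processed": "user_processed.csv"},
}

absolute = resolve_paths(relative, "/path/to/my_project/data")
print(absolute["emailed"]["messy"])
```

Because only the data directory is machine-specific, the same data.yaml works unchanged when the project directory is copied elsewhere.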
For a working demonstration of a shareable project, see https://github.com/igor-sb/bluprint-demo/.
Documentation
Full documentation is available at: https://igor-sb.github.io/bluprint/.
Installation
Install pipx and PDM. Then run:
pipx install bluprint
References
Bluprint integrates:
Python’s native import system
R package renv
R package here
R package reticulate
Bluprint is heavily inspired by these resources:
The author’s own frustration with malfunctioning notebooks for over a decade.
Vincent D. Warmerdam: Untitled12.ipynb | PyData Eindhoven 2019
License
Bluprint is released under the MIT license.