A package that uses Quarto to convert Jupyter Notebooks to Word documents (and other Quarto-supported formats) and upload them to Google Drive, with support for multiple user roles.

JupDoc: 🚀 Automated data science documentation 🚀

Documentation: everybody wants it, but nobody gets it 😣! Does this sound familiar? No longer.

JupDoc is a Python package that simplifies publishing documents from Jupyter Notebooks into multiple docx files (or other Quarto-supported formats) based on cell tags!

It embraces the "write once, publish for any <>" paradigm, where <> = {person, role, time, format, location, style}!

About
Why JupDoc
What is the solution
Core tenets
Features
Benefits
A new mindset
Installation
Usage
- Command Line Interface
- Python API
To Do
License
Acknowledgements
Contributing

Note: This package is still under development, and the documentation needs to be completed. There may be bugs, and the API may change.

About

JupDoc is a light wrapper written in Python that simplifies publishing Jupyter notebooks into multiple docx files or other formats while applying view-based filters based on cell tags. It is based on Quarto.

With JupDoc, you can:

Tag a notebook cell for a particular stakeholder
Render the notebook into any format, filtered by the tag, for that stakeholder
From a single notebook, create as many views/docs as needed
Convert ipynb notebooks to docx, PDF, HTML, TeX, and Markdown
Upload the generated files to Google Drive (requires a service account)
Maintain a single source of truth
Automate document publishing
Avoid broken docs and constant reminders
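
The tag-based filtering described above can be illustrated with a minimal sketch. It is not JupDoc's actual implementation; it only assumes the standard notebook JSON layout, in which each cell stores its tags under metadata.tags. The function name filter_cells_by_tag and the tag names "executive" and "scientist" are hypothetical.

```python
import copy


def filter_cells_by_tag(notebook: dict, tag: str) -> dict:
    """Return a copy of `notebook` that keeps only the cells carrying `tag`."""
    filtered = copy.deepcopy(notebook)  # leave the source notebook untouched
    filtered["cells"] = [
        cell
        for cell in filtered.get("cells", [])
        if tag in cell.get("metadata", {}).get("tags", [])
    ]
    return filtered


# A tiny in-memory notebook using the nbformat v4 JSON layout.
notebook = {
    "nbformat": 4,
    "nbformat_minor": 5,
    "metadata": {},
    "cells": [
        {
            "cell_type": "markdown",
            "metadata": {"tags": ["executive", "scientist"]},
            "source": "# Summary",
        },
        {
            "cell_type": "code",
            "metadata": {"tags": ["scientist"]},
            "source": "print('metrics')",
            "outputs": [],
            "execution_count": None,
        },
    ],
}

# One source notebook, two views.
executive_view = filter_cells_by_tag(notebook, "executive")
scientist_view = filter_cells_by_tag(notebook, "scientist")
print(len(executive_view["cells"]), len(scientist_view["cells"]))
```

Each filtered copy could then be rendered separately, while the original notebook remains the single source of truth.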

Why JupDoc

Different stakeholders (like Business Executives, Product Managers, and ML Scientists, among others) need information to act. Often, the format, content, and emphasis will be different. Further, they want documents that are:

Accurate (and up to date)
Available
Accessible
Reproducible
Auditable
Versioned
Tailored to their needs (with respect to content, format, style, accessibility, shareability, etc.)

But, an ML Developer can only deliver on some fronts. Conventionally,

Development and Documentation are not part of the same process. The toolset and mindset are different, even separated in time & space.
Even a simple edit or change request requires a copy-and-paste from somewhere. If the data, the ask, or both change, one must redo the documentation repeatedly. This is neither repeatable nor reproducible, and it is not sustainable.

As a result, rigorous process oversight is needed for compliance. For example, a reporting manager may periodically check whether the documentation is maintained and up to date as part of the review process. But this is not sustainable either. We know it all too well.

What is the solution

Surprisingly simple. Just tag a notebook cell: who is it for?
Then take advantage of modern document publishing tools and workflows such as Quarto, GitHub, and GitHub Actions.

Core tenets

One content - many views
Data + Code + Content > should drive the documentation (format, style, purpose)
Each stakeholder’s documentation need is just a view or a content rendering problem
Publishing documentation = Publishing code
Use the same tools and mental models both for code and documents
Single source of truth for any derived document
Fix in only one place and only once.
Physical and mental distance between Documentation and Code should be (close to) ZERO
Set up once and automate subsequently
Automate the publishing process
No human oversight should be necessary for process compliance
Commit code + documentation content > rendering must be automated

The result is JupDoc.

Features

Define views using cell tags.
Convert Jupyter Notebooks to docx, PDF, HTML, and more.
Generate separate documents for each view.

Benefits

Data-driven documents enable:

Reproducibility, auditing, versioning, and accuracy.
Up-to-date documents: when code blocks read data (e.g., ground truth vs. predictions), the rendered documents reflect the current data.
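
As a small illustration of a data-driven cell, the sketch below computes a metric from ground truth and predictions and formats it as a report line. The data is inlined here so the example is self-contained; in a real notebook it would presumably be read from a file or a database. The function accuracy and the sample values are hypothetical.

```python
def accuracy(ground_truth, predictions):
    """Fraction of predictions that match the ground truth."""
    if len(ground_truth) != len(predictions):
        raise ValueError("ground_truth and predictions must have the same length")
    if not ground_truth:
        raise ValueError("at least one sample is required")
    correct = sum(1 for truth, pred in zip(ground_truth, predictions) if truth == pred)
    return correct / len(ground_truth)


# Inlined sample data; a real cell would load the latest data instead.
ground_truth = [1, 0, 1, 1, 0]
predictions = [1, 0, 0, 1, 0]

# The rendered document shows whatever the current data yields.
report_line = f"Accuracy: {accuracy(ground_truth, predictions):.0%}"
print(report_line)  # prints "Accuracy: 80%"
```

Because the number is computed at render time, re-rendering after a data or model update refreshes the document without manual edits.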

A new mindset

Writing documents is the same as writing code: it happens in the same place, space, and time.
Stakeholder needs can be arranged in a hierarchy while authoring content, via content inheritance: Executive Summary < Model Card < Solution Card < Full Report.
Reports and visualizations (e.g., of evaluation metrics) are driven by reading data and running code, so whenever models or data are updated, the metrics are updated automatically.
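
One way to picture the content-inheritance hierarchy is as an ordered list in which each view includes everything tagged for the views below it. The sketch below is only an illustration of that idea, not a JupDoc feature; the tag names and the function tags_for_view are hypothetical.

```python
# Ordered from the narrowest view to the broadest one (hypothetical tag names).
HIERARCHY = ["executive_summary", "model_card", "solution_card", "full_report"]


def tags_for_view(view: str) -> list:
    """Return the tags visible in `view`: its own tag plus every level below it."""
    level = HIERARCHY.index(view)  # raises ValueError for an unknown view
    return HIERARCHY[: level + 1]


print(tags_for_view("model_card"))
```

Under this scheme a cell tagged for the executive summary automatically appears in every broader document, so shared content is authored only once.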

Installation

You can install JupDoc using pip:

pip install jupdoc

JupDoc relies on Quarto to convert ipynb files to other formats, so Quarto must be installed as well. See the Quarto installation instructions.

Usage

We support two ways to convert notebooks to docs. The first one is using the command line interface. The second one is using the Python API.

Note:

Conversion of .ipynb files is handled by Quarto; rendering can be customized by adding notebook-specific YAML config as raw cells.
All cells in the notebook should have tags (including markdown cells), and the tags should be a part of the config used to export.
The Quarto cheat sheet is a useful reference. Customizations for a report can be specified in the raw cell.
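
Since every cell is expected to carry tags that also appear in the export config, a quick pre-flight check can catch mistakes before rendering. The sketch below is not part of JupDoc; it assumes the standard notebook JSON layout (tags stored under metadata.tags) and interprets the rule as "each cell tag must be one of the exported tags". The function name find_tag_problems is hypothetical.

```python
def find_tag_problems(notebook: dict, export_tags) -> list:
    """Return (cell_index, message) pairs for cells that break the tagging rule."""
    allowed = set(export_tags)
    problems = []
    for index, cell in enumerate(notebook.get("cells", [])):
        tags = cell.get("metadata", {}).get("tags", [])
        if not tags:
            problems.append((index, "no tags"))
        elif not set(tags) <= allowed:
            unknown = sorted(set(tags) - allowed)
            problems.append((index, "unknown tags: " + ", ".join(unknown)))
    return problems


# Minimal in-memory notebook: one valid cell, one untagged, one with an unknown tag.
sample_notebook = {
    "cells": [
        {"cell_type": "markdown", "metadata": {"tags": ["tag1"]}, "source": "# Title"},
        {"cell_type": "code", "metadata": {}, "source": "x = 1"},
        {"cell_type": "markdown", "metadata": {"tags": ["tag3"]}, "source": "Notes"},
    ]
}

problems = find_tag_problems(sample_notebook, ["tag1", "tag2"])
for index, message in problems:
    print(f"cell {index}: {message}")
```

For a notebook on disk, the same function could be applied to the result of json.load on the .ipynb file.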

Command Line Interface

The command line interface can be used as follows:

jupdoc --config <config_file>

In case of the absence of the config file, the configs can be passed as command line arguments:

jupdoc --filename <filename> --tags <tags> --prefix <prefix> --output <output> --format <format> --upload <upload> --folder_url <folder_url> --creds_path <creds_path> --reference_docx <reference_docx>

The arguments are as follows:

filename: The path to the notebook file.
tags: The tags to be used for access control. Multiple tags can be passed as a comma-separated string.
prefix: The prefix for the output files.
output: The path to the output directory.
format: File format to be exported to.
upload: Upload the files to Google Drive.
folder_url: The URL of the Google Drive folder to upload the files to.
creds_path: The path to the Google Drive credentials file. (For Service Account)
reference_docx: The path to the reference docx file. (Optional)
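
When the command is launched from a script, the argument list can be assembled programmatically. The sketch below only builds and prints the command using the flag names listed above; it does not run JupDoc. The helper build_jupdoc_command is hypothetical, and the upload option is left out because the value format it expects is not documented here.

```python
import shlex


def build_jupdoc_command(options: dict) -> list:
    """Turn an options dict into an argument list for the jupdoc CLI."""
    command = ["jupdoc"]
    for name, value in options.items():
        if value is None:
            continue  # skip optional arguments that were not provided
        if isinstance(value, (list, tuple)):
            value = ",".join(value)  # tags are passed as a comma-separated string
        command.extend([f"--{name}", str(value)])
    return command


command = build_jupdoc_command(
    {
        "filename": "notebook.ipynb",
        "tags": ["tag1", "tag2"],
        "prefix": "prefix",
        "output": "output",
        "format": "docx",
        "reference_docx": None,  # optional, omitted from the command
    }
)
print(shlex.join(command))
```

The resulting list can be handed to subprocess.run(command, check=True) once JupDoc and Quarto are installed.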

Python API

The Python API can be used as follows:

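from jupdoc import convert

# Conversion options; the keys mirror the command line arguments.
args = {
    "filename": "notebook.ipynb",  # path to the notebook file
    "tags": ["tag1", "tag2"],  # tags used for access control
    "prefix": "prefix",  # prefix for the output files
    "output": "output",  # output directory
    "format": "docx",  # file format to export to
    "upload": True,  # upload the files to Google Drive
    "folder_url": "https://drive.google.com/drive/folders/1Qlw7SxdPr4Ag1mKl4-cTrjgJPgZyzzYb?usp=drive_link",
    "creds_path": "creds.json",  # service account credentials file
    "reference_docx": "reference.docx",  # optional reference docx file
}
convert(**args)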

To Do

Improve documentation. (WIP)
Add support for multiple cell tags.
GitHub Actions to generate reports on a push based on JupDoc configs.

License

This project is licensed under the terms of the MIT license.

Acknowledgements

This project is based on Quarto.

Contributing

Contributions are welcome, and they are greatly appreciated! Every little bit helps, and credit will always be given. You can contribute in many ways:

Report bugs.
Fix bugs and submit pull requests.
Write, clarify, or fix documentation.
Suggest or add new features.

Made At Wadhwani Institute for Artificial Intelligence

Release history

0.1.4 (Nov 20, 2023), this version
0.1.2 (Sep 8, 2023)
0.1.1 (Sep 7, 2023)
0.1.0 (Sep 7, 2023)
0.0.3 (Sep 6, 2023)
0.0.2 (Sep 6, 2023)
0.0.1 (Sep 6, 2023)

Download files

Source Distribution

jupdoc-0.1.4.tar.gz (12.5 kB)

Uploaded Nov 20, 2023 Source

Built Distribution

jupdoc-0.1.4-py3-none-any.whl (13.6 kB)

Uploaded Nov 20, 2023 Python 3

Hashes for jupdoc-0.1.4.tar.gz
Algorithm	Hash digest
SHA256	`ed39fff8ec7aca1c4d2982af5321926b9461d4ce406002e30e62e78d1f9e7b59`
MD5	`85ff951f08a914736f3bf9d22ac28a75`
BLAKE2b-256	`d58061d9ec4f101db9dd48e2546b8d4d79a3636f7f3b055b82fdc882ad6bc26f`

Hashes for jupdoc-0.1.4-py3-none-any.whl
Algorithm	Hash digest
SHA256	`07ebe8c1758771d2125de0b7a2561677d7c456752aac86736e88cf14f94156ed`
MD5	`1c944a258c22805d1cd336851977054d`
BLAKE2b-256	`238108d5721b0e789a7b8b10edebff5143e88b2020a16594cae647fcca4fff74`