Tools for studying notebooks
nbstudy - tools for studying notebooks
Overview
nbstudy is a collection of tools for studying notebooks, especially those published on GitHub.
It generalizes the tooling used in "Exploration and Explanation in Computational Notebooks" by Adam Rule et al., with additions for "Refactoring in Computational Notebooks" by the author of this tool (Dylan Lukes, Eric Liu, et al.).
The goal of this project is to codify the functionality that supported these studies, and that will support future ones, so that anyone who wants to study notebooks in the wild can use it reproducibly.
⚠️ Warning: This tool is still in early development. The API is not stable, and the tool is not yet feature-complete. Many features are still missing (they have not yet been cleaned up and ported from the code used for earlier publications), and the documentation is incomplete.
Requirements
nbstudy requires Python 3.12 or later and a recent version of Git (2.43) with support for sparse-checkout and partial-clone.
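As a quick way to check these prerequisites before installing, here is a minimal sketch. It is not part of nbstudy: the helper names are hypothetical, and treating 2.43 as a minimum Git version is an assumption about the requirement above.

```python
import re
import subprocess
import sys

MIN_PYTHON = (3, 12)
MIN_GIT = (2, 43)  # assumption: 2.43 is read as a minimum version


def parse_git_version(output: str) -> tuple[int, int]:
    """Extract (major, minor) from the output of `git --version`."""
    match = re.search(r"git version (\d+)\.(\d+)", output)
    if match is None:
        raise ValueError(f"unrecognized git version string: {output!r}")
    return int(match.group(1)), int(match.group(2))


def check_requirements() -> list[str]:
    """Return a list of problems found; an empty list means all checks passed."""
    problems = []
    if sys.version_info[:2] < MIN_PYTHON:
        problems.append("Python 3.12 or later is required")
    try:
        result = subprocess.run(
            ["git", "--version"], capture_output=True, text=True, check=True
        )
    except (OSError, subprocess.CalledProcessError):
        problems.append("git was not found on PATH")
    else:
        if parse_git_version(result.stdout) < MIN_GIT:
            problems.append("Git 2.43 or later is expected")
    return problems


if __name__ == "__main__":
    for problem in check_requirements():
        print(problem)
```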
Installation
To install globally from PyPI, from anywhere run:
pip install nbstudy
nbstudy -h
To develop, clone the repository and then from the root of the repository run:
hatch shell
nbstudy -h
Introduction
Workspaces
nbstudy works on the principle of a "workspace" in which notebooks are studied. A workspace is a Git repository that contains a local cache of notebooks (as sparsely checked-out submodules) as well as a database of metadata about those notebooks, which is used to maintain the cache. Both are supported by settings in a configuration file and environment variables.
While some tools work in isolation without a workspace, using one lets you automate the process of studying collections of notebooks and collect results interactively as you go. For example, you might want to code (annotate) every commit of each notebook.
A workspace looks like this:
my-nbstudy-workspace/
├── .gitignore
├── .gitmodules
├── nbstudy.config.json
├── nbstudy.config.env
├── nbstudy.db
└── nbcache/
    ├── localhost/
    │   └── my-repo/
    │
    ├── github.com/
    ┆   ├── user1/
    ┆       ├── repo1/
    ┆           ├── .../notebook1.ipynb
                └── .../notebook2.ipynb
Workspace Settings
The nbstudy.config.json file contains settings for the workspace. Settings may also be configured using environment variables prefixed with NBSTUDY_, or read from the nbstudy.config.env file.
⚠️ The nbstudy.config.json file is intended to be shared with others, and should not contain any sensitive information. The nbstudy.config.env file is intended to be private, and should contain sensitive information such as API keys. It is included in the .gitignore file by default.
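To illustrate how the three sources of settings could fit together, here is a minimal sketch. It is not nbstudy's actual loader. The precedence (environment variables over nbstudy.config.env over nbstudy.config.json), the use of the NBSTUDY_ prefix inside the env file, the lower-casing of keys, and every function name are assumptions made for the example.

```python
import json
import os
from pathlib import Path

PREFIX = "NBSTUDY_"


def parse_env_file(text: str) -> dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def strip_prefix(env: dict[str, str]) -> dict[str, str]:
    """Keep only NBSTUDY_* entries, e.g. NBSTUDY_LOG_LEVEL becomes log_level."""
    return {
        key[len(PREFIX):].lower(): value
        for key, value in env.items()
        if key.startswith(PREFIX)
    }


def load_settings(workspace: Path, environ=None) -> dict:
    """Merge settings; later sources override earlier ones (assumed order)."""
    environ = os.environ if environ is None else environ
    settings = {}
    json_path = workspace / "nbstudy.config.json"
    if json_path.exists():
        settings.update(json.loads(json_path.read_text()))
    env_path = workspace / "nbstudy.config.env"
    if env_path.exists():
        settings.update(strip_prefix(parse_env_file(env_path.read_text())))
    settings.update(strip_prefix(dict(environ)))
    return settings
```

Keeping secrets out of the shared JSON file and reading them only from the private env file or the process environment matches the split described above.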
Workspace Database
The nbstudy.db file is a SQLite database containing metadata about the notebooks in the workspace, and is managed by nbstudy. It is not intended to be edited manually, though it can be inspected.
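For inspection, one option is to open the database read-only so that nothing can be modified by accident. The sketch below uses only Python's standard sqlite3 module and lists the tables it finds; it makes no assumptions about the schema, which is not documented here. The function names are hypothetical.

```python
import sqlite3
from pathlib import Path


def open_read_only(db_path: str) -> sqlite3.Connection:
    """Open a SQLite file in read-only mode using a file: URI."""
    uri = Path(db_path).resolve().as_uri() + "?mode=ro"
    return sqlite3.connect(uri, uri=True)


def list_tables(conn: sqlite3.Connection) -> list[str]:
    """Return the names of all tables in the database, sorted by name."""
    rows = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    ).fetchall()
    return [name for (name,) in rows]


if __name__ == "__main__":
    # Run from the root of a workspace that contains nbstudy.db.
    connection = open_read_only("nbstudy.db")
    try:
        for table in list_tables(connection):
            print(table)
    finally:
        connection.close()
```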
Workspace Notebook Cache
The nbcache/ directory contains the notebooks themselves, organized by the hostname of the Git repository they came from, followed by the username and repository name of the repository.
There are two cases that are specially managed by nbstudy:
- The localhost/ directory is used for notebooks with no provenance (e.g. notebooks that are created locally and not in a Git repository), and is managed by the nbstudy local subcommands.
- The github.com/ directory is used for notebooks that are scraped from a Git repository hosted on GitHub, and is managed by the nbstudy github subcommands.
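The layout described above can be sketched as a mapping from a repository URL to a cache path. This is an illustration only: the function names are hypothetical, and how nbstudy itself derives these paths is an assumption.

```python
from pathlib import PurePosixPath
from urllib.parse import urlparse


def cache_path_for(repo_url: str, cache_root: str = "nbcache") -> PurePosixPath:
    """Map an HTTPS repository URL to <cache_root>/<hostname>/<user>/<repo>."""
    parsed = urlparse(repo_url)
    if not parsed.hostname:
        raise ValueError(f"no hostname in URL: {repo_url!r}")
    parts = [part for part in parsed.path.split("/") if part]
    if len(parts) < 2:
        raise ValueError(f"expected /<user>/<repo> in URL: {repo_url!r}")
    user, repo = parts[0], parts[1].removesuffix(".git")
    return PurePosixPath(cache_root, parsed.hostname, user, repo)


def local_cache_path(name: str, cache_root: str = "nbcache") -> PurePosixPath:
    """Notebooks with no provenance are placed under localhost/."""
    return PurePosixPath(cache_root, "localhost", name)


if __name__ == "__main__":
    print(cache_path_for("https://github.com/user1/repo1.git"))
    print(local_cache_path("my-repo"))
```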
⚠️ Notebook caches can grow very large, as repositories that are fully downloaded can include huge data files. nbstudy makes a good-faith effort to minimize bloat: by default, it uses the sparse-checkout and partial-clone features of Git to download only the files that are needed (the notebook files themselves).
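To give a sense of how these two Git features combine, the sketch below builds the commands for a blobless partial clone restricted to notebook files. It is not the exact sequence nbstudy runs (nbstudy manages repositories as submodules, which is not shown here); the function names and the `*.ipynb` pattern are assumptions for the example.

```python
import subprocess


def sparse_clone_commands(url: str, dest: str, patterns=("*.ipynb",)) -> list[list[str]]:
    """Build Git commands for a partial, sparse clone of notebook files only."""
    return [
        # Partial clone: skip file contents (blobs) until they are needed,
        # and start with a sparse working tree.
        ["git", "clone", "--filter=blob:none", "--sparse", url, dest],
        # Non-cone mode takes gitignore-style patterns, so *.ipynb matches
        # notebooks at any depth in the repository.
        ["git", "-C", dest, "sparse-checkout", "set", "--no-cone", *patterns],
    ]


def sparse_clone(url: str, dest: str) -> None:
    """Run the commands in order, stopping at the first failure."""
    for command in sparse_clone_commands(url, dest):
        subprocess.run(command, check=True)


if __name__ == "__main__":
    for command in sparse_clone_commands(
        "https://github.com/user1/repo1.git", "nbcache/github.com/user1/repo1"
    ):
        print(" ".join(command))
```

With this combination, Git fetches commit and tree history but downloads file contents only for paths that match the sparse-checkout patterns, which is what keeps large data files out of the cache.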
Usage
The nbstudy tool provides a number of subcommands for working with notebooks.
🛠️ TODO
Attribution
If you use nbstudy in your research, please cite it as follows:
@misc{nbstudy,
  author = {Lukes, Dylan},
  title = {nbstudy - tools for studying notebooks},
  year = {2024},
  publisher = {GitHub},
  journal = {GitHub repository},
  howpublished = {\url{https://github.com/DylanLukes/nbstudy}},
  commit = {<commit hash here>}
}
License
nbstudy is distributed under the terms of the BSD-3-Clause license.