The Software Gardening Almanack

The Software Gardening Almanack is an open-source handbook of applied guidance and tools for sustainable software development and maintenance.

The project has two primary components:

  • The Almanack handbook: the content found here helps educate, demonstrate, and evolve the concepts of sustainable software development.
  • The almanack package: a Python package which implements the concepts of the book to help improve software sustainability by generating organized metrics and running linting checks on repositories. The Python package may also be used as a pre-commit hook to check repositories for best practices.

Please see our pavilion section of the book for presentations and other related materials for the Almanack.

Handbook

Package

Install

You can install the Almanack with the following:

# install from pypi
pip install almanack

# install directly from source
pip install git+https://github.com/software-gardening/almanack.git

Once installed, the Almanack can be used to analyze repositories for sustainable development practices. Its output consists of metrics, which are defined in metrics.yml and returned as a Python dictionary (JSON-compatible) record structure.
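Because the output is JSON-compatible, it can be handled with the standard library alone. The following is a minimal sketch, assuming the table is a list of dictionaries with hypothetical name and result keys; the actual fields are defined by metrics.yml and may differ.

```python
import json

# Hypothetical sample records; real field names come from metrics.yml.
sample_metrics = [
    {"name": "example-metric-a", "result": True},
    {"name": "example-metric-b", "result": 42},
    {"name": "example-metric-c", "result": None},
]


def metrics_to_json(records):
    """Serialize a list of metric records to a JSON string."""
    return json.dumps(records, indent=2)


def results_by_name(records):
    """Index metric results by their (assumed) `name` key."""
    return {record["name"]: record["result"] for record in records}


if __name__ == "__main__":
    print(metrics_to_json(sample_metrics))
```

Indexing by name is only a convenience for lookups; the record list itself is what would be written to disk or passed to other tools.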

Command-line Interface (CLI)

You can use the Almanack package through its command-line interface (CLI):

# generate a table of metrics based on a repository
almanack table path/to/repository

# perform linting-style checks on a repository
almanack check path/to/repository

# exclude paths (comma-separated)
almanack check path/to/repository --exclude_paths=tests,src/book/_build

# enable debug logging
almanack check path/to/repository --verbose
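If you want to call the check from a script, one option is to wrap the CLI with subprocess. This is a sketch rather than an official interface: build_check_command and run_check are hypothetical helper names, and only the flags shown above are used. How the exit code should be interpreted is not stated here, so the sketch simply returns it.

```python
import subprocess


def build_check_command(repo_path, exclude_paths=None, verbose=False):
    """Build the argument list for `almanack check`."""
    command = ["almanack", "check", repo_path]
    if exclude_paths:
        # The CLI takes a single comma-separated value for excluded paths.
        command.append("--exclude_paths=" + ",".join(exclude_paths))
    if verbose:
        command.append("--verbose")
    return command


def run_check(repo_path, **options):
    """Run the check and return the process exit code.

    Requires the `almanack` executable to be available on PATH.
    """
    completed = subprocess.run(
        build_check_command(repo_path, **options), check=False
    )
    return completed.returncode
```

For example, build_check_command("path/to/repository", exclude_paths=["tests", "src/book/_build"]) produces the same arguments as the exclude example above.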

Pre-commit Hook

We provide pre-commit hooks so that you can run the Almanack as part of your automated checks when developing software projects. To use the Almanack, add the following to your .pre-commit-config.yaml:

# include this in your .pre-commit-config.yaml (under the top-level repos key)
- repo: https://github.com/software-gardening/almanack
  rev: v0.1.1
  hooks:
    - id: almanack-check

Python API

You can also use the Almanack through a Python API. For example:

import almanack
import pandas as pd

# gather the almanack table for a repository
almanack_table = almanack.table("path/to/repository")

# show the almanack table as a Pandas DataFrame
pd.DataFrame(almanack_table)

Example notebook

Please see this example notebook which demonstrates using the Almanack package.

Batch processing many repositories

The almanack batch command runs almanack check across many repositories in parallel and writes the results to a single Parquet file (or one file per batch), optionally streaming progress to stdout.

# Run from a list (comma-separated) and write a single parquet
almanack batch results.parquet --repo_urls https://github.com/org/repo1,https://github.com/org/repo2 --max_workers 8

# Use threads (good for I/O-bound workloads) and split outputs per batch
almanack batch out_dir --repo_urls https://github.com/org/repo1,https://github.com/org/repo2 --executor thread --split_batches --batch_size 100

# Read repo URLs from a column in a provided parquet file
almanack batch results.parquet --parquet_path links.parquet --column github_link

Key options:

  • --executor: process (default) or thread
  • --batch_size: how many repos per batch (a small multiple of max_workers works well)
  • --split_batches: write one parquet file per batch into output_path (treated as a directory)
  • --collect_dataframe: set to False to avoid returning a dataframe (only write to file)
  • --show_repo_progress: shows progress per repository
  • --show_batch_progress: shows progress per batch (sets of repos)
  • --show_errors: shows any errors raised during almanack processing
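To illustrate how --batch_size relates to --max_workers, the sketch below splits a list of repository URLs into batches. It is not the Almanack's own implementation: split_into_batches, suggest_batch_size, and the default multiple of 4 are assumptions chosen for illustration.

```python
def split_into_batches(repo_urls, batch_size):
    """Yield consecutive slices of at most `batch_size` URLs."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(repo_urls), batch_size):
        yield repo_urls[start : start + batch_size]


def suggest_batch_size(max_workers, multiple=4):
    """Pick a batch size as a small multiple of the worker count.

    The multiple is a hypothetical default, not a documented value.
    """
    return max_workers * multiple


# Ten placeholder URLs split into batches of four: sizes 4, 4, and 2.
repos = [f"https://github.com/org/repo{i}" for i in range(1, 11)]
batches = list(split_into_batches(repos, batch_size=4))
```

A batch somewhat larger than the worker count keeps every worker busy, while still bounding how much work is lost if a batch fails.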

Python API example:

from concurrent.futures import ThreadPoolExecutor
from almanack import process_repositories_batch

repos = ["https://github.com/org/repo1", "https://github.com/org/repo2"]

# Single parquet
df = process_repositories_batch(
    repos,
    output_path="almanack_results.parquet",
    max_workers=8,
    executor_cls=ThreadPoolExecutor,  # threads are notebook-friendly / I/O-friendly
)

# Per-batch files, no in-memory DataFrame
process_repositories_batch(
    repos,
    output_path="batch_outputs",
    split_batches=True,
    collect_dataframe=False,
    batch_size=100,
    max_workers=16,
)
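The executor_cls argument above follows the standard concurrent.futures pattern. As a generic sketch of that pattern, and not the Almanack's internals, the example below runs a stand-in function over repository URLs; fake_check and run_in_parallel are hypothetical names.

```python
from concurrent.futures import ThreadPoolExecutor


def fake_check(repo_url):
    """Stand-in for per-repository work; not the Almanack's real check."""
    return {"repo": repo_url, "ok": repo_url.startswith("https://")}


def run_in_parallel(repo_urls, max_workers=4, executor_cls=ThreadPoolExecutor):
    """Apply `fake_check` to every URL using the given executor class."""
    with executor_cls(max_workers=max_workers) as executor:
        # Executor.map returns results in the same order as the inputs.
        return list(executor.map(fake_check, repo_urls))


results = run_in_parallel(
    ["https://github.com/org/repo1", "https://github.com/org/repo2"]
)
```

Swapping in ProcessPoolExecutor generally requires the worker function to be picklable, which in practice means defining it at module top level rather than inside a notebook cell or closure.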

GitHub API performance

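The Almanack uses GitHub’s API to gather certain metrics. Anonymous API requests have very low rate limits (GitHub documents 60 REST API requests per hour for unauthenticated clients, compared with 5,000 per hour for requests authenticated with a personal access token). Once the limit is hit, requests are throttled and batch jobs slow down. Export a personal access token as GITHUB_TOKEN before running any CLI or Python workflows to raise the per-hour quota: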

export GITHUB_TOKEN=ghp_yourtokenhere

Commands launched from the same shell automatically reuse the token, so your GitHub requests complete faster and more reliably.
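Before starting a long batch job, it can help to confirm that the token is visible to Python. The sketch below is an assumption-laden illustration: github_auth_headers and token_is_configured are hypothetical helpers, and how the Almanack itself reads the token is not shown here. The Bearer scheme is the one GitHub accepts for personal access tokens.

```python
import os


def github_auth_headers(environ=None):
    """Return an Authorization header if GITHUB_TOKEN is set, else {}."""
    environ = os.environ if environ is None else environ
    token = environ.get("GITHUB_TOKEN", "").strip()
    if not token:
        return {}
    return {"Authorization": f"Bearer {token}"}


def token_is_configured(environ=None):
    """Report whether a non-empty GITHUB_TOKEN is available."""
    return bool(github_auth_headers(environ))


if __name__ == "__main__":
    if not token_is_configured():
        print("GITHUB_TOKEN is not set; GitHub requests may be rate limited.")
```

Passing a plain dictionary as environ makes the helpers easy to test without touching the real environment.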

Contributing

Please see our CONTRIBUTING.md document for more information on how to contribute to this project.

Acknowledgements

This work was supported by the Better Scientific Software Fellowship Program, a collaborative effort of the U.S. Department of Energy (DOE), Office of Advanced Scientific Research via ANL under Contract DE-AC02-06CH11357 and the National Nuclear Security Administration Advanced Simulation and Computing Program via LLNL under Contract DE-AC52-07NA27344; and by the National Science Foundation (NSF) via SHI under Grant No. 2327079.
