
Tools for generating Stimela cab definitions from Python functions


hip-cargo

A guide to designing auto-documenting CLI interfaces using Typer, plus conversion utilities. If you are creating a new package, the instructions below show how to structure it. The generate-function utility can help convert an existing package to the hip-cargo format, but some manual steps will still be involved. The philosophy behind this design is to have a lightweight version of the package that installs only what is required to generate --help output from the CLI and the cab definitions that Stimela uses. The full package should be available as a container image that Stimela can run. The image should be tagged with the package version so that Stimela automatically pulls the image matching the cab configuration.

Installation

pip install hip-cargo

Or for development:

git clone https://github.com/landmanbester/hip-cargo.git
cd hip-cargo
uv sync

Quick Start

1. Decorate your Python CLI

Something like the following goes in src/mypackage/cli/process.py

import typer
from pathlib import Path
from typing import NewType
from typing_extensions import Annotated
from hip_cargo import stimela_cab, stimela_output

# custom types (stimela has e.g. File, URI, MS and Directory)
File = NewType("File", Path)
URI = NewType("URI", Path)
MS = NewType("MS", Path)
Directory = NewType("Directory", Path)

@stimela_cab(
    name="my_processor",
    info="Process data files",
)
@stimela_output(
    name="output_file",
    dtype="File",
    info="{input_file}.processed",
    required=True,
)
def process(
    input_ms: Annotated[MS, typer.Argument(parser=MS, help="Input MS to process")],  # note the parser=MS bit. This is required for non-standard types
    output_dir: Annotated[Directory, typer.Option(parser=Directory, help="Output Directory for results")] = Path("./output"),
    threshold: Annotated[float, typer.Option(help="Threshold value")] = 0.5,
):
    """
    Process a data file.
    """
    # Do any manual parameter wrangling here, then import the core implementation lazily
    from mypackage.core.process import process as process_core
    return process_core(input_ms, output_dir, threshold)

Note that the command's parameters need to be passed to the core function explicitly. Then register the command in src/mypackage/cli/__init__.py with something like the following:

"""Lightweight CLI for mypackage."""

import typer

app = typer.Typer(
    name="mypackage",
    help="Scientific computing package",
    no_args_is_help=True,
)

# Register commands
from mypackage.cli.process import process

app.command(name="process")(process)

__all__ = ["app"]

That's it. If you have something like the following

[project.scripts]
mypackage = "mypackage.cli:app"

in your pyproject.toml, you should now be able to run

mypackage --help

and

mypackage process --help

from the command line and have a beautifully formatted CLI for your package. Note that you can register multiple commands under app.

2. Generate the Stimela cab definition

Once the CLI is defined, you can convert it to a cab with, e.g.,

cargo generate-cab mypackage.cli.process src/mypackage/cabs/process.yaml

This should be automated using scripts/generate_cabs.py, but the above command is useful for testing.

3. Generate Python function from existing cab (reverse)

If you are converting an existing package to the hip-cargo format, there is a utility command available:

cargo generate-function /path/to/existing_cab.yaml -o myfunction.py

Currently, this won't add things like rich_output_panel, but it should help to get you started. The program should recognize custom types and add the

from pathlib import Path
from typing import NewType

MS = NewType("MS", Path)

bit for you. It should also add parser=MS to the typer.Option() call.
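A rough sketch of how such a preamble could be derived from the dtypes in a cab is shown below. It is not hip-cargo's implementation: `CUSTOM_TYPES` and `newtype_preamble` are hypothetical names, and the sketch assumes cab dtypes are strings such as `MS`, `Optional[Directory]` or `List[File]`.

```python
import re

# Path-like stimela types mentioned in this README (assumed complete for this sketch)
CUSTOM_TYPES = ("File", "URI", "MS", "Directory")


def newtype_preamble(dtypes: list[str]) -> str:
    """Return the NewType declarations needed for the custom types used in `dtypes`."""
    # Split compound dtypes such as "Optional[MS]" or "List[File]" into bare identifiers
    tokens = {tok for dtype in dtypes for tok in re.findall(r"\w+", dtype)}
    used = [name for name in CUSTOM_TYPES if name in tokens]
    if not used:
        return ""
    lines = ["from pathlib import Path", "from typing import NewType", ""]
    lines += [f'{name} = NewType("{name}", Path)' for name in used]
    return "\n".join(lines) + "\n"


print(newtype_preamble(["MS", "float", "Optional[Directory]"]))
```

Types are emitted in a fixed order, so regenerating a function from the same cab gives a stable preamble.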

Project Structure for hip-cargo Packages

Packages following the hip-cargo pattern should be structured to enable both lightweight cab definitions and full execution environments:

my-scientific-package/
├── src/
│   └── mypackage/
│       ├── __init__.py
│       ├── utils/               # Utilities used by core algorithms
│       │   ├── __init__.py
│       │   └── operator.py
│       ├── core/                # Core implementations with standard python type hints (no Annotated or custom types)
│       │   ├── __init__.py
│       │   ├── process.py
│       │   └── analyze.py
│       ├── cli/                 # Lightweight CLI layer
│       │   ├── __init__.py      # Main Typer app
│       │   ├── process.py       # Individual commands
│       │   └── analyze.py
│       └── cabs/                # Generated cab definitions (inside mypackage)
│           ├── __init__.py
│           ├── process.yaml
│           └── analyze.yaml
├── scripts/
│   └── generate_cabs.py        # Automation script
├── Dockerfile                   # For containerization
├── pyproject.toml
└── README.md

Key Principles

  1. Separate CLI from implementation: Keep CLI modules lightweight with lazy imports. Keep them all in the src/mypackage/cli directory and define the CLI for each command in a separate file. Construct the main Typer app in src/mypackage/cli/__init__.py and register commands there.
  2. Separate cabs directory at the same level as cli: Use hip-cargo to auto-generate cabs into the src/mypackage/cabs/ directory with the generate_cabs.py script. There should be a separate file for each cab.
  3. Single app, multiple commands: Use one Typer app that registers all commands. If you need a separate app you might as well create a separate repository for it.
  4. Lazy imports: Import heavy dependencies (NumPy, JAX, Dask) only when a command executes, so that the CLI and its --help work with the lightweight install, where those dependencies are absent.
  5. Linked GitHub package with container image: Maintain an up-to-date Dockerfile that installs the full package, and use Docker (or Podman) to push the image to the GitHub Container Registry. Link the image to your GitHub repository.
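Principle 4 in miniature: in the sketch below the standard-library `statistics` module stands in for a heavy dependency such as NumPy or JAX, and `summarise` is a hypothetical command body. Defining the function imports nothing; the import happens on the first call, and later calls reuse the cached module from `sys.modules`.

```python
import sys


def summarise(values: list[float]) -> float:
    """Hypothetical CLI command body with its heavy import deferred to call time."""
    # Imported here rather than at module level, so rendering --help never pays for it
    import statistics

    return statistics.mean(values)


# Only calling the function triggers the import
result = summarise([1.0, 2.0, 3.0])
print(result, "statistics" in sys.modules)
```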

Example Structure

src/mypackage/cli/__init__.py:

"""Lightweight CLI for mypackage."""

import typer

app = typer.Typer(
    name="mypackage",
    help="Scientific computing package",
    no_args_is_help=True,
)

# Register commands
from mypackage.cli.process import process
from mypackage.cli.analyze import analyze

app.command(name="process")(process)
app.command(name="analyze")(analyze)

__all__ = ["app"]

src/mypackage/cli/process.py:

"""Process command - lightweight wrapper."""

from pathlib import Path
from typing import NewType
from typing_extensions import Annotated
import typer
from hip_cargo import stimela_cab, stimela_output

MS = NewType("MS", Path)

@stimela_cab(name="mypackage_process", info="Process data")
@stimela_output(name="output", dtype="File", info="{input_file}.out")
def process(
    input_ms: Annotated[MS, typer.Argument(parser=MS, help="Input File")],
    param: Annotated[float, typer.Option(help="Parameter")] = 1.0,
):
    """Process data files."""
    # Lazy import - only loaded when executing
    from mypackage.core.process import process as process_core

    return process_core(input_ms, param)

pyproject.toml:

[project]
name = "mypackage"
dependencies = [
    "typer>=0.12.0",
    "hip-cargo>=0.1.0",
]

[project.optional-dependencies]
# Full scientific stack
full = [
    "numpy>=1.24.0",
    "jax>=0.4.0",
    # ... heavy dependencies
]

[project.scripts]
mypackage = "mypackage.cli:app"

scripts/generate_cabs.py:

"""Generate all cab definitions."""
import subprocess
from pathlib import Path

CLI_MODULES = [
    "mypackage.cli.process",
    "mypackage.cli.analyze",
]

CABS_DIR = Path("src/mypackage/cabs")
CABS_DIR.mkdir(exist_ok=True)

for module in CLI_MODULES:
    cmd_name = module.split(".")[-1]
    output = CABS_DIR / f"{cmd_name}.yaml"

    print(f"Generating {output}...")
    subprocess.run([
        "cargo", "generate-cab",
        module,
        str(output)
    ], check=True)

print("✓ All cabs generated")
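Rather than maintaining `CLI_MODULES` by hand, the script could derive the list from the files in `src/mypackage/cli`, relying on the one-command-per-file convention from the key principles. This is a sketch, and `discover_cli_modules` is a hypothetical helper rather than part of hip-cargo:

```python
from pathlib import Path


def discover_cli_modules(cli_dir: Path, package: str) -> list[str]:
    """Return dotted module paths for every command file in cli_dir.

    Assumes one command per file and skips private modules such as __init__.py.
    """
    return [
        f"{package}.{path.stem}"
        for path in sorted(cli_dir.glob("*.py"))
        if not path.name.startswith("_")
    ]


if __name__ == "__main__":
    for module in discover_cli_modules(Path("src/mypackage/cli"), "mypackage.cli"):
        print(module)
```

With it, `CLI_MODULES = discover_cli_modules(Path("src/mypackage/cli"), "mypackage.cli")` would replace the hard-coded list, so adding a command only requires adding a file.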

Installation Modes

Users can install your package in different ways:

# Lightweight (just CLI and cab definitions)
pip install mypackage

# Full (with all scientific dependencies)
pip install mypackage[full]

# Development (editable install, run from the repository root)
pip install -e ".[full,dev]"

Integration with cult-cargo

For integration with Stimela's cult-cargo:

  1. Make cabs discoverable:
# src/mypackage/cabs/__init__.py
from pathlib import Path

CAB_DIR = Path(__file__).parent
AVAILABLE_CABS = [p.stem for p in CAB_DIR.glob("*.yaml")]

def get_cab_path(name: str) -> Path:
    """Get path to a cab definition."""
    return CAB_DIR / f"{name}.yaml"
  2. cult-cargo imports the lightweight version:

We have to decide whether we want to add this kind of thing to cult-cargo:

# In cult-cargo's pyproject.toml
[tool.poetry.dependencies]
mypackage = "^1.0.0"  # Not mypackage[full]

However, it should be possible to just

uv pip install mypackage==x.x.x

without any dependency conflicts. If not, we will have to consider ephemeral virtual environments.

  3. Users run with Stimela:
# Native: requires full installation
pip install mypackage[full]
stimela run recipe.yml

# Singularity: uses container (lightweight install sufficient)
pip install mypackage
stimela run recipe.yml -S
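The `__file__`-based lookup in step 1 works for regular installs. If the cabs must also be found when the package is imported from a zip archive, `importlib.resources.files()` (Python 3.9+) is the standard-library alternative. In this sketch `list_cabs` is a hypothetical helper, and `mypackage.cabs` is assumed to be an importable package (it has an `__init__.py` in the layout above).

```python
from importlib.resources import files


def list_cabs(package: str = "mypackage.cabs", suffix: str = ".yaml") -> list[str]:
    """Return the names (without suffix) of cab files shipped inside `package`."""
    return sorted(
        entry.name[: -len(suffix)]
        for entry in files(package).iterdir()
        if entry.is_file() and entry.name.endswith(suffix)
    )
```

For the project layout above, `list_cabs()` would be expected to return `["analyze", "process"]`.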

Container Images and GitHub Actions

For Stimela to use your package in containerized environments, you should publish OCI container images to GitHub Container Registry (ghcr.io). This section shows how to automate this with GitHub Actions.

1. Create a Dockerfile

Add a Dockerfile at the root of your repository:

FROM python:3.11-slim

WORKDIR /app

# Install uv for fast package installation
COPY --from=ghcr.io/astral-sh/uv:latest /uv /usr/local/bin/uv

# Copy package files
COPY pyproject.toml README.md ./
COPY src/ src/

# Install package with full dependencies using uv (much faster than pip)
RUN uv pip install --system --no-cache ".[full]"

# Make CLI available
ENTRYPOINT ["mypackage"]
CMD ["--help"]

2. Set up GitHub Actions Workflow

Create .github/workflows/publish-container.yml:

name: Build and Publish Container

on:
  push:
    tags:
      - 'v*.*.*'  # Trigger on version tags (e.g., v1.0.0)
  workflow_dispatch:  # Allow manual triggering

env:
  REGISTRY: ghcr.io
  IMAGE_NAME: ${{ github.repository }}

jobs:
  build-and-push:
    runs-on: ubuntu-latest
    permissions:
      contents: read
      packages: write

    steps:
      - name: Checkout repository
        uses: actions/checkout@v5

      - name: Log in to Container Registry
        uses: docker/login-action@v3
        with:
          registry: ${{ env.REGISTRY }}
          username: ${{ github.actor }}
          password: ${{ secrets.GITHUB_TOKEN }}

      - name: Extract metadata (tags, labels)
        id: meta
        uses: docker/metadata-action@v5
        with:
          images: ${{ env.REGISTRY }}/${{ env.IMAGE_NAME }}
          tags: |
            type=semver,pattern={{version}}
            type=semver,pattern={{major}}.{{minor}}
            type=semver,pattern={{major}}
            type=sha,prefix={{branch}}-

      - name: Build and push Docker image
        uses: docker/build-push-action@v5
        with:
          context: .
          push: true
          tags: ${{ steps.meta.outputs.tags }}
          labels: ${{ steps.meta.outputs.labels }}

3. Link Container to GitHub Package

To associate the container image with your repository:

  1. Automatic linking: If your workflow pushes to ghcr.io/username/repository-name, GitHub automatically creates a package linked to the repository.

  2. Manual linking (if needed):

    • Go to your repository on GitHub
    • Navigate to the "Packages" section
    • Click on your container package
    • Click "Connect repository" in the sidebar
    • Select your repository from the dropdown
  3. Set package visibility:

    • In the package settings, set visibility to "Public" for open-source projects
    • This allows Stimela to pull images without authentication

4. Version Tagging Best Practices

The workflow above creates multiple tags for each release:

# For release v1.2.3, creates:
ghcr.io/username/mypackage:1.2.3    # Full version
ghcr.io/username/mypackage:1.2      # Minor version
ghcr.io/username/mypackage:1        # Major version
ghcr.io/username/mypackage:main-sha123456  # Branch + commit SHA

This allows users to pin to specific versions or track latest minor/major releases.
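The three `type=semver` rules in the workflow can be mirrored in a few lines of Python, which is handy for checking which image reference a cab should point at for a given release. This is a sketch: `semver_tags` is a hypothetical helper that only accepts plain `vMAJOR.MINOR.PATCH` tags and ignores the extra rules `docker/metadata-action` applies (for example to pre-release versions), as well as the `type=sha` tag.

```python
import re

_SEMVER = re.compile(r"v?(\d+)\.(\d+)\.(\d+)")


def semver_tags(git_tag: str, image: str) -> list[str]:
    """Image references for the {{version}}, {{major}}.{{minor}} and {{major}} patterns."""
    match = _SEMVER.fullmatch(git_tag)
    if match is None:
        raise ValueError(f"not a plain semver tag: {git_tag!r}")
    major, minor, patch = match.groups()
    versions = [f"{major}.{minor}.{patch}", f"{major}.{minor}", major]
    return [f"{image}:{version}" for version in versions]


print(semver_tags("v1.2.3", "ghcr.io/username/mypackage"))
```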

5. Triggering a Build

Automated (recommended):

# Create and push a version tag
git tag v1.0.0
git push origin v1.0.0

The GitHub Action will automatically build and publish the container.

Manual:

  • Go to "Actions" tab in GitHub
  • Select "Build and Publish Container"
  • Click "Run workflow"

6. Using the Container with Stimela

Once published, users can reference your container in Stimela recipes:

cabs:
  - name: mypackage
    image: ghcr.io/username/mypackage:1.0.0

Stimela will automatically pull the matching version based on the cab configuration.

7. Local Testing

Test your container locally before pushing:

# Build
docker build -t mypackage:test .

# Run
docker run --rm mypackage:test --help
docker run --rm mypackage:test process --help

# Test with mounted data
docker run --rm -v $(pwd)/data:/data mypackage:test process /data/input.ms

Type Inference

hip-cargo automatically recognizes custom Stimela types. The generate-function command should add

from pathlib import Path
from typing import NewType

MS = NewType("MS", Path)
Directory = NewType("Directory", Path)
URI = NewType("URI", Path)
File = NewType("File", Path)

to the preamble of functions generated from cabs that use these types. It should also add the parser argument to the Annotated type hint; e.g. for the custom MS dtype we need

def process(input_ms: Annotated[MS, typer.Option(parser=MS)]):
    pass

One quirk of this approach is that parameters with None as the default need to be defined as e.g.

def process(input_ms: Annotated[MS | None, typer.Option(parser=MS)] = None):
    pass

Python treats MS | None as equivalent to Optional[MS], which is itself shorthand for Union[MS, None]. This should be handled so that the generate-cab command places dtype: MS in the cab definition and the generate-function command generates the function signature above. Unions are currently limited to two member types (one of which may be None) and should be written with the newer dtype1 | dtype2 syntax in the function definition. All standard Python types should just work.

Decorators

@stimela_cab

Marks a function as a Stimela cab.

  • name: Cab name
  • info: Description
  • policies: Optional dict of cab-level policies

@stimela_output

Defines a Stimela output. When generating functions from cabs, the generate-function command should check for the following fields:

  • name: Output name (top level, one below cabs)
  • dtype: Data type (File, Directory, MS, etc.)
  • info: Help string
  • required: Whether output is required (default: False)
  • implicit: If implicit is True the parameter should not be placed in the function definition. If implicit is False (the default), the parameter needs to be added to the function signature.

Features

  • ✅ Automatic type inference from Python type hints
  • ✅ Support for Typer Arguments (positional) and Options
  • ✅ Multiple outputs automatically added to function signature if they are not implicit
  • ✅ List types with automatic repeat: list policy
  • ✅ Proper handling of default values and required parameters
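The list-type feature can be sketched as follows, assuming (as the feature list states) that a `list[...]` annotation becomes a `List[...]` dtype carrying a `repeat: list` policy. `list_dtype` is a hypothetical helper, and the exact dtype spelling hip-cargo emits may differ.

```python
from typing import get_args, get_origin


def list_dtype(hint) -> tuple[str, dict[str, str]]:
    """Map list[X] to ("List[X]", {"repeat": "list"}); other hints get no policy."""
    # get_origin returns the builtin `list` for both list[X] and typing.List[X]
    if get_origin(hint) is list:
        (item,) = get_args(hint)
        return f"List[{item.__name__}]", {"repeat": "list"}
    return hint.__name__, {}


print(list_dtype(list[str]))
print(list_dtype(float))
```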

Development

This project uses:

  • uv for dependency management
  • ruff for linting and formatting
  • typer for the CLI

Setting Up Development Environment

# Clone the repository
git clone https://github.com/landmanbester/hip-cargo.git
cd hip-cargo

# Install dependencies with development tools
uv sync --group dev

# Install pre-commit hooks (recommended)
uv run pre-commit install

Pre-commit Hooks

This project uses pre-commit to automatically check code quality before commits. The hooks run:

  • ruff linting: Checks code style and catches common errors
  • ruff formatting: Ensures consistent code formatting
  • trailing whitespace: Removes trailing whitespace
  • end-of-file-fixer: Ensures files end with a newline
  • check-yaml: Validates YAML syntax
  • check-toml: Validates TOML syntax
  • check-merge-conflict: Prevents committing merge conflict markers
  • check-added-large-files: Prevents accidentally committing large files

Installing Pre-commit Hooks

After cloning the repository, install the pre-commit hooks:

uv run pre-commit install

This will automatically run the hooks before each commit. If any checks fail, the commit will be blocked until you fix the issues.

Running Hooks Manually

You can run the hooks manually on all files:

# Run on all files
uv run pre-commit run --all-files

# Run on staged files only
uv run pre-commit run

Updating Hook Versions

To update hook versions to the latest:

uv run pre-commit autoupdate

Manual Code Quality Checks

If you prefer to run checks manually without pre-commit:

# Format code
uv run ruff format .

# Check and auto-fix linting issues
uv run ruff check . --fix

# Run tests
uv run pytest -v

# Run tests with coverage
uv run pytest --cov=hip_cargo --cov-report=term-missing

Contributing Workflow

  1. Create a feature branch:

    git checkout -b feature/your-feature-name
    
  2. Make your changes and ensure tests pass:

    uv run pytest -v
    
  3. Format and lint (automatically done by pre-commit):

    git add .
    git commit -m "feat: your feature description"
    # Pre-commit hooks run automatically
    
  4. Push and create a pull request:

    git push origin feature/your-feature-name
    

License

MIT License

Download files


Source Distribution

hip_cargo-0.0.2.tar.gz (20.2 kB)

Uploaded Source

Built Distribution


hip_cargo-0.0.2-py3-none-any.whl (26.7 kB)

Uploaded Python 3

File details

Details for the file hip_cargo-0.0.2.tar.gz.

File metadata

  • Download URL: hip_cargo-0.0.2.tar.gz
  • Upload date:
  • Size: 20.2 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.11.13

File hashes

Hashes for hip_cargo-0.0.2.tar.gz
Algorithm Hash digest
SHA256 61a93c48753244ad72a26b990d91dd69f10d9ad876f08c4dc281b3101cffa2eb
MD5 0dbcd7cc1aafd0686840b825965616c0
BLAKE2b-256 8f9a03f083ca1867a982f238ec04efea43a57a7fc7535c4d3f87d23491becad8


File details

Details for the file hip_cargo-0.0.2-py3-none-any.whl.

File metadata

  • Download URL: hip_cargo-0.0.2-py3-none-any.whl
  • Upload date:
  • Size: 26.7 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.11.13

File hashes

Hashes for hip_cargo-0.0.2-py3-none-any.whl
Algorithm Hash digest
SHA256 5e54eaad690e1190ac7819023574468f6e7b4502c4a259e03c04e7a57fe508c1
MD5 719498f907b478210209eecdc60d0d88
BLAKE2b-256 7fc8acb1594e7aaa203ae5bd4d6a8c62bfef8e84fde4988380c3ac0648013909

