A collection of tools for building structured Python projects

Cocina

Cocina is a collection of tools for building structured Python projects. It provides sophisticated configuration management, job execution capabilities, and a professional CLI interface.

Core Components

  1. ConfigHandler - Unified configuration management, constants, and environment variables
  2. ConfigArgs - Job-specific configuration loading with structured argument access
  3. CLI - Command-line interface for project initialization and job execution

Getting Started


Install

FROM PYPI

pip install cocina

FROM CONDA

conda install -c conda-forge cocina

Initialize

pixi run cocina init --log_dir logs --package your_package_name

See cocina Configuration for detailed initialization options.


Overview

Cocina separates configuration (values that can change) from constants (values that never change) and job arguments (run-specific parameters).

Key Concepts

  • ConfigHandler (ch) - Manages constants and project configuration

    • Constants: your_module/constants.py (protected from modification)
    • General Config: config/config.yaml
    • Env Config: config/<environment-name>.yaml
    • Usage: ch.DATABASE_URL, ch.get('MAX_SCALE', 1000)
  • ConfigArgs (ca) - Manages job-specific run configurations

    • Job configs: config/args/job_name.yaml
    • Usage: to call a method method_name: method_name(*ca.method_name.args, **ca.method_name.kwargs)

Note: names of configuration and job directories and files can be customized in .cocina.

Before and After

Traditional approach:

SOURCE = "path/to/src.parquet"
OUTPUT_DEST = "path/to/output"

def main():
    data = load_data(SOURCE, limit=1000, debug=True)
    data = process_data(data, scale=100, validate=False)
    save_data(data, OUTPUT_DEST, format="json")

if __name__ == "__main__":
    main()

With Cocina:

def run(config_args):
    data = load_data(*config_args.load_data.args, **config_args.load_data.kwargs)
    data = process_data(data, *config_args.process_data.args, **config_args.process_data.kwargs)
    save_data(data, *config_args.save_data.args, **config_args.save_data.kwargs)

All parameters are now externalized to YAML configuration files, making scripts reusable and maintainable. Argument parsing and CLI management are handled by the cocina CLI.
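For the script above, the externalized job config might look like the following sketch. The args/kwargs layout follows the convention shown in the Example section; the file name and specific values here are illustrative:

```yaml
# config/args/my_job.yaml (hypothetical file name)
load_data:
  args: ["path/to/src.parquet"]
  kwargs:
    limit: 1000
    debug: true

process_data:
  scale: 100
  validate: false

save_data:
  args: ["path/to/output"]
  kwargs:
    format: "json"
```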

Example

Project Structure:

my_project/
├── my_package/                 # Python package
│   ├── constants.py            # Project Constants (protected from modification)
│   ├── ...                     # Modules
│   └── data_manager.py         # Named example python module
├── config/
│   ├── config.yaml             # Main configuration
│   ├── prod.yaml               # Production configuration overrides
│   └── args/
│       └── data_pipeline.yaml  # Job configuration
└── jobs/
    └── data_pipeline.py        # Job implementation

Configuration (config/args/data_pipeline.yaml):

extract_data:
  args: ["source_table"]
  kwargs:
    limit: 1000
    debug: false

transform_data:
  scale: 100
  validate: true

save_data:
  - "output_table"

Job Implementation (jobs/data_pipeline.py):

def run(config_args, printer=None):
    data = extract_data(*config_args.extract_data.args, **config_args.extract_data.kwargs)
    data = transform_data(data, *config_args.transform_data.args, **config_args.transform_data.kwargs)
    save_data(*config_args.save_data.args, **config_args.save_data.kwargs)

Running Jobs:

# Default environment
pixi run cocina job data_pipeline

# Production environment
pixi run cocina job data_pipeline --env prod

RUN AND MAIN METHODS

When running a job, the CLI requires the job module to define one of the following: a run method accepting (config_args: ConfigArgs, printer: Printer), a run method accepting only config_args: ConfigArgs, or a main method that takes no arguments.

Priority ordering is:

  1. run(config_args, printer) | passing both a ConfigArgs and Printer instance
  2. run(config_args) | passing a ConfigArgs instance
  3. main() | for jobs without configuration (legacy scripts)
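The priority ordering above can be sketched as a small dispatch helper. This is an illustration of the described behavior, not Cocina's actual code; resolve_entrypoint is a hypothetical name:

```python
import inspect


def resolve_entrypoint(module):
    """Pick a job entry point following the priority order described above
    (a sketch of the documented behavior, not Cocina's implementation)."""
    run = getattr(module, "run", None)
    if callable(run):
        params = list(inspect.signature(run).parameters)
        if len(params) >= 2:
            return "run(config_args, printer)"
        if len(params) == 1:
            return "run(config_args)"
    main = getattr(module, "main", None)
    if callable(main):
        return "main()"
    raise AttributeError("job module defines neither run() nor main()")
```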

USER CODEBASE/NOTEBOOKS

Although the main focus is on building and running configured "jobs", ConfigArgs can also be used directly in your own code (in a notebook, for example):

# Load job-specific configuration
ca = ConfigArgs('job_group_1.job_a1')
jobs.job_group_1.job_a1.step_1(*ca.step_1.args, **ca.step_1.kwargs)

cocina Configuration

The .cocina file contains project settings and must be in your project root. It defines:

  • Configuration file locations and naming conventions
  • Project root directory location
  • Environment variable names

Required: Every project must have a .cocina file at the root.

Options:

  • --log_dir: Enable automatic log file creation
  • --package: Specify main package for constants loading
  • --force: Overwrite existing .cocina file

Configuration Files

Cocina uses YAML files in the config/ directory:

config/
├── config.yaml           # Main configuration
├── dev.yaml              # Development environment overrides
├── prod.yaml             # Production environment overrides
└── args/                 # Job-specific configurations
    ├── job_name.yaml     # Individual job config
    └── group_name/       # Grouped job configs
        └── job_a.yaml

Configuration Types:

  • Main Config: config.yaml - shared across all environments
  • Environment Config: {env}.yaml - environment-specific overrides
  • Job Config: args/{job}.yaml - job-specific parameters and arguments
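Environment overrides layer on top of the main config. A recursive merge like the sketch below captures the idea; Cocina's actual merge rules may differ, and merge_config is a hypothetical name:

```python
def merge_config(base, override):
    """Recursively merge an environment override dict into the base config.
    Nested dicts are merged key by key; all other values are replaced.
    (An illustrative sketch, not Cocina's implementation.)"""
    merged = dict(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = merge_config(merged[key], value)
        else:
            merged[key] = value
    return merged
```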

ConfigHandler

Manages constants and main configuration with environment support.

from cocina.config_handler import ConfigHandler

ch = ConfigHandler()
print(ch.DATABASE_URL)  # From config.yaml
print(ch.MAX_SCALE)     # From constants.py (protected)

Features:

  • Loads constants from your_package/constants.py
  • Loads configuration from config/config.yaml
  • Environment-specific overrides from config/{env}.yaml
  • Dict-style and attribute access patterns
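The dual dict-style/attribute access pattern mentioned above can be illustrated with a minimal dict subclass. This is not Cocina's implementation, just a sketch of the access pattern:

```python
class AttrConfig(dict):
    """Minimal illustration of combined dict-style and attribute access
    (not Cocina's actual ConfigHandler)."""

    def __getattr__(self, name):
        # Fall back to dict lookup for attribute access: cfg.KEY == cfg["KEY"]
        try:
            return self[name]
        except KeyError:
            raise AttributeError(name)


cfg = AttrConfig(DATABASE_URL="postgres://localhost/db", MAX_SCALE=1000)
```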

ConfigArgs

Loads job-specific configurations with structured argument access.

from cocina.config_handler import ConfigArgs

ca = ConfigArgs('data_pipeline')
# Access method arguments
ca.extract_data.args     # ["source_table"]
ca.extract_data.kwargs   # {"limit": 1000, "debug": False}

YAML Configuration Parsing:

  • Dict with args/kwargs keys → extracts args and kwargs
  • Dict without special keys → args=[], kwargs=dict
  • List/tuple → args=value, kwargs={}
  • Single value → args=[value], kwargs={}
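The four parsing rules above can be expressed as a single normalization function. This sketch implements the documented rules for illustration; parse_entry is a hypothetical name, not part of Cocina's API:

```python
def parse_entry(value):
    """Normalize one job-config entry to (args, kwargs) following the
    parsing rules listed above (an illustrative sketch, not Cocina's code)."""
    if isinstance(value, dict):
        if "args" in value or "kwargs" in value:
            # Dict with args/kwargs keys -> extract both
            return list(value.get("args", [])), dict(value.get("kwargs", {}))
        # Dict without special keys -> everything becomes kwargs
        return [], dict(value)
    if isinstance(value, (list, tuple)):
        # List/tuple -> positional args only
        return list(value), {}
    # Single value -> one positional arg
    return [value], {}
```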

Features:

  • Environment-specific overrides
  • Reference resolution from main config
  • Dynamic value substitution

CLI

Initialize Project

pixi run cocina init --log_dir logs --package your_package

Run Jobs

# Run a single job
pixi run cocina job data_pipeline

# Run with alternative config filename
# - the above command loads config/args/data_pipeline.yaml
# - the command below loads config/args/data_pipeline/v2.yaml
pixi run cocina job data_pipeline:v2

# Run with specific environment
pixi run cocina job data_pipeline --env prod

# Run multiple jobs
pixi run cocina job job1 job2 job3

# Dry run (validate without executing)
pixi run cocina job data_pipeline --dry_run

Options:

  • --env: Environment configuration to use (dev, prod, etc.)
  • --verbose: Enable detailed output
  • --dry_run: Validate configuration without running

Tools

Printer

Professional output with timestamps, headers, and optional file logging. Printer is a singleton class that automatically initializes when first accessed.

from cocina.printer import Printer

printer = Printer(log_dir='logs', basename='MyApp')
printer.message('Status update', count=42, status='ok')
printer.stop('Complete')

Timer

Simple timing functionality with duration tracking.

from cocina.utils import Timer

timer = Timer()
timer.start()             # Start timing
print(timer.state())      # Current elapsed time
print(timer.now())        # Current timestamp
stop_time = timer.stop()  # Stop timing
print(timer.delta())      # Total duration string

See complete documentation for all utility functions and helpers.


Development

Requirements: Managed with Pixi - no manual environment setup needed.

# All commands use pixi
pixi run jupyter lab

Style: Follows PEP8 standards. See setup.cfg for project-specific rules.

