Cocina
Cocina is a collection of tools for building structured Python projects. It provides configuration management, job execution, and a command-line interface (CLI).
Core Components
- ConfigHandler - Unified configuration management, constants, and environment variables
- ConfigArgs - Job-specific configuration loading with structured argument access
- CLI - Command-line interface for project initialization and job execution
Getting Started
Install
FROM PYPI
pip install cocina
FROM CONDA
conda install -c conda-forge cocina
Initialize
pixi run cocina init --log_dir logs --package your_package_name
See cocina Configuration for detailed initialization options.
Overview
Cocina separates configuration (values that can change) from constants (values that never change) and job arguments (run-specific parameters).
Key Concepts
- ConfigHandler (ch) - Manages constants and project configuration
  - Constants: your_module/constants.py (protected from modification)
  - General Config: config/config.yaml
  - Env Config: config/<environment-name>.yaml
  - Usage: ch.DATABASE_URL, ch.get("MAX_SCALE", 1000)
- ConfigArgs (ca) - Manages job-specific run configurations
  - Job configs: config/args/job_name.yaml
  - Usage: to run method method_name, call method_name(*ca.method_name.args, **ca.method_name.kwargs)
Note: names of configuration and job directories and files can be customized in .cocina.
Before and After
Traditional approach:
SOURCE = "path/to/src.parquet"
OUTPUT_DEST = "path/to/output"
def main():
    data = load_data(SOURCE, limit=1000, debug=True)
    data = process_data(data, scale=100, validate=False)
    save_data(data, OUTPUT_DEST, format="json")

if __name__ == "__main__":
    main()
With Cocina:
def run(config_args):
    data = load_data(*config_args.load_data.args, **config_args.load_data.kwargs)
    data = process_data(data, *config_args.process_data.args, **config_args.process_data.kwargs)
    save_data(data, *config_args.save_data.args, **config_args.save_data.kwargs)
All parameters are now externalized to YAML configuration files, making scripts reusable and maintainable. CLI management and argument parsing are handled by the cocina CLI.
Example
Project Structure:
my_project/
├── my_package/                  # Python package
│   ├── constants.py             # Project constants (protected from modification)
│   ├── ...                      # Modules
│   └── data_manager.py          # Named example Python module
├── config/
│   ├── config.yaml              # Main configuration
│   ├── prod.yaml                # Production configuration overrides
│   └── args/
│       └── data_pipeline.yaml   # Job configuration
└── jobs/
    └── data_pipeline.py         # Job implementation
Configuration (config/args/data_pipeline.yaml):
extract_data:
  args: ["source_table"]
  kwargs:
    limit: 1000
    debug: false
transform_data:
  scale: 100
  validate: true
save_data:
  - "output_table"
Job Implementation (jobs/data_pipeline.py):
def run(config_args, printer=None):
    data = extract_data(*config_args.extract_data.args, **config_args.extract_data.kwargs)
    data = transform_data(data, *config_args.transform_data.args, **config_args.transform_data.kwargs)
    # Pass the transformed data along with the configured arguments
    save_data(data, *config_args.save_data.args, **config_args.save_data.kwargs)
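To make the argument expansion concrete, the sketch below replays this job with a hand-built stand-in for config_args. The SimpleNamespace objects, the three stub step functions, and the calls list are assumptions for illustration only; a real job would receive a ConfigArgs instance from the CLI and call the project's own functions.

```python
from types import SimpleNamespace

# Stand-in for the object passed to run(); built by hand here (assumption),
# mirroring the values in config/args/data_pipeline.yaml.
config_args = SimpleNamespace(
    extract_data=SimpleNamespace(args=["source_table"], kwargs={"limit": 1000, "debug": False}),
    transform_data=SimpleNamespace(args=[], kwargs={"scale": 100, "validate": True}),
    save_data=SimpleNamespace(args=["output_table"], kwargs={}),
)

# Hypothetical step functions that only record how they were called.
calls = []


def extract_data(table, limit=None, debug=False):
    calls.append(("extract_data", table, limit, debug))
    return [1, 2, 3][:limit]


def transform_data(data, scale=1, validate=False):
    calls.append(("transform_data", scale, validate))
    return [x * scale for x in data]


def save_data(data, table):
    calls.append(("save_data", table, data))


def run(config_args, printer=None):
    data = extract_data(*config_args.extract_data.args, **config_args.extract_data.kwargs)
    data = transform_data(data, *config_args.transform_data.args, **config_args.transform_data.kwargs)
    save_data(data, *config_args.save_data.args, **config_args.save_data.kwargs)


run(config_args)
```

Each step receives its positional arguments from args and its keyword arguments from kwargs, so changing the YAML changes the call without touching the job code.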
Running Jobs:
# Default environment
pixi run cocina job data_pipeline
# Production environment
pixi run cocina job data_pipeline --env prod
RUN AND MAIN METHODS
When running a job, the CLI looks for an entry point in the job module and calls the first one it finds, in this priority order:
- run(config_args, printer) - called with both a ConfigArgs and a Printer instance
- run(config_args) - called with a ConfigArgs instance only
- main() - called with no arguments, for jobs without configuration (legacy scripts)
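The priority order above can be sketched with the standard library's inspect module. This is not Cocina's actual dispatcher; call_entry_point and the SimpleNamespace stand-ins for job modules are hypothetical names used only to illustrate the selection logic.

```python
import inspect
from types import SimpleNamespace


def call_entry_point(job_module, config_args, printer):
    """Call a job's entry point following the run/run/main priority order."""
    run = getattr(job_module, "run", None)
    if callable(run):
        # Count declared parameters to decide whether run() accepts a printer.
        n_params = len(inspect.signature(run).parameters)
        if n_params >= 2:
            return run(config_args, printer)
        return run(config_args)
    main = getattr(job_module, "main", None)
    if callable(main):
        return main()
    raise AttributeError("job defines neither run() nor main()")


# Stand-ins for three job modules (assumption: real jobs are Python modules).
job_a = SimpleNamespace(run=lambda config_args, printer: ("run", config_args, printer))
job_b = SimpleNamespace(run=lambda config_args: ("run", config_args))
job_c = SimpleNamespace(main=lambda: "main")

result_a = call_entry_point(job_a, "ca", "printer")
result_b = call_entry_point(job_b, "ca", "printer")
result_c = call_entry_point(job_c, "ca", "printer")
```

A job declared as run(config_args, printer=None) has two parameters, so under this sketch it would be called with both arguments.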
USER CODEBASE/NOTEBOOKS
Although the main focus is on building and running configured "jobs", ConfigArgs can also be used in your code (a notebook for example):
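from cocina.config_handler import ConfigArgs
import jobs.job_group_1.job_a1  # assumes the jobs directory is importable as a package
# Load job-specific configuration
ca = ConfigArgs('job_group_1.job_a1')
jobs.job_group_1.job_a1.step_1(*ca.step_1.args, **ca.step_1.kwargs)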
cocina Configuration
The .cocina file contains project settings and must be in your project root. It defines:
- Configuration file locations and naming conventions
- Project root directory location
- Environment variable names
Required: Every project must have a .cocina file at the root.
Options:
- --log_dir: Enable automatic log file creation
- --package: Specify main package for constants loading
- --force: Overwrite existing .cocina file
Configuration Files
Cocina uses YAML files in the config/ directory:
config/
├── config.yaml          # Main configuration
├── dev.yaml             # Development environment overrides
├── prod.yaml            # Production environment overrides
└── args/                # Job-specific configurations
    ├── job_name.yaml    # Individual job config
    └── group_name/      # Grouped job configs
        └── job_a.yaml
Configuration Types:
- Main Config: config.yaml - shared across all environments
- Environment Config: {env}.yaml - environment-specific overrides
- Job Config: args/{job}.yaml - job-specific parameters and arguments
ConfigHandler
Manages constants and main configuration with environment support.
from cocina.config_handler import ConfigHandler
ch = ConfigHandler()
print(ch.DATABASE_URL) # From config.yaml
print(ch.MAX_SCALE) # From constants.py (protected)
Features:
- Loads constants from your_package/constants.py
- Loads configuration from config/config.yaml
- Environment-specific overrides from config/{env}.yaml
- Dict-style and attribute access patterns
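As a rough mental model of the features listed above, the toy class below layers three plain dictionaries and exposes them through attribute and dict-style access. It is a sketch, not Cocina's implementation: LayeredConfig is a hypothetical name, the values are made up, and the precedence shown (constants win over environment overrides, which win over the main config) is an assumption based on constants being described as protected.

```python
class LayeredConfig:
    """Toy stand-in: constants, then environment overrides, then main config."""

    def __init__(self, constants, config, env_config=None):
        # Later dicts win on key collisions, so constants are merged last.
        self._values = {**config, **(env_config or {}), **constants}

    def __getattr__(self, name):
        # Only called when normal attribute lookup fails.
        values = self.__dict__.get("_values", {})
        if name in values:
            return values[name]
        raise AttributeError(name)

    def __getitem__(self, key):
        return self._values[key]

    def get(self, key, default=None):
        return self._values.get(key, default)


ch = LayeredConfig(
    constants={"MAX_SCALE": 1000},
    config={"DATABASE_URL": "sqlite:///dev.db", "BATCH_SIZE": 10},
    env_config={"DATABASE_URL": "sqlite:///prod.db"},
)

database_url = ch.DATABASE_URL      # environment override replaces the main value
batch_size = ch["BATCH_SIZE"]       # dict-style access
fallback = ch.get("MISSING", 5)     # default for an absent key
```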
ConfigArgs
Loads job-specific configurations with structured argument access.
from cocina.config_handler import ConfigArgs
ca = ConfigArgs('data_pipeline')
# Access method arguments
ca.extract_data.args # ["source_table"]
ca.extract_data.kwargs # {"limit": 1000, "debug": False}
YAML Configuration Parsing:
- Dict with args/kwargs keys → extracts args and kwargs
- Dict without special keys → args=[], kwargs=dict
- List/tuple → args=value, kwargs={}
- Single value → args=[value], kwargs={}
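The four parsing rules above can be expressed as one small function. This is a minimal sketch of the rules as stated, not the library's code: split_args_kwargs is a hypothetical name, and ignoring any extra keys in a dict that already has args or kwargs is an assumption, since that case is not described.

```python
def split_args_kwargs(value):
    """Return (args, kwargs) for one method entry, following the four rules."""
    if isinstance(value, dict):
        if "args" in value or "kwargs" in value:
            # Rule 1: explicit args/kwargs keys (other keys ignored: assumption).
            return list(value.get("args") or []), dict(value.get("kwargs") or {})
        # Rule 2: a plain dict becomes keyword arguments.
        return [], dict(value)
    if isinstance(value, (list, tuple)):
        # Rule 3: a sequence becomes positional arguments.
        return list(value), {}
    # Rule 4: a single value becomes one positional argument.
    return [value], {}


# The data_pipeline.yaml example, written as the dict a YAML loader would produce.
job_config = {
    "extract_data": {"args": ["source_table"], "kwargs": {"limit": 1000, "debug": False}},
    "transform_data": {"scale": 100, "validate": True},
    "save_data": ["output_table"],
}

parsed = {name: split_args_kwargs(value) for name, value in job_config.items()}
single = split_args_kwargs("output_table")
```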
Features:
- Environment-specific overrides
- Reference resolution from main config
- Dynamic value substitution
CLI
Initialize Project
pixi run cocina init --log_dir logs --package your_package
Run Jobs
# Run a single job
pixi run cocina job data_pipeline
# Run with alternative config filename
# - the above command loads config/args/data_pipeline.yaml
# - the command below loads config/args/data_pipeline/v2.yaml
pixi run cocina job data_pipeline:v2
# Run with specific environment
pixi run cocina job data_pipeline --env prod
# Run multiple jobs
pixi run cocina job job1 job2 job3
# Dry run (validate without executing)
pixi run cocina job data_pipeline --dry_run
Options:
- --env: Environment configuration to use (dev, prod, etc.)
- --verbose: Enable detailed output
- --dry_run: Validate configuration without running
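The comments in the Run Jobs commands describe how a job name selects a config file: data_pipeline loads config/args/data_pipeline.yaml and data_pipeline:v2 loads config/args/data_pipeline/v2.yaml. The sketch below reproduces that mapping with pathlib. job_config_path is a hypothetical helper, the default directory names are taken from the examples in this document (they can be customized in .cocina), and treating dotted names such as job_group_1.job_a1 as sub-directories is an assumption.

```python
from pathlib import PurePosixPath


def job_config_path(job, config_dir="config", args_dir="args"):
    """Map a job name such as 'data_pipeline' or 'data_pipeline:v2' to a YAML path."""
    # Split off an optional ':variant' suffix selecting an alternative config file.
    name, _, variant = job.partition(":")
    parts = name.split(".")  # assumption: dotted names map to sub-directories
    if variant:
        parts.append(variant)
    return PurePosixPath(config_dir, args_dir, *parts).with_suffix(".yaml")


default_path = str(job_config_path("data_pipeline"))
variant_path = str(job_config_path("data_pipeline:v2"))
grouped_path = str(job_config_path("job_group_1.job_a1"))
```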
Tools
Printer
Professional output with timestamps, headers, and optional file logging. Printer is a singleton class that automatically initializes when first accessed.
from cocina.printer import Printer
printer = Printer(log_dir='logs', basename='MyApp')
printer.message('Status update', count=42, status='ok')
printer.stop('Complete')
Timer
Simple timing functionality with duration tracking.
from cocina.utils import Timer
timer = Timer()
timer.start() # Start timing
print(timer.state()) # Current elapsed time
print(timer.now()) # Current timestamp
stop_time = timer.stop() # Stop timing
print(timer.delta()) # Total duration string
See complete documentation for all utility functions and helpers.
Development
Requirements: Managed with Pixi - no manual environment setup needed.
# All commands use pixi
pixi run jupyter lab
Style: Follows PEP8 standards. See setup.cfg for project-specific rules.
Documentation
- Getting Started - Installation, initialization, and first job
- Configuration Guide - Complete configuration management
- Job System - Creating and running jobs
- CLI Reference - Command-line interface
- Examples - Detailed usage examples
- Advanced Topics - Complex patterns and extensions