Concise Logic and Explanation Analysis Reports (CLEAR) for ML models
CLEAR: Concise Logic and Explanation Analysis Reports
CLEAR is a Python tool for generating model cards and risk reports from machine learning evaluation outputs. It transforms structured evaluation data into readable, standardized documentation that can be shared with stakeholders.
What This Tool Does
CLEAR takes ML evaluation metrics and metadata as input and generates:
- Model Cards: Standardized documentation including model overview, intended use, dataset summaries, performance metrics, limitations, and ethical considerations.
- Risk Reports: Analysis of identified risks, their mitigation strategies, and severity levels.
- Markdown Output: Human-readable markdown files suitable for documentation repositories or sharing with teams.
The tool processes JSON or YAML input files and applies templating to produce consistent, well-structured reports.
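The input-to-report flow can be sketched in a few lines. This is only an illustration: it uses the standard library's `string.Template` as a stand-in for Jinja2, and the template text and the `render_card` helper are hypothetical, not CLEAR's own templates or API.

```python
import json
from string import Template

# Hypothetical template; CLEAR's real templates are Jinja2 files and will differ.
CARD_TEMPLATE = Template(
    "# Model Card: $model_name\n\n"
    "Version: $model_version\n\n"
    "## Performance\n\n"
    "- Accuracy: $accuracy\n"
    "- F1 score: $f1_score\n"
)


def render_card(raw_json: str) -> str:
    """Parse evaluation JSON and fill the template with its fields."""
    data = json.loads(raw_json)
    # substitute() raises KeyError if a placeholder has no matching field.
    return CARD_TEMPLATE.substitute(data)


example_input = json.dumps({
    "model_name": "Email Spam Classifier",
    "model_version": "2.1.0",
    "accuracy": 0.963,
    "f1_score": 0.948,
})
card = render_card(example_input)
print(card)
```

Because the template only fills in named fields, the same input always produces the same structure, which is what keeps reports consistent across models.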
What It Does NOT Do
CLEAR does not:
- Automatically compute ML metrics. You must provide pre-calculated evaluation results.
- Generate visualizations or charts. Output is text-based markdown only.
- Store or manage model artifacts, checkpoints, or weights.
- Connect to external APIs, cloud services, or model registries.
- Enforce particular ML frameworks or tooling choices.
- Make judgment calls about model safety or regulatory compliance. It documents what you provide.
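Since metrics must be computed beforehand, a small script along these lines could produce them. It is a sketch for a binary classifier only; the `metrics_from_confusion` helper is hypothetical, and the `[[tn, fp], [fn, tp]]` matrix layout is an assumption, because the schema does not state which layout CLEAR expects.

```python
import json


def metrics_from_confusion(tn: int, fp: int, fn: int, tp: int) -> dict:
    """Derive the four required metrics from binary confusion-matrix counts."""
    total = tn + fp + fn + tp
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denominator = precision + recall
    f1 = 2 * precision * recall / denominator if denominator else 0.0
    return {
        "accuracy": round(accuracy, 4),
        "precision": round(precision, 4),
        "recall": round(recall, 4),
        "f1_score": round(f1, 4),
        # Assumed layout: rows are actual classes, columns are predicted classes.
        "confusion_matrix": [[tn, fp], [fn, tp]],
    }


metrics = metrics_from_confusion(tn=50, fp=10, fn=5, tp=35)
print(json.dumps(metrics, indent=2))
```

The resulting dictionary can be merged with the model metadata fields and written to the JSON file passed to the tool.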
Installation
Install from PyPI:
pip install modelcardgen
Or install from source with development dependencies:
git clone https://github.com/ghostcipher1/modelcardgen.git
cd modelcardgen
pip install -e ".[dev]"
Requires Python 3.10 or later.
CLI Usage Examples
Generate a model card
modelcardgen generate --metrics evaluation.json --output-dir .
Using YAML input
modelcardgen generate --metrics metrics.yaml --output-dir ./reports
Validate metrics file
modelcardgen validate --metrics evaluation.json
View help
modelcardgen --help
modelcardgen generate --help
Input Data Schema
The tool expects JSON or YAML input files containing model metadata, dataset information, evaluation metrics, and risk assessments. Below is the complete input schema specification.
Required Top-Level Fields
model_name: string # Name of the model
model_version: string # Semantic version (e.g., "1.0.0")
model_description: string # High-level overview
model_owner: string # Person or team responsible
model_license: string # License type (e.g., "Apache-2.0")
model_framework: string # ML framework used (e.g., "scikit-learn")
accuracy: float # 0.0 to 1.0
precision: float # 0.0 to 1.0
recall: float # 0.0 to 1.0
f1_score: float # 0.0 to 1.0
Optional Fields
model_release_date: YYYY-MM-DD # Model release date (defaults to today)
roc_auc: float # 0.0 to 1.0 (optional)
confusion_matrix: [[int]] # 2D array of prediction counts (optional)
custom_metrics: {} # Dictionary of domain-specific metrics (optional)
training_data_name: string
training_data_description: string
training_data_size: integer # Number of samples
training_data_features: [string] # List of feature names
training_data_target: string # Target variable name
training_data_source_url: url # Optional URL to dataset source
eval_data_name: string
eval_data_description: string
eval_data_size: integer
eval_data_features: [string]
eval_data_target: string
eval_data_source_url: url
unsuitable_inputs: [string] # List of input types where model fails
environmental_constraints: string # Hardware/software requirements
out_of_scope_uses: [string] # Scenarios to avoid
intended_users: [string] # Target audience personas
intended_use_cases: [string] # Specific tasks designed for
prohibited_uses: [string] # Forbidden uses (ethical/legal)
Risks Array (Optional)
risks:
- risk_type: string # Category (e.g., "Data Bias")
  description: string # Detailed explanation
  mitigation_strategy: string # Mitigation approach
  severity: string # "Low", "Medium", or "High"
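To show how entries of this shape could feed a risk report, the sketch below sorts risks by severity and renders them as a markdown table. The `risk_table` helper and the column layout are hypothetical; CLEAR's actual report format may differ.

```python
# Lower number sorts first, so "High" risks appear at the top of the table.
SEVERITY_ORDER = {"High": 0, "Medium": 1, "Low": 2}


def risk_table(risks: list[dict]) -> str:
    """Render risk entries as a markdown table, most severe first."""
    rows = sorted(risks, key=lambda risk: SEVERITY_ORDER[risk["severity"]])
    lines = ["| Severity | Risk | Mitigation |", "|---|---|---|"]
    for risk in rows:
        lines.append(
            f"| {risk['severity']} | {risk['risk_type']} "
            f"| {risk['mitigation_strategy']} |"
        )
    return "\n".join(lines)


example_risks = [
    {
        "risk_type": "Data Distribution Shift",
        "description": "Production data may differ from training",
        "mitigation_strategy": "Monitor metrics in production",
        "severity": "Medium",
    },
    {
        "risk_type": "Data Bias",
        "description": "Training labels may be skewed",
        "mitigation_strategy": "Audit label distribution",
        "severity": "High",
    },
]
table = risk_table(example_risks)
print(table)
```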
Complete JSON Example
{
  "model_name": "Email Spam Classifier",
  "model_version": "2.1.0",
  "model_description": "Classifies emails as spam or legitimate",
  "model_owner": "ML Team",
  "model_license": "Apache-2.0",
  "model_framework": "scikit-learn",
  "accuracy": 0.963,
  "precision": 0.951,
  "recall": 0.945,
  "f1_score": 0.948,
  "roc_auc": 0.985,
  "training_data_name": "Enron Email Corpus",
  "training_data_description": "Real email messages with labels",
  "training_data_size": 755000,
  "training_data_features": ["subject_line", "body_text"],
  "training_data_target": "spam_label",
  "eval_data_name": "Recent Email Dataset",
  "eval_data_description": "Holdout test set",
  "eval_data_size": 50000,
  "eval_data_features": ["subject_line", "body_text"],
  "eval_data_target": "spam_label",
  "unsuitable_inputs": ["Non-English emails", "Encrypted content"],
  "out_of_scope_uses": ["Real-time filtering without review"],
  "intended_users": ["Email administrators", "IT security teams"],
  "intended_use_cases": ["Spam detection"],
  "prohibited_uses": ["Discriminatory filtering"],
  "risks": [
    {
      "risk_type": "Data Distribution Shift",
      "description": "Production data may differ from training",
      "mitigation_strategy": "Monitor metrics in production",
      "severity": "Medium"
    }
  ]
}
Complete YAML Example
model_name: Email Spam Classifier
model_version: 2.1.0
model_description: Classifies emails as spam or legitimate
model_owner: ML Team
model_license: Apache-2.0
model_framework: scikit-learn
accuracy: 0.963
precision: 0.951
recall: 0.945
f1_score: 0.948
roc_auc: 0.985
training_data_name: Enron Email Corpus
training_data_description: Real email messages with labels
training_data_size: 755000
training_data_features:
- subject_line
- body_text
training_data_target: spam_label
eval_data_name: Recent Email Dataset
eval_data_description: Holdout test set
eval_data_size: 50000
eval_data_features:
- subject_line
- body_text
eval_data_target: spam_label
unsuitable_inputs:
- Non-English emails
- Encrypted content
out_of_scope_uses:
- Real-time filtering without review
intended_users:
- Email administrators
- IT security teams
intended_use_cases:
- Spam detection
prohibited_uses:
- Discriminatory filtering
risks:
- risk_type: Data Distribution Shift
  description: Production data may differ from training
  mitigation_strategy: Monitor metrics in production
  severity: Medium
Validation Rules
- Metric values (accuracy, precision, recall, f1_score, roc_auc) must be between 0.0 and 1.0
- All model_* fields are required, except model_release_date (optional; defaults to today)
- All training_data_* and eval_data_* fields are required, except source_url (optional)
- All metrics fields (accuracy, precision, recall, f1_score) are required; roc_auc is optional
- Lists (features, unsuitable_inputs, etc.) can be empty but must be arrays
- Risks is optional; if provided, each risk must have all four fields
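The rules above can be approximated with a plain-Python pre-check. This is a sketch, not CLEAR's validator (the design notes mention Pydantic): the `validate` helper is hypothetical, its error wording is modeled loosely on the messages in this README, and it only treats the fields listed under Required Top-Level Fields as mandatory.

```python
REQUIRED_FIELDS = (
    "model_name", "model_version", "model_description", "model_owner",
    "model_license", "model_framework",
    "accuracy", "precision", "recall", "f1_score",
)
METRIC_FIELDS = ("accuracy", "precision", "recall", "f1_score", "roc_auc")
RISK_FIELDS = ("risk_type", "description", "mitigation_strategy", "severity")
SEVERITIES = {"Low", "Medium", "High"}


def validate(data: dict) -> list[str]:
    """Return a list of human-readable problems; an empty list means valid."""
    errors = []
    for field in REQUIRED_FIELDS:
        if field not in data:
            errors.append(f"Missing required field: {field}")
    for field in METRIC_FIELDS:
        value = data.get(field)
        if value is None:
            continue  # absent metrics are reported by the required-field check
        is_number = isinstance(value, (int, float)) and not isinstance(value, bool)
        if not is_number or not 0.0 <= value <= 1.0:
            errors.append(f"Validation failed: {field} must be between 0.0 and 1.0")
    for index, risk in enumerate(data.get("risks", [])):
        missing = [key for key in RISK_FIELDS if key not in risk]
        if missing:
            errors.append(f"Risk {index} is missing: {', '.join(missing)}")
        elif risk["severity"] not in SEVERITIES:
            errors.append(f"Risk {index} has unknown severity: {risk['severity']}")
    return errors


problems = validate({"model_name": "Demo", "accuracy": 1.2, "risks": [{"severity": "Low"}]})
for problem in problems:
    print(problem)
```

Collecting every problem in one pass, rather than stopping at the first, makes it easier to fix an input file in a single edit.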
Common Errors
| Error | Solution |
|---|---|
| Invalid JSON | Check file syntax using jq or a JSON validator |
| Invalid YAML | Check indentation (use spaces, not tabs); use a YAML linter |
| Validation failed: accuracy | Ensure metric values are between 0.0 and 1.0 |
| File not found | Verify the file path and ensure the file exists |
| Missing required field | Check that all required model_* and eval_* fields are present |
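The first and fourth errors can be caught before the tool is invoked. The sketch below covers JSON input only and uses just the standard library; `load_metrics` is a hypothetical helper, not part of CLEAR.

```python
import json
from pathlib import Path


def load_metrics(path: str) -> dict:
    """Load a JSON metrics file, raising errors that point at the likely fix."""
    file = Path(path)
    if not file.is_file():
        raise FileNotFoundError(f"File not found: {file}")
    try:
        data = json.loads(file.read_text(encoding="utf-8"))
    except json.JSONDecodeError as exc:
        # lineno and colno locate the first character the parser rejected.
        raise ValueError(
            f"Invalid JSON in {file} at line {exc.lineno}, "
            f"column {exc.colno}: {exc.msg}"
        ) from exc
    if not isinstance(data, dict):
        raise ValueError(f"Expected a JSON object at the top level of {file}")
    return data
```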
Python API Example
Use CLEAR as a library in your code:
from modelcardgen.core.models import (
    ModelMetadata,
    DatasetMetadata,
    EvaluationMetrics,
    RiskAssessment,
)
from modelcardgen.reports.markdown import MarkdownCardGenerator

metadata = ModelMetadata(
    name="My Classifier",
    version="1.0.0",
    description="Classifies text documents.",
    owner="ML Team",
    license="Apache-2.0",
    framework="scikit-learn"
)

metrics = EvaluationMetrics(
    accuracy=0.92,
    precision=0.90,
    recall=0.94,
    f1_score=0.92,
    roc_auc=0.96
)

training_data = DatasetMetadata(
    name="Training Set",
    description="Internal labeled dataset",
    size=10000,
    features=["text_features"],
    target="label"
)

risks = [
    RiskAssessment(
        risk_type="Data Distribution Shift",
        description="Production data may differ from training distribution.",
        mitigation_strategy="Monitor performance metrics in production.",
        severity="Medium"
    )
]

generator = MarkdownCardGenerator()
generator.generate(
    metadata=metadata,
    metrics=metrics,
    training_data=training_data,
    risks=risks,
    output_path="MODEL_CARD.md"
)
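The keyword arguments above look like the flat input keys with their prefix removed (`model_name` becomes `name`, `training_data_size` becomes `size`). Assuming that correspondence holds, which this README does not state outright, a flat input dictionary could be regrouped as follows; `group_fields` is a hypothetical helper.

```python
# Longer prefixes come first so each key matches its most specific group.
PREFIXES = {
    "training_data_": "training_data",
    "eval_data_": "eval_data",
    "model_": "metadata",
}


def group_fields(flat: dict) -> dict:
    """Split flat input keys into per-object keyword-argument dictionaries."""
    grouped = {"metadata": {}, "training_data": {}, "eval_data": {}, "other": {}}
    for key, value in flat.items():
        for prefix, group in PREFIXES.items():
            if key.startswith(prefix):
                grouped[group][key[len(prefix):]] = value
                break
        else:
            # Metrics, risks, and usage lists carry no prefix.
            grouped["other"][key] = value
    return grouped


grouped = group_fields({
    "model_name": "My Classifier",
    "model_version": "1.0.0",
    "training_data_size": 10000,
    "eval_data_target": "label",
    "accuracy": 0.92,
})
print(grouped)
```

Under the same assumption, `ModelMetadata(**grouped["metadata"])` would then build the metadata object from a loaded input file.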
CI/CD Usage Example
Integrate model card generation into your CI/CD pipeline:
# Example GitHub Actions workflow
name: Generate Model Card
on:
  push:
    paths:
      - 'model/evaluation_results.json'
jobs:
  generate:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      - uses: actions/setup-python@v4
        with:
          python-version: '3.10'
      - run: pip install modelcardgen
      # Flags follow the CLI Usage Examples section (--metrics, --output-dir)
      - run: |
          modelcardgen generate \
            --metrics model/evaluation_results.json \
            --output-dir docs
      - run: git add docs && git commit -m "Update model card"
        if: ${{ github.event_name == 'push' }}
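A pipeline may also want to validate the input before generating the card, so that a malformed file fails the build early. The sketch below chains the two documented subcommands from a Python script; `build_commands` and `run_pipeline` are hypothetical helpers, and the flags are the ones shown in the CLI Usage Examples section.

```python
import subprocess


def build_commands(metrics_path: str, output_dir: str) -> list[list[str]]:
    """Return the validate and generate invocations, in the order to run them."""
    return [
        ["modelcardgen", "validate", "--metrics", metrics_path],
        ["modelcardgen", "generate", "--metrics", metrics_path,
         "--output-dir", output_dir],
    ]


def run_pipeline(metrics_path: str, output_dir: str, runner=subprocess.run) -> None:
    """Run each command; check=True raises on a non-zero exit, stopping the chain."""
    for command in build_commands(metrics_path, output_dir):
        runner(command, check=True)


# Usage in a CI step (requires modelcardgen to be installed):
# run_pipeline("model/evaluation_results.json", "docs")
```

Passing `runner` as a parameter keeps the script testable without the CLI installed.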
Design Philosophy
CLEAR follows these principles:
- Offline First: No external API calls or cloud dependencies. Everything runs locally.
- Data Driven: A report is only as accurate as the input data. Garbage in, garbage out.
- Template Based: Uses Jinja2 templating for flexibility. Customize output by modifying templates.
- No Magic: Explicit over implicit. The tool documents what you tell it; it doesn't infer or assume.
- Minimal Dependencies: Relies on standard, well-maintained Python libraries (Jinja2, Pydantic, Pandas).
- Language Agnostic: Works with any ML framework or language, as long as you can generate JSON/YAML evaluation output.
License
Licensed under the Apache License 2.0. See LICENSE file for details.