Opinionated platform for agentic workflows, tooling, and evaluation.

Maintainers

foaadfarooghian

Project description

SignalForge AI

Agent engineering and learning system for production workflows
Build runtime traces, evaluate failures, generate datasets, distill specialists, and benchmark trade-offs.

Project status

SignalForge AI is in active, pre-1.0 development (v0.x).

APIs and schemas are still converging
Breaking changes are expected while core runtime contracts are stabilized
Best fit today: internal platforms, research, and production pilots with pinned versions

Narrowed product scope

SignalForge AI is narrowing to five pillars:

OTel/MCP-native runtime + artifact schema
Evaluation and failure analysis for multi-step, tool-using agents
Dataset generation from production traces
Distillation pipeline for specialist small models
Benchmarking cost, latency, and reliability across models and agent patterns

This is a systems-first direction: execution artifacts and measurable outcomes come before model hype.

Pillars in practice

1) OTel/MCP-native runtime + artifact schema

Runtime events map cleanly to spans/events for distributed observability
MCP tool calls are first-class execution units
Shared artifact contracts for traces, rewards, eval verdicts, and dataset rows
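
As a rough illustration of the first two points, a runtime event can be flattened into a span-shaped record. This is a sketch under stated assumptions: the event fields (run_id, step, kind, tool, started_ns, ended_ns) and the SpanRecord type are hypothetical, not the trace.v0 schema, and the code uses only the standard library rather than the OpenTelemetry SDK.

```python
from dataclasses import dataclass, field


@dataclass
class SpanRecord:
    """Span-shaped record; a stand-in for an OTel span, not the SDK type."""
    trace_id: str
    name: str
    start_ns: int
    end_ns: int
    attributes: dict = field(default_factory=dict)

    @property
    def duration_ms(self) -> float:
        return (self.end_ns - self.start_ns) / 1_000_000


def event_to_span(event: dict) -> SpanRecord:
    """Map one hypothetical runtime event onto a span-shaped record."""
    # Tool calls get their own span name so they stay first-class units.
    if event["kind"] == "tool_call":
        name = f"tool:{event['tool']}"
    else:
        name = f"step:{event['step']}"
    return SpanRecord(
        trace_id=event["run_id"],
        name=name,
        start_ns=event["started_ns"],
        end_ns=event["ended_ns"],
        attributes={"kind": event["kind"], "step": event["step"]},
    )


# Hypothetical event, for illustration only.
example_event = {
    "run_id": "run-001",
    "step": 2,
    "kind": "tool_call",
    "tool": "search",
    "started_ns": 1_000_000_000,
    "ended_ns": 1_250_000_000,
}
span = event_to_span(example_event)
```

Keeping the run id as the trace id means every step and tool call of one run can be grouped by an observability backend without extra joins.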

2) Evaluation + failure analysis

Suite-based evaluation for multi-step workflows
Step-level and run-level scoring
Failure taxonomy for tool errors, reasoning failures, recovery failures, and policy failures
Regression diffing across runs, models, and orchestration patterns
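
The failure taxonomy and regression diffing can be pictured with a small sketch. The four buckets come from the list above; the verdict layout (a dict with a failure key) and the string values are assumptions for illustration, not the project's eval verdict schema.

```python
from collections import Counter
from enum import Enum


class FailureKind(Enum):
    TOOL_ERROR = "tool_error"
    REASONING = "reasoning_failure"
    RECOVERY = "recovery_failure"
    POLICY = "policy_failure"


def failure_counts(verdicts: list[dict]) -> Counter:
    """Count failures per taxonomy bucket; passing cases are ignored."""
    return Counter(
        FailureKind(v["failure"]) for v in verdicts if v.get("failure")
    )


def diff_failures(baseline: list[dict], candidate: list[dict]) -> dict:
    """Per-bucket change in failure count (candidate minus baseline)."""
    before, after = failure_counts(baseline), failure_counts(candidate)
    return {kind: after[kind] - before[kind] for kind in FailureKind}


# Hypothetical verdicts for two runs of the same three-case suite.
baseline_run = [
    {"case": "a", "failure": None},
    {"case": "b", "failure": "tool_error"},
    {"case": "c", "failure": "policy_failure"},
]
candidate_run = [
    {"case": "a", "failure": "reasoning_failure"},
    {"case": "b", "failure": "tool_error"},
    {"case": "c", "failure": None},
]
delta = diff_failures(baseline_run, candidate_run)
```

A positive entry in delta means the candidate run fails more often in that bucket than the baseline did.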

3) Dataset generation from production traces

Deterministic trace ETL into SFT, preference, repair, and critique datasets
Provenance from dataset row back to trace/reward artifacts
Data quality checks for schema validity, leakage risk, and label consistency
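
One common way to make trace ETL deterministic is to derive the split from a hash of the trace id, so re-running the export never moves a row between splits. The sketch below assumes hypothetical field names (trace_id, reward_id, input, output) and a hypothetical 10% eval fraction; it is not the project's exporter.

```python
import hashlib


def assign_split(trace_id: str, eval_fraction: float = 0.1) -> str:
    """Deterministically place a trace in 'train' or 'eval' by hashing its id."""
    digest = hashlib.sha256(trace_id.encode("utf-8")).digest()
    # First 8 bytes as an integer, scaled into [0, 1).
    bucket = int.from_bytes(digest[:8], "big") / 2**64
    return "eval" if bucket < eval_fraction else "train"


def trace_to_sft_row(trace: dict) -> dict:
    """Turn one trace into an SFT row that carries its own provenance."""
    return {
        "prompt": trace["input"],
        "completion": trace["output"],
        "split": assign_split(trace["trace_id"]),
        "provenance": {
            "trace_id": trace["trace_id"],
            "reward_id": trace["reward_id"],
        },
    }


# Hypothetical trace, for illustration only.
trace = {
    "trace_id": "t-1",
    "reward_id": "r-1",
    "input": "Summarize the ticket.",
    "output": "Customer reports a billing error.",
}
row = trace_to_sft_row(trace)
```

Because the split depends only on the trace id, two exports of the same traces produce identical rows, which is what makes hashes and diffs of dataset files meaningful.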

4) Distillation for specialist small models

Teacher traces -> curated supervision -> student training/eval loops
Focus on narrow specialist capabilities rather than general chat
Reproducible train/eval pipelines for iterative deployment

5) Cost/latency/reliability benchmarking

Comparable benchmark matrix across model providers and orchestration patterns
Explicit trade-off reporting (quality vs cost vs latency vs failure rate)
Reliability metrics for retries, tool success, and degraded-mode completion
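
Explicit trade-off reporting is often expressed as a Pareto frontier: a configuration stays on the frontier unless another one is at least as good on every axis and strictly better on one. The following is a minimal sketch of that idea with made-up numbers and hypothetical field names; it is not the benchmark tool's own selection logic.

```python
def dominates(a: dict, b: dict) -> bool:
    """True if a is no worse than b on every axis and better on at least one."""
    no_worse = (
        a["success"] >= b["success"]
        and a["cost"] <= b["cost"]
        and a["latency_ms"] <= b["latency_ms"]
    )
    better = (
        a["success"] > b["success"]
        or a["cost"] < b["cost"]
        or a["latency_ms"] < b["latency_ms"]
    )
    return no_worse and better


def pareto_frontier(rows: list[dict]) -> list[dict]:
    """Keep only rows that no other row dominates."""
    return [r for r in rows if not any(dominates(o, r) for o in rows)]


# Illustrative numbers only, not measured results.
rows = [
    {"model": "large", "success": 0.95, "cost": 0.040, "latency_ms": 2200},
    {"model": "small", "success": 0.88, "cost": 0.004, "latency_ms": 600},
    {"model": "mid", "success": 0.87, "cost": 0.010, "latency_ms": 900},
]
frontier = pareto_frontier(rows)
```

Here "mid" drops out because "small" beats it on all three axes, while "large" and "small" both remain because each wins on something the other loses.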

Specialist model exchange

SignalForge AI is expanding toward a specialist model registry/exchange where the published unit is a complete deployable package, not only weights.

Each exchange unit includes:

Small domain model
Eval pack
Trace/dataset lineage
Hardware profile
Failure modes
License and usage constraints
Ready-to-run artifacts (adapters, Safetensors/GGUF, Ollama packaging)

Use signalforgeai-exchange to build and validate specialist_model_unit.v0 manifests from training, distillation, and benchmark evidence. See docs/model_exchange.md, docs/specialist_model_unit_sample.json, and docs/specs/specialist_model_unit.schema.json for the contract.
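
To make the "complete deployable package" idea concrete, the sketch below checks that a draft manifest has a non-empty entry for each item in the list above. The key names are hypothetical and chosen only for this example; the authoritative contract is the schema file referenced above.

```python
# Hypothetical section names mirroring the list of exchange-unit contents.
REQUIRED_SECTIONS = (
    "model",
    "eval_pack",
    "lineage",
    "hardware_profile",
    "failure_modes",
    "license",
    "artifacts",
)


def missing_sections(manifest: dict) -> list[str]:
    """Return required sections that are absent or empty in the manifest."""
    return [key for key in REQUIRED_SECTIONS if not manifest.get(key)]


# Hypothetical draft manifest, for illustration only.
draft = {
    "model": {"family": "dummy", "size": "0B"},
    "lineage": {"dataset": "pilot.sft.jsonl"},
    "failure_modes": [{"id": "dummy_only"}],
    "license": {"model": "Apache-2.0", "dataset": "CC-BY-4.0"},
    "artifacts": {"safetensors": "model.safetensors"},
}
gaps = missing_sections(draft)
```

A unit that ships weights but leaves gaps non-empty is exactly the "weights only" case the exchange is meant to rule out.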

Good early domains:

Nutrition
Auction houses
Document-heavy verticals
Compliance
Support operations
Telecom workflows
Cataloguing
Extraction
Ranking
Summarization

What is already in this repo

Structured JSONL tracing with validation/inspection/diff tooling
Evaluation harnesses and benchmark suites with reward artifacts
Dataset export pipelines (SFT, preferences, repair pairs, curriculum)
Learning/routing infrastructure and experimental SFT/DPO training utilities
Multi-provider model abstraction (OpenAI, Ollama, HF, dummy)

What SignalForge AI is not

A chatbot framework
A prompt library
A no-code builder
A model leaderboard without task context
A fixed set of built-in agents

Quickstart

git clone https://github.com/foaadfarooghian/signalforgeai.git
cd signalforgeai

python -m venv .venv
source .venv/bin/activate
pip install -e ".[dev]"

Run a reference agent example:

python examples/quickstart_research_agent.py

The quickstart is offline-safe by default and uses the deterministic dummy_good provider. To run against a configured local or hosted model, set SIGNALFORGEAI_MODEL_ID explicitly:

SIGNALFORGEAI_MODEL_ID=ollama:ministral-3:8b python examples/quickstart_research_agent.py
SIGNALFORGEAI_MODEL_ID=openai:gpt-5-mini python examples/quickstart_research_agent.py

Validate and inspect the latest trace:

python -m signalforgeai.logging.validate logs/$(ls -t logs | head -n 1)
python -m signalforgeai.logging.inspect logs/$(ls -t logs | head -n 1)
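
The shell substitution above picks the most recently modified entry under logs/. On systems without a POSIX shell, the same selection can be sketched in Python. The helper names latest_entry and validate_latest are made up for this example; the module invocation is the one shown above.

```python
import subprocess
import sys
from pathlib import Path


def latest_entry(directory: Path) -> Path:
    """Most recently modified non-hidden entry, like `ls -t | head -n 1`."""
    entries = [p for p in directory.iterdir() if not p.name.startswith(".")]
    if not entries:
        raise FileNotFoundError(f"no entries in {directory}")
    return max(entries, key=lambda p: p.stat().st_mtime)


def validate_latest(logs_dir: Path = Path("logs")) -> None:
    """Run the trace validator on the newest entry in logs_dir."""
    target = latest_entry(logs_dir)
    subprocess.run(
        [sys.executable, "-m", "signalforgeai.logging.validate", str(target)],
        check=True,  # raise if the validator exits non-zero
    )
```

Calling validate_latest() from the repository root after a quickstart run should be equivalent to the first command above.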

Run evaluation suites:

python -m signalforgeai.evaluation.run src/signalforgeai/evaluation/suites/quickstart.json
python -m signalforgeai.evaluation.run src/signalforgeai/evaluation/suites/research_quickstart.json

Run the deterministic production-pilot readiness loop:

signalforgeai-pilot-check --mode dummy --work-dir results/pilot_check

This writes:

results/pilot_check/pilot_readiness.md
results/pilot_check/pilot_readiness.json
results/pilot_check/logs/ with trace.v0 and reward.v0 artifacts
results/pilot_check/datasets/manifest.json plus SFT, preference, repair, and curriculum exports

Pilot datasets are strict-gated by default: exported rows include deterministic split metadata, provenance links back to trace/reward artifacts, file hashes, duplicate counts, and leakage checks. Validate an exported dataset directly with:

signalforgeai-dataset-validate results/pilot_check/datasets/pilot.sft.jsonl \
  --kind sft --quality-gate --logs-root results/pilot_check/logs
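
Two of the checks named above, duplicate counts and leakage, can be illustrated with a content hash per row. This is a simplified sketch with hypothetical row fields (prompt, completion, split); the actual quality gate may define duplicates and leakage differently.

```python
import hashlib
import json


def row_key(row: dict) -> str:
    """Stable content hash of a row's prompt/completion pair."""
    payload = json.dumps(
        {"prompt": row["prompt"], "completion": row["completion"]},
        sort_keys=True,
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def quality_report(rows: list[dict]) -> dict:
    """Count exact duplicates and duplicates that cross a split boundary."""
    first_split: dict[str, str] = {}
    duplicates = 0
    cross_split_leaks = 0
    for row in rows:
        key = row_key(row)
        if key in first_split:
            duplicates += 1
            if first_split[key] != row["split"]:
                cross_split_leaks += 1
        else:
            first_split[key] = row["split"]
    return {
        "rows": len(rows),
        "duplicates": duplicates,
        "cross_split_leaks": cross_split_leaks,
    }


# Hypothetical rows, for illustration only.
rows = [
    {"prompt": "A", "completion": "a", "split": "train"},
    {"prompt": "B", "completion": "b", "split": "train"},
    {"prompt": "A", "completion": "a", "split": "eval"},
    {"prompt": "B", "completion": "b", "split": "train"},
]
report = quality_report(rows)
```

The third row is both a duplicate and a leak because the same content appears in train and eval; the fourth is only a duplicate.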

Run a release regression gate by comparing against the last accepted readiness artifact:

signalforgeai-pilot-check --mode dummy --work-dir results/pilot_current \
  --baseline results/pilot_baseline/pilot_readiness.json

When --baseline is supplied, the command also writes:

results/pilot_current/eval_regression.md
results/pilot_current/eval_regression.json

The default deterministic gate allows no drop in pass rate, no drop in mean score, no new failing cases, and no shift toward worse failure modes.
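
The first three gate conditions can be sketched as a pure comparison of two summaries. The field names (pass_rate, mean_score, failing_cases) are assumptions for this example and may not match pilot_readiness.json; the failure-mode condition is left out because its definition is not given here.

```python
def regression_gate(baseline: dict, current: dict) -> list[str]:
    """Return the violated conditions; an empty list means the gate passes."""
    violations = []
    if current["pass_rate"] < baseline["pass_rate"]:
        violations.append("pass_rate_drop")
    if current["mean_score"] < baseline["mean_score"]:
        violations.append("mean_score_drop")
    new_failures = set(current["failing_cases"]) - set(baseline["failing_cases"])
    if new_failures:
        violations.append("new_failing_cases:" + ",".join(sorted(new_failures)))
    return violations


# Hypothetical summaries, for illustration only.
baseline = {"pass_rate": 0.9, "mean_score": 0.82, "failing_cases": ["c3"]}
current = {"pass_rate": 0.9, "mean_score": 0.80, "failing_cases": ["c3", "c7"]}
violations = regression_gate(baseline, current)
```

Note that the pass rate is unchanged in this example, yet the gate still fails: a case that used to pass now fails, and the mean score slipped.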

Optional provider smoke checks can be required in configured environments:

signalforgeai-pilot-check --require-provider hosted
signalforgeai-pilot-check --require-provider local

Run the one-command offline release-candidate evidence gate:

signalforgeai-release-candidate-check \
  --work-dir results/release_candidate

This writes release_candidate.v0 JSON and Markdown plus the full child evidence bundle: pilot readiness, mock SFT/DPO training_run.v0, distillation eval, benchmark matrix, specialist unit, package check, consumer smoke run, and registry index.

Release reviewers can require non-mock training evidence. Use an SFT run as the final artifact:

signalforgeai-release-candidate-check \
  --work-dir results/release_candidate \
  --sft-run results/training/sft_training_run.json \
  --final-training-stage sft \
  --require-real-training-evidence

Or use the final DPO run report:

signalforgeai-release-candidate-check \
  --work-dir results/release_candidate \
  --dpo-run results/training/dpo_training_run.json \
  --require-real-training-evidence

On Linux release environments with the [train] extra installed, the release-candidate gate can run a bounded SFT job itself to produce the evidence and evaluate the trained adapter directly:

signalforgeai-release-candidate-check \
  --work-dir results/release_candidate_real \
  --run-training \
  --training-base-model hf/org/base \
  --training-max-steps 1 \
  --require-real-training-evidence

Add --run-dpo when the final candidate should be the DPO adapter. When --candidate-model-id is omitted in this mode, SignalForge AI derives hf:<base>?adapter=<final-adapter-dir> and uses it for distillation and benchmark evidence.

Training remains experimental. In v0.4.0, the [train] extra and actual SFT/DPO execution are Linux-only because the Torch/Triton/Unsloth dependency stack is not portable across all supported core platforms. Preflight evidence is still available without loading models:

signalforgeai-learn train --base-model dummy/base --sft --dpo \
  --sft-data results/pilot_check/datasets/pilot.sft.jsonl \
  --dpo-data results/pilot_check/datasets/pilot.dpo.jsonl \
  --dry-run --quality-gate --logs-root results/pilot_check/logs \
  --report-out results/pilot_check/training_preflight.json

Release environments with [train] dependencies installed can opt into a bounded SFT smoke run and record training_run.v0 evidence:

signalforgeai-learn train --base-model hf/org/base --sft \
  --sft-data results/pilot_check/datasets/pilot.sft.jsonl \
  --sft-out results/training/sft_lora \
  --quality-gate --logs-root results/pilot_check/logs \
  --report-out results/pilot_check/training_preflight.json \
  --run-report-out results/training/sft_training_run.json \
  --smoke --max-steps 1

DPO evidence is opt-in and must point at successful parent SFT run evidence. When a DPO run is present, pass its run report to exchange packaging so the final adapter refs and checksums describe the preference-optimized artifact:

signalforgeai-learn train --base-model hf/org/base --dpo \
  --dpo-data results/pilot_check/datasets/pilot.dpo.jsonl \
  --sft-run results/training/sft_training_run.json \
  --dpo-out results/training/dpo_lora \
  --quality-gate --logs-root results/pilot_check/logs \
  --report-out results/pilot_check/training_preflight.json \
  --run-report-out results/training/dpo_training_run.json \
  --smoke --max-steps 1

To have the pilot loop generate DPO-compatible preference data and attach the preflight summary to readiness output:

signalforgeai-pilot-check --mode dummy --training-preflight \
  --work-dir results/pilot_check

Run the evidence-only distillation eval gate from the generated recipe:

signalforgeai-distill-check \
  --recipe results/pilot_check/distillation_recipe.json \
  --work-dir results/distillation_gate

This compares a candidate specialist model against a baseline/teacher model and writes distillation_eval.v0 JSON and Markdown evidence without requiring real training.

Generate benchmark frontier evidence across suites and model ids:

signalforgeai-benchmark-matrix \
  --config docs/benchmark_matrix_sample.json \
  --work-dir results/benchmark_matrix

This writes benchmark_matrix.v0 JSON and Markdown with task success, cost per successful task, latency p50/p95, reliability fields, mean effective score, and frontier picks. Dummy rows are deterministic and mandatory; hosted/local/HF rows are skipped unless their provider is explicitly required.
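
For readers unfamiliar with these metrics, the sketch below computes task success, cost per successful task, and latency p50/p95 from a list of per-run records. The record fields and the nearest-rank percentile definition are assumptions made for this example; the tool may compute its percentiles differently.

```python
import math


def percentile(sorted_values: list[float], pct: float) -> float:
    """Nearest-rank percentile on an already sorted, non-empty list."""
    rank = max(1, math.ceil(pct / 100 * len(sorted_values)))
    return sorted_values[rank - 1]


def summarize(runs: list[dict]) -> dict:
    """Aggregate per-run records into benchmark-style summary fields."""
    latencies = sorted(r["latency_ms"] for r in runs)
    successes = sum(1 for r in runs if r["success"])
    total_cost = sum(r["cost_usd"] for r in runs)
    return {
        "task_success": successes / len(runs),
        # Failed runs still cost money, so divide total cost by successes.
        "cost_per_success": total_cost / successes if successes else None,
        "latency_p50_ms": percentile(latencies, 50),
        "latency_p95_ms": percentile(latencies, 95),
    }


# Illustrative numbers only, not measured results.
runs = [
    {"latency_ms": 400, "cost_usd": 0.002, "success": True},
    {"latency_ms": 500, "cost_usd": 0.002, "success": True},
    {"latency_ms": 600, "cost_usd": 0.002, "success": True},
    {"latency_ms": 700, "cost_usd": 0.002, "success": False},
    {"latency_ms": 2000, "cost_usd": 0.002, "success": True},
]
summary = summarize(runs)
```

Dividing total cost by successful runs rather than by all runs is what makes an unreliable but cheap model look as expensive as it really is.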

Build and index a local specialist exchange unit from the release evidence:

signalforgeai-exchange build-unit \
  --training-preflight results/pilot_check/training_preflight.json \
  --training-run results/training/dpo_training_run.json \
  --distillation-eval results/distillation_gate/distillation_eval.json \
  --benchmark-matrix results/benchmark_matrix/benchmark_matrix.json \
  --out results/exchange/pilot-specialist.unit.json \
  --id pilot-specialist --name "Pilot Specialist" --version 0.4.0 --domain pilot \
  --model-family dummy --model-size 0B --model-format safetensors \
  --model-license Apache-2.0 --dataset-license CC-BY-4.0 \
  --usage-constraint "not for production decisions without review" \
  --failure-mode dummy_only \
  --failure-description "Dummy artifacts only prove exchange plumbing." \
  --failure-mitigation "Replace dummy refs before release." \
  --safetensors-ref hf://signalforgeai/pilot-specialist/model.safetensors \
  --ollama-modelfile hf://signalforgeai/pilot-specialist/Modelfile \
  --ollama-tag signalforgeai/pilot-specialist:0.4.0

signalforgeai-exchange package-check \
  --manifest results/exchange/pilot-specialist.unit.json \
  --out results/exchange/specialist_package.json \
  --package-type auto --release-ready --update-manifest

signalforgeai-exchange smoke-run \
  --manifest results/exchange/pilot-specialist.unit.json \
  --work-dir results/exchange/smoke --update-manifest

signalforgeai-exchange validate \
  --manifest results/exchange/pilot-specialist.unit.json --release-ready

signalforgeai-exchange index \
  --registry-dir results/exchange --out results/exchange/index.json --release-ready

Repository layout

src/signalforgeai/
├── agents/          # Reference agents used by eval suites
├── orchestration/   # Multi-step execution patterns
├── logging/         # Trace schema, emitter, validation, inspection
├── evaluation/      # Suites, harness, scoring, reporting
├── export/          # Trace -> dataset transformations
├── learning/        # Routing and learning loop primitives
└── training/        # Experimental SFT/DPO components

Top-level runtime assets:

logs/ -> execution traces and reward artifacts
datasets/ -> generated learning datasets
results/ -> evaluation outputs and summaries

See manifesto.md for principles and roadmap.md for the focused build plan.

Branching & releases

prod -> protected, tagged releases
dev -> integration branch for ongoing work

Release history

0.5.0, May 25, 2026
0.4.0, May 25, 2026 (this version)

Download files

Source Distribution

signalforgeai-0.4.0.tar.gz (168.8 kB)

Uploaded May 25, 2026 Source

Built Distribution

signalforgeai-0.4.0-py3-none-any.whl (178.4 kB)

Uploaded May 25, 2026 Python 3

File details

Details for the file signalforgeai-0.4.0.tar.gz.

File metadata

Download URL: signalforgeai-0.4.0.tar.gz
Upload date: May 25, 2026
Size: 168.8 kB
Tags: Source
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for signalforgeai-0.4.0.tar.gz
Algorithm	Hash digest
SHA256	`f5b119557b8bbcbaec226223660d7f017aee5d00d6bf739349c1bee287468200`
MD5	`6d126e8f94927e4e3ba97d7570b53a07`
BLAKE2b-256	`5204097842678d98d95994b0cd1cf3b7d604829b5a6bfb6936f57985d237b62e`

Provenance

The following attestation bundles were made for signalforgeai-0.4.0.tar.gz:

Publisher: release.yml on foaadfarooghian/signalforgeai

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: signalforgeai-0.4.0.tar.gz
- Subject digest: f5b119557b8bbcbaec226223660d7f017aee5d00d6bf739349c1bee287468200
- Sigstore transparency entry: 1629052549
- Sigstore integration time: May 25, 2026
Source repository:
- Permalink: foaadfarooghian/signalforgeai@27b1671d73373088e8077572de09de8d0b4a90dd
- Branch / Tag: refs/tags/v0.4.0
- Owner: https://github.com/foaadfarooghian
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: release.yml@27b1671d73373088e8077572de09de8d0b4a90dd
- Trigger Event: workflow_dispatch

File details

Details for the file signalforgeai-0.4.0-py3-none-any.whl.

File metadata

Download URL: signalforgeai-0.4.0-py3-none-any.whl
Upload date: May 25, 2026
Size: 178.4 kB
Tags: Python 3
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for signalforgeai-0.4.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`b4da9537798e22384f65728ba5e67c6d50af8a688c03d285ff7154dbff181498`
MD5	`78afe2113695690b822c74739fcb32fe`
BLAKE2b-256	`afa2a0b649886f801122cfd2ec2c055c30b0f192a69313afc01cadbf4ec59fc6`

Provenance

The following attestation bundles were made for signalforgeai-0.4.0-py3-none-any.whl:

Publisher: release.yml on foaadfarooghian/signalforgeai

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: signalforgeai-0.4.0-py3-none-any.whl
- Subject digest: b4da9537798e22384f65728ba5e67c6d50af8a688c03d285ff7154dbff181498
- Sigstore transparency entry: 1629052567
- Sigstore integration time: May 25, 2026
Source repository:
- Permalink: foaadfarooghian/signalforgeai@27b1671d73373088e8077572de09de8d0b4a90dd
- Branch / Tag: refs/tags/v0.4.0
- Owner: https://github.com/foaadfarooghian
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: release.yml@27b1671d73373088e8077572de09de8d0b4a90dd
- Trigger Event: workflow_dispatch
