
Opinionated platform for agentic workflows, tooling, and evaluation.



SignalForge AI

Agent engineering and learning system for production workflows
Build runtime traces, evaluate failures, generate datasets, distill specialists, and benchmark trade-offs.



Project status

SignalForge AI is in active, pre-1.0 development (v0.x).

  • APIs and schemas are still converging
  • Breaking changes are expected while core runtime contracts are stabilized
  • Best fit today: internal platforms, research, and production pilots with pinned versions

See docs/public_contracts.md for the current public surface and experimental boundaries.


Narrowed product scope

SignalForge AI is narrowing its scope to five pillars:

  1. OTel/MCP-native runtime + artifact schema
  2. Evaluation and failure analysis for multi-step, tool-using agents
  3. Dataset generation from production traces
  4. Distillation pipeline for specialist small models
  5. Benchmarking cost, latency, and reliability across models and agent patterns

This is a systems-first direction: execution artifacts and measurable outcomes come before model hype.


Pillars in practice

1) OTel/MCP-native runtime + artifact schema

  • Runtime events map cleanly to spans/events for distributed observability
  • MCP tool calls are first-class execution units
  • Shared artifact contracts for traces, rewards, eval verdicts, and dataset rows
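The sketch below shows one way a runtime step could map onto a span. It converts hypothetical step records into plain dictionaries whose fields loosely follow the OpenTelemetry span model: trace id, span id, parent span id, start/end timestamps, and attributes. `StepEvent`, `to_span`, and `run_to_spans` are illustrative names, not SignalForge AI APIs. No OTel SDK is used, and the real trace.v0 schema may differ.

```python
import uuid
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class StepEvent:
    """Hypothetical runtime step record (not the trace.v0 schema)."""
    step_id: str
    parent_id: Optional[str]
    name: str                # e.g. "agent.run" or an MCP tool name
    start_ns: int            # Unix epoch nanoseconds
    end_ns: int
    attributes: dict = field(default_factory=dict)


def to_span(event: StepEvent, trace_id: str) -> dict:
    """Map one step to an OTel-style span; the parent link preserves nesting."""
    return {
        "trace_id": trace_id,
        "span_id": event.step_id,
        "parent_span_id": event.parent_id,
        "name": event.name,
        "start_time_unix_nano": event.start_ns,
        "end_time_unix_nano": event.end_ns,
        "attributes": dict(event.attributes),
    }


def run_to_spans(events: list, trace_id: Optional[str] = None) -> list:
    """All steps of one agent run share a single trace id."""
    trace_id = trace_id or uuid.uuid4().hex
    return [to_span(e, trace_id) for e in events]
```

Under this mapping, an MCP tool call would become a child span of the agent step that issued it. A standard trace viewer could then show the call as its own execution unit.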

2) Evaluation + failure analysis

  • Suite-based evaluation for multi-step workflows
  • Step-level and run-level scoring
  • Failure taxonomy for tool errors, reasoning failures, recovery failures, and policy failures
  • Regression diffing across runs, models, and orchestration patterns
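The sketch below shows step-level scores rolling up into a run-level verdict, with failures tallied by taxonomy bucket. It makes three assumptions: the enum labels, the mean aggregation, and the rule that any policy failure fails the run. None of these come from SignalForge AI's scoring code.

```python
from collections import Counter
from enum import Enum
from statistics import mean


class FailureKind(Enum):
    """Hypothetical labels mirroring the failure taxonomy above."""
    TOOL_ERROR = "tool_error"
    REASONING = "reasoning_failure"
    RECOVERY = "recovery_failure"
    POLICY = "policy_failure"


def score_run(step_scores, failures, pass_threshold=0.7):
    """Aggregate per-step scores into a run verdict and tally failures by kind.

    The mean aggregation, the 0.7 threshold, and the rule that any policy
    failure fails the run are illustrative assumptions.
    """
    run_score = mean(step_scores) if step_scores else 0.0
    counts = Counter(f.value for f in failures)
    passed = run_score >= pass_threshold and counts[FailureKind.POLICY.value] == 0
    return {"run_score": run_score, "passed": passed, "failure_counts": dict(counts)}
```

Keeping per-kind counts, not just a single pass/fail bit, is what makes it possible to diff failure modes across runs, models, and orchestration patterns.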

3) Dataset generation from production traces

  • Deterministic trace ETL into SFT, preference, repair, and critique datasets
  • Provenance from dataset row back to trace/reward artifacts
  • Data quality checks for schema validity, leakage risk, and label consistency
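Deterministic splits are often implemented by hashing a stable key. Keying on the source trace id keeps every row derived from one trace in the same split, which is one common way to limit train/eval leakage. The sketch below makes no claim about how SignalForge AI assigns splits. `assign_split`, the salt, and the default fraction are hypothetical.

```python
import hashlib


def assign_split(row_key: str, eval_fraction: float = 0.1, salt: str = "split-v0") -> str:
    """Deterministically map a key (e.g. a trace id) to "train" or "eval".

    The same key and salt always land in the same split, across runs and machines.
    """
    digest = hashlib.sha256(f"{salt}:{row_key}".encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") % 10_000 / 10_000  # in [0, 1)
    return "eval" if bucket < eval_fraction else "train"
```

Changing the salt reshuffles every assignment. Treat the salt as part of the dataset version.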

4) Distillation for specialist small models

  • Teacher traces -> curated supervision -> student training/eval loops
  • Focus on narrow specialist capabilities rather than general chat
  • Reproducible train/eval pipelines for iterative deployment

5) Cost/latency/reliability benchmarking

  • Comparable benchmark matrix across model providers and orchestration patterns
  • Explicit trade-off reporting (quality vs cost vs latency vs failure rate)
  • Reliability metrics for retries, tool success, and degraded-mode completion
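The trade-off metrics above can be computed from per-run records. The sketch below uses one common definition of cost per successful task: all spend, including failed runs, is charged to the successful ones. Whether SignalForge AI defines it the same way is an assumption. `RunRecord` and `reliability_summary` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class RunRecord:
    """Hypothetical per-run benchmark record."""
    succeeded: bool
    cost_usd: float
    retries: int
    tool_calls: int
    tool_failures: int


def reliability_summary(runs: list) -> dict:
    n = len(runs)
    successes = sum(r.succeeded for r in runs)
    total_cost = sum(r.cost_usd for r in runs)
    total_tool_calls = sum(r.tool_calls for r in runs)
    total_tool_failures = sum(r.tool_failures for r in runs)
    return {
        "task_success_rate": successes / n if n else 0.0,
        # Failed runs still cost money, so their spend is charged to the successes.
        "cost_per_successful_task": total_cost / successes if successes else float("inf"),
        "mean_retries": sum(r.retries for r in runs) / n if n else 0.0,
        "tool_success_rate": (1 - total_tool_failures / total_tool_calls) if total_tool_calls else 1.0,
    }
```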

Specialist model exchange

SignalForge AI is expanding toward a specialist model registry/exchange where the published unit is a complete deployable package, not only weights.

Each exchange unit includes:

  • Small domain model
  • Eval pack
  • Trace/dataset lineage
  • Hardware profile
  • Failure modes
  • License and usage constraints
  • Ready-to-run artifacts (adapters, Safetensors/GGUF, Ollama packaging)

Use signalforgeai-exchange to build and validate specialist_model_unit.v0 manifests from training, distillation, and benchmark evidence. See docs/model_exchange.md, docs/specialist_model_unit_sample.json, and docs/specs/specialist_model_unit.schema.json for the contract.
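The sketch below is a quick completeness check covering the unit contents listed above. The top-level keys are hypothetical, one per listed component. The authoritative contract is docs/specs/specialist_model_unit.schema.json, and a real check would validate against that JSON Schema, for example with the `jsonschema` package's `validate` function.

```python
# Hypothetical top-level keys, one per exchange-unit component listed above.
REQUIRED_SECTIONS = {
    "model": "Small domain model",
    "eval_pack": "Eval pack",
    "lineage": "Trace/dataset lineage",
    "hardware_profile": "Hardware profile",
    "failure_modes": "Failure modes",
    "license": "License and usage constraints",
    "artifacts": "Ready-to-run artifacts",
}


def missing_sections(manifest: dict) -> list:
    """Return the labels of components that are absent or empty in the manifest."""
    return [label for key, label in REQUIRED_SECTIONS.items() if not manifest.get(key)]
```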

Good early domains and tasks:

  • Nutrition
  • Auction houses
  • Document-heavy verticals
  • Compliance
  • Support operations
  • Telecom workflows
  • Cataloguing
  • Extraction
  • Ranking
  • Summarization

What is already in this repo

  • Structured JSONL tracing with validation/inspection/diff tooling
  • Evaluation harnesses and benchmark suites with reward artifacts
  • Dataset export pipelines (SFT, preferences, repair pairs, curriculum)
  • Learning/routing infrastructure and experimental SFT/DPO training utilities
  • Multi-provider model abstraction (OpenAI, Ollama, HF, dummy)

What SignalForge AI is not

  • A chatbot framework
  • A prompt library
  • A no-code builder
  • A model leaderboard without task context
  • A fixed set of built-in agents

Quickstart

python -m venv .venv
source .venv/bin/activate
pip install signalforgeai==0.5.0

Run the deterministic production-pilot readiness loop:

signalforgeai-pilot-check --mode dummy --work-dir results/pilot_check

This quickstart is offline-safe by default and uses the deterministic dummy_good provider. It writes:

  • results/pilot_check/pilot_readiness.md
  • results/pilot_check/pilot_readiness.json
  • results/pilot_check/logs/ with trace.v0 and reward.v0 artifacts
  • results/pilot_check/datasets/manifest.json plus SFT, preference, repair, and curriculum exports

For a guided first-user path from install through artifact inspection, see docs/first_pilot.md. Provider-backed runs are opt-in; see docs/provider_setup.md for OpenAI, Ollama, and HF setup.

Validate and inspect the latest trace:

LATEST_LOG=$(ls -t results/pilot_check/logs | head -n 1)
python -m signalforgeai.logging.validate "results/pilot_check/logs/$LATEST_LOG"
python -m signalforgeai.logging.inspect "results/pilot_check/logs/$LATEST_LOG"
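The `signalforgeai.logging.validate` module is the authoritative check. To illustrate what the shell pipeline does, the sketch below picks the most recently modified regular file, like `ls -t ... | head -n 1`. It then performs only a structural JSONL check: every non-blank line must parse as a JSON object. It knows nothing about the trace.v0 schema. The logs directory holds both trace.v0 and reward.v0 artifacts, so the newest file is not guaranteed to be a trace, and reward artifacts may not be JSONL at all. `latest_file` and `check_jsonl` are hypothetical helpers.

```python
import json
from pathlib import Path


def latest_file(log_dir: str) -> Path:
    """Like `ls -t DIR | head -n 1`, restricted to regular files."""
    files = [p for p in Path(log_dir).iterdir() if p.is_file()]
    if not files:
        raise FileNotFoundError(f"no files in {log_dir}")
    return max(files, key=lambda p: p.stat().st_mtime)


def check_jsonl(path: Path) -> list:
    """Return one message per malformed line; an empty list means well-formed JSONL."""
    problems = []
    with path.open(encoding="utf-8") as fh:
        for lineno, line in enumerate(fh, start=1):
            if not line.strip():
                continue  # tolerate blank lines
            try:
                record = json.loads(line)
            except json.JSONDecodeError as exc:
                problems.append(f"line {lineno}: {exc.msg}")
                continue
            if not isinstance(record, dict):
                problems.append(f"line {lineno}: expected a JSON object")
    return problems
```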

To run the repository examples, clone the repository and install it in editable mode with the dev extra:

git clone https://github.com/foaadfarooghian/signalforgeai.git
cd signalforgeai
pip install -e ".[dev]"
python examples/quickstart_research_agent.py

From the source checkout, run the evaluation suites:

python -m signalforgeai.evaluation.run src/signalforgeai/evaluation/suites/quickstart.json
python -m signalforgeai.evaluation.run src/signalforgeai/evaluation/suites/research_quickstart.json

To run against a configured local or hosted model, set SIGNALFORGEAI_MODEL_ID explicitly:

SIGNALFORGEAI_MODEL_ID=ollama:ministral-3:8b python examples/quickstart_research_agent.py
SIGNALFORGEAI_MODEL_ID=openai:gpt-5-mini python examples/quickstart_research_agent.py
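The model ids above appear to follow a `<provider>:<model>` convention. The model part can itself contain colons, as in the Ollama tag. The release-candidate section also describes an `hf:<base>?adapter=<final-adapter-dir>` form. The parser below illustrates that reading only: it splits on the first colon and reads an optional `adapter` query parameter. `ModelRef` and `parse_model_id` are not SignalForge AI's own resolver.

```python
from dataclasses import dataclass
from typing import Optional
from urllib.parse import parse_qs


@dataclass(frozen=True)
class ModelRef:
    provider: str
    model: str
    adapter: Optional[str] = None


def parse_model_id(model_id: str) -> ModelRef:
    # Split on the FIRST colon only, so "ollama:ministral-3:8b" keeps its tag.
    provider, sep, rest = model_id.partition(":")
    if not sep or not provider or not rest:
        raise ValueError(f"expected '<provider>:<model>', got {model_id!r}")
    model, _, query = rest.partition("?")
    adapter = parse_qs(query).get("adapter", [None])[0]
    return ModelRef(provider, model, adapter)
```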

Pilot datasets are strictly gated by default: exports carry deterministic split metadata, provenance links from each row back to its trace/reward artifacts, file hashes, duplicate counts, and leakage checks. To validate an exported dataset directly:

signalforgeai-dataset-validate results/pilot_check/datasets/pilot.sft.jsonl \
  --kind sft --quality-gate --logs-root results/pilot_check/logs
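Two of these gates, duplicate counts and leakage checks, can be approximated with content fingerprints. The sketch assumes each row is a dict with a `split` key plus content fields. Rows whose content is identical apart from the split label count as duplicates, and a fingerprint that appears in more than one split counts as leakage. The row layout and `quality_report` are hypothetical. The real checks are whatever `signalforgeai-dataset-validate --quality-gate` implements.

```python
import hashlib
import json
from collections import Counter


def row_fingerprint(content: dict) -> str:
    # Canonical JSON (sorted keys) so field order does not change the fingerprint.
    return hashlib.sha256(json.dumps(content, sort_keys=True).encode("utf-8")).hexdigest()


def quality_report(rows: list) -> dict:
    counts = Counter()
    splits_by_fp = {}
    for row in rows:
        content = {k: v for k, v in row.items() if k != "split"}
        fp = row_fingerprint(content)
        counts[fp] += 1
        splits_by_fp.setdefault(fp, set()).add(row["split"])
    return {
        "rows": len(rows),
        "duplicates": sum(c - 1 for c in counts.values()),
        "leaked_fingerprints": sum(1 for s in splits_by_fp.values() if len(s) > 1),
    }
```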

Run a release regression gate by comparing against the last accepted readiness artifact:

signalforgeai-pilot-check --mode dummy --work-dir results/pilot_current \
  --baseline results/pilot_baseline/pilot_readiness.json

When --baseline is supplied, the command also writes:

  • results/pilot_current/eval_regression.md
  • results/pilot_current/eval_regression.json

The default deterministic gate tolerates no drop in pass rate or mean score, no newly failing cases, and no worsening of failure modes.
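The sketch below expresses those four rules as a comparison between two readiness summaries. The dictionary layout is hypothetical, not the pilot_readiness.json schema. The sketch also reads "worsening of failure modes" as any failure-mode count increasing relative to the baseline, which is an assumption.

```python
def regression_violations(baseline: dict, current: dict) -> list:
    """Compare two readiness summaries with a hypothetical layout:
    {"pass_rate": float, "mean_score": float,
     "failing_cases": [case ids], "failure_modes": {mode: count}}
    An empty result means the gate passes.
    """
    violations = []
    if current["pass_rate"] < baseline["pass_rate"]:
        violations.append(f"pass rate dropped: {baseline['pass_rate']} -> {current['pass_rate']}")
    if current["mean_score"] < baseline["mean_score"]:
        violations.append(f"mean score dropped: {baseline['mean_score']} -> {current['mean_score']}")
    new_failures = sorted(set(current["failing_cases"]) - set(baseline["failing_cases"]))
    if new_failures:
        violations.append("new failing cases: " + ", ".join(new_failures))
    for mode, count in sorted(current["failure_modes"].items()):
        before = baseline["failure_modes"].get(mode, 0)
        if count > before:
            violations.append(f"failure mode {mode!r} rose: {before} -> {count}")
    return violations
```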

Optional provider smoke checks can be required in configured environments:

signalforgeai-pilot-check --require-provider hosted
signalforgeai-pilot-check --require-provider local

Run the one-command offline release-candidate evidence gate:

signalforgeai-release-candidate-check \
  --work-dir results/release_candidate

This writes release_candidate.v0 JSON and Markdown plus the full child evidence bundle: pilot readiness, mock SFT/DPO training_run.v0, distillation eval, benchmark matrix, specialist unit, package check, consumer smoke run, and registry index.

Release reviewers can require non-mock training evidence. Use an SFT run as the final artifact:

signalforgeai-release-candidate-check \
  --work-dir results/release_candidate \
  --sft-run results/training/sft_training_run.json \
  --final-training-stage sft \
  --require-real-training-evidence

Or use the final DPO run report:

signalforgeai-release-candidate-check \
  --work-dir results/release_candidate \
  --dpo-run results/training/dpo_training_run.json \
  --require-real-training-evidence

Linux release environments with the [train] extra installed can let the release-candidate gate run a bounded SFT training step and evaluate the trained adapter directly:

signalforgeai-release-candidate-check \
  --work-dir results/release_candidate_real \
  --run-training \
  --training-base-model hf/org/base \
  --training-max-steps 1 \
  --require-real-training-evidence

Add --run-dpo when the final candidate should be the DPO adapter. When --candidate-model-id is omitted in this mode, SignalForge AI derives hf:<base>?adapter=<final-adapter-dir> and uses it for distillation and benchmark evidence.

Real SFT/DPO evidence is not required for the v0.5.0 onboarding gate. See docs/post_0_5_training_evidence.md for the post-0.5 scope.

Training remains experimental. In v0.5.0, the [train] extra and actual SFT/DPO execution are Linux-only because the Torch/Triton/Unsloth dependency stack is not portable across all supported core platforms. Preflight evidence is still available without loading models:

signalforgeai-learn train --base-model dummy/base --sft --dpo \
  --sft-data results/pilot_check/datasets/pilot.sft.jsonl \
  --dpo-data results/pilot_check/datasets/pilot.dpo.jsonl \
  --dry-run --quality-gate --logs-root results/pilot_check/logs \
  --report-out results/pilot_check/training_preflight.json

Release environments with [train] dependencies installed can opt into a bounded SFT smoke run and record training_run.v0 evidence:

signalforgeai-learn train --base-model hf/org/base --sft \
  --sft-data results/pilot_check/datasets/pilot.sft.jsonl \
  --sft-out results/training/sft_lora \
  --quality-gate --logs-root results/pilot_check/logs \
  --report-out results/pilot_check/training_preflight.json \
  --run-report-out results/training/sft_training_run.json \
  --smoke --max-steps 1

DPO evidence is opt-in and must point at successful parent SFT run evidence. When a DPO run is present, pass its run report to exchange packaging so the final adapter refs and checksums describe the preference-optimized artifact:

signalforgeai-learn train --base-model hf/org/base --dpo \
  --dpo-data results/pilot_check/datasets/pilot.dpo.jsonl \
  --sft-run results/training/sft_training_run.json \
  --dpo-out results/training/dpo_lora \
  --quality-gate --logs-root results/pilot_check/logs \
  --report-out results/pilot_check/training_preflight.json \
  --run-report-out results/training/dpo_training_run.json \
  --smoke --max-steps 1

To have the pilot loop generate DPO-compatible preference data and attach the preflight summary to readiness output:

signalforgeai-pilot-check --mode dummy --training-preflight \
  --work-dir results/pilot_check

Run the evidence-only distillation eval gate from the generated recipe:

signalforgeai-distill-check \
  --recipe results/pilot_check/distillation_recipe.json \
  --work-dir results/distillation_gate

This compares a candidate specialist model against a baseline/teacher model and writes distillation_eval.v0 JSON and Markdown evidence without requiring real training.
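The sketch below is a minimal form of that comparison, restricted to the eval cases both models were scored on. It reports the score gap and the fraction of teacher quality the candidate retains. The 0.05 tolerance, the field names, and `distillation_verdict` itself are hypothetical. The actual criteria live in the generated recipe and the distillation_eval.v0 output.

```python
from statistics import mean


def distillation_verdict(teacher_scores: dict, candidate_scores: dict, max_gap: float = 0.05) -> dict:
    """Compare per-case scores (case id -> score) on the cases both models share."""
    shared = sorted(teacher_scores.keys() & candidate_scores.keys())
    if not shared:
        raise ValueError("no shared eval cases")
    teacher = mean(teacher_scores[c] for c in shared)
    candidate = mean(candidate_scores[c] for c in shared)
    gap = teacher - candidate
    return {
        "cases": len(shared),
        "teacher_mean": teacher,
        "candidate_mean": candidate,
        "gap": gap,
        "retention": candidate / teacher if teacher else 0.0,
        "accepted": gap <= max_gap,  # hypothetical acceptance rule
    }
```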

Generate benchmark frontier evidence across suites and model ids:

signalforgeai-benchmark-matrix \
  --config docs/benchmark_matrix_sample.json \
  --work-dir results/benchmark_matrix

This writes benchmark_matrix.v0 JSON and Markdown with task success, cost per successful task, latency p50/p95, reliability fields, mean effective score, and frontier picks. Dummy rows are deterministic and mandatory; hosted, local, and HF rows are skipped unless their provider is explicitly required.
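The two calculations below are common choices for latency percentiles and frontier picks. The first is a nearest-rank percentile, one of several percentile definitions. The second is a Pareto frontier over success (higher is better) and cost and p95 latency (lower is better). Whether benchmark_matrix.v0 uses these exact definitions is an assumption. The row keys are hypothetical.

```python
import math


def percentile(values, q):
    """Nearest-rank percentile for q in (0, 100]."""
    if not values:
        raise ValueError("no values")
    ordered = sorted(values)
    rank = math.ceil(q * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]


def pareto_frontier(rows):
    """Keep rows that no other row beats on every axis at once."""
    def dominates(a, b):
        no_worse = (a["success"] >= b["success"] and a["cost"] <= b["cost"]
                    and a["latency_p95"] <= b["latency_p95"])
        strictly_better = (a["success"] > b["success"] or a["cost"] < b["cost"]
                           or a["latency_p95"] < b["latency_p95"])
        return no_worse and strictly_better

    return [r for r in rows if not any(dominates(o, r) for o in rows if o is not r)]
```

A frontier lists the defensible choices rather than one winner. Picking among them still depends on the task's cost and latency budget.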

Build and index a local specialist exchange unit from the release evidence:

signalforgeai-exchange build-unit \
  --training-preflight results/pilot_check/training_preflight.json \
  --training-run results/training/dpo_training_run.json \
  --distillation-eval results/distillation_gate/distillation_eval.json \
  --benchmark-matrix results/benchmark_matrix/benchmark_matrix.json \
  --out results/exchange/pilot-specialist.unit.json \
  --id pilot-specialist --name "Pilot Specialist" --version 0.5.0 --domain pilot \
  --model-family dummy --model-size 0B --model-format safetensors \
  --model-license Apache-2.0 --dataset-license CC-BY-4.0 \
  --usage-constraint "not for production decisions without review" \
  --failure-mode dummy_only \
  --failure-description "Dummy artifacts only prove exchange plumbing." \
  --failure-mitigation "Replace dummy refs before release." \
  --safetensors-ref hf://signalforgeai/pilot-specialist/model.safetensors \
  --ollama-modelfile hf://signalforgeai/pilot-specialist/Modelfile \
  --ollama-tag signalforgeai/pilot-specialist:0.5.0

signalforgeai-exchange package-check \
  --manifest results/exchange/pilot-specialist.unit.json \
  --out results/exchange/specialist_package.json \
  --package-type auto --release-ready --update-manifest

signalforgeai-exchange smoke-run \
  --manifest results/exchange/pilot-specialist.unit.json \
  --work-dir results/exchange/smoke --update-manifest

signalforgeai-exchange validate \
  --manifest results/exchange/pilot-specialist.unit.json --release-ready

signalforgeai-exchange index \
  --registry-dir results/exchange --out results/exchange/index.json --release-ready
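Conceptually, a local registry index can be built by scanning the directory for unit manifests and recording each unit's identity. The sketch below assumes manifests are named `*.unit.json`, as in the build-unit example, and expose top-level `id` and `version` keys, mirroring the `--id` and `--version` flags. Both are assumptions about the manifest layout. It also rejects duplicate id/version pairs. `build_index` is hypothetical and does not reproduce the `signalforgeai-exchange index` output format.

```python
import json
from pathlib import Path


def build_index(registry_dir: str) -> list:
    """Collect one entry per *.unit.json manifest, rejecting duplicate id/version pairs."""
    entries, seen = [], set()
    for path in sorted(Path(registry_dir).glob("*.unit.json")):
        manifest = json.loads(path.read_text(encoding="utf-8"))
        key = (manifest["id"], manifest["version"])  # assumed top-level keys
        if key in seen:
            raise ValueError(f"duplicate unit {key[0]}@{key[1]} in {path.name}")
        seen.add(key)
        entries.append({"id": key[0], "version": key[1], "manifest": path.name})
    return entries
```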

Repository layout

src/signalforgeai/
├── agents/          # Reference agents used by eval suites
├── orchestration/   # Multi-step execution patterns
├── logging/         # Trace schema, emitter, validation, inspection
├── evaluation/      # Suites, harness, scoring, reporting
├── export/          # Trace -> dataset transformations
├── learning/        # Routing and learning loop primitives
└── training/        # Experimental SFT/DPO components

Top-level runtime assets:

  • logs/ -> execution traces and reward artifacts
  • datasets/ -> generated learning datasets
  • results/ -> evaluation outputs and summaries

See manifesto.md for principles and roadmap.md for the focused build plan.


Branching & releases

  • prod -> protected, tagged releases
  • dev -> integration branch for ongoing work
  • docs/release_checklist.md -> release gate for v0.5.0



Download files


Source Distribution

signalforgeai-0.5.0.tar.gz (171.5 kB)

Uploaded Source

Built Distribution


signalforgeai-0.5.0-py3-none-any.whl (180.4 kB)

Uploaded Python 3

File details

Details for the file signalforgeai-0.5.0.tar.gz.

File metadata

  • Download URL: signalforgeai-0.5.0.tar.gz
  • Upload date:
  • Size: 171.5 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for signalforgeai-0.5.0.tar.gz
Algorithm    Hash digest
SHA256       20946d8de0650e64072109ef5c52e0fdcc3e168c646d6b67a2172457b9abd560
MD5          836afd804abc33825115670688257f31
BLAKE2b-256  0bcf081ab5af1fbedf8e85eb886aed902b5cc5944501d40cd2af1803f6d62e24
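A downloaded file can be checked against the published SHA256 digest with the standard library's `hashlib`, streaming the file in chunks so large artifacts are not read into memory at once. `sha256_of` and `verify` are illustrative helper names. The download path in the commented example is assumed.

```python
import hashlib


def sha256_of(path: str) -> str:
    """Stream the file in 64 KiB chunks and return its hex SHA256 digest."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(1 << 16), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path: str, expected_hex: str) -> bool:
    """Compare against a published digest; case and surrounding whitespace are ignored."""
    return sha256_of(path) == expected_hex.strip().lower()


# Example (assumes the sdist was downloaded to the current directory):
# verify("signalforgeai-0.5.0.tar.gz",
#        "20946d8de0650e64072109ef5c52e0fdcc3e168c646d6b67a2172457b9abd560")
```

pip can enforce the same check at install time through hash-checking mode: list `signalforgeai==0.5.0 --hash=sha256:<digest>` in a requirements file and run `pip install --require-hashes -r requirements.txt`. In that mode every dependency must also be pinned with a hash.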


Provenance

The following attestation bundles were made for signalforgeai-0.5.0.tar.gz:

Publisher: release.yml on foaadfarooghian/signalforgeai


File details

Details for the file signalforgeai-0.5.0-py3-none-any.whl.

File metadata

  • Download URL: signalforgeai-0.5.0-py3-none-any.whl
  • Upload date:
  • Size: 180.4 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for signalforgeai-0.5.0-py3-none-any.whl
Algorithm    Hash digest
SHA256       2e49a195c6f5eb6cfb20918277a20a979b3d0090fbf80ef0579bbc9e8fda7365
MD5          98bd2048e597fbe60a36c239a7aeda62
BLAKE2b-256  cb1a2f75d451bd29b159387635af5cda6109bb0c6b68617443c2545c1e464a2f


Provenance

The following attestation bundles were made for signalforgeai-0.5.0-py3-none-any.whl:

Publisher: release.yml on foaadfarooghian/signalforgeai

