Cigito v3 distill data curator — DeepSeek V4 + Cogito 671B teacher fusion (W2 milestone, $0 CPU)

Project description

Cigito v3 — Distill Data Curator (W2 milestone)

Standalone parallel-track POC. Not wired into Concinno main / Sancio.

Purpose

W2 milestone of the Cigito v3 plan: curate 5 000 (instruction, response, ZIQ trajectory, CBUA stage, FieldRead compression %) pairs from existing Concinno trajectories — at $0 CPU, pure stdlib heuristic, no GPU/API call.

The curated jsonl is the seed corpus for the W3 distill loop (DeepSeek V4 student + Cogito v2.1 671B teacher, IDA composite loss, USPTO patent skeleton — see W1 design doc).

Status

| Phase | Status |
|---|---|
| W1 design doc | shipped 2026-04-28 (_AI_BRAIN/05_Planning/cigito-v3-...) |
| W2 data curation (this pkg, 0.0.1 first publish) | shipped 2026-04-28 |
| W3 distill loop POC | gated on W2; runs on RunPod when triggered |
| W4 GO/NO-GO Gate | gates production wire-in to Concinno main / Sancio runtime |

This W2 milestone is data curation only — it does not run the distill loop, does not start RunPod, does not invoke any LLM API. The output data/pairs_v0.jsonl is the input to W3.

PyPI publication note (0.0.1 first publish)

The 0.0.1 release intentionally has zero callers in the Concinno ecosystem. Wiring is gated on the W4 GO/NO-GO Gate per the Plan v3 parallel-track spec, not on this package's PyPI presence.

If you are a third-party adopter:

  • Treat 0.0.x as Pre-Alpha — the Development Status :: 2 - Pre-Alpha classifier in pyproject.toml reflects exactly that. Public APIs (curator schema, output jsonl shape, classes under cigito_v3.distill) may change incompatibly between 0.0.x releases as Phase 1 POC findings come back from RunPod runs.
  • The package is standalone and pure-stdlib by design — it does not import concinno, concinno-skills-*, or sancio-runtime. Installing it next to those packages does not affect their behaviour.
  • A production wire-in (e.g. a concinno cigito-distill subcommand or a Concinno main pkg Distill skill) will land only after the W4 GO/NO-GO Gate evaluates the Phase 1 POC. Until then this package is a staged research artefact, not a turn-key feature.

Layout

cigito-v3/
├── pyproject.toml          (placeholder, version 0.0.1, AGPL)
├── src/cigito_v3/
│   ├── distill/
│   │   ├── curator.py        — 5k pair selector (heuristic, $0 CPU)
│   │   ├── trajectory.py     — ZIQ outcome bus replay → decision sequence
│   │   ├── stage_tagger.py   — CBUA stage transition labeler
│   │   └── compression.py    — FieldRead pre/post compression demo
│   └── data/
│       └── pairs_v0.jsonl    — curated output (or pointer)
└── tests/
    └── test_distill_v2.py    — ≥12 tests

CLI

python -m cigito_v3.distill.curator \
    --concinno-home ~/.concinno \
    --handoff-root _AI_BRAIN/06_Handoffs \
    --target-pairs 5000 \
    --output data/pairs_v0.jsonl

The curator is read-only — it never writes outside the explicit --output path. It refuses to call any external service (asserted in test_curator_no_external_api_call).
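
The flags in the invocation above could be parsed roughly as follows. Only the four flag names and the example values come from this README; `build_parser`, the defaults, and the required/optional split are assumptions made for illustration.

```python
import argparse
from pathlib import Path


def _path(value: str) -> Path:
    # Expand "~" ourselves so the parser behaves the same with or without a shell.
    return Path(value).expanduser()


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the documented flags."""
    parser = argparse.ArgumentParser(prog="cigito_v3.distill.curator")
    parser.add_argument("--concinno-home", type=_path, required=True,
                        help="root of the existing Concinno trajectories")
    parser.add_argument("--handoff-root", type=_path, required=True,
                        help="root of the handoff documents")
    parser.add_argument("--target-pairs", type=int, default=5000,
                        help="number of pairs to curate")
    parser.add_argument("--output", type=_path, required=True,
                        help="the only path the curator is allowed to write")
    return parser


args = build_parser().parse_args([
    "--concinno-home", "~/.concinno",
    "--handoff-root", "_AI_BRAIN/06_Handoffs",
    "--output", "data/pairs_v0.jsonl",
])
print(args.target_pairs, args.output)
```

Keeping every input root as an explicit flag matches the "no personal paths" constraint: nothing in the sketch hardcodes a user directory.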

Output schema

Each line of pairs_v0.jsonl is a JSON object:

{
  "instruction": "<task framing extracted from the source session>",
  "response": "<canonical response derived from the trajectory>",
  "ziq_trajectory": [
    {"step": 0, "tunable": "...", "value": ..., "reward": 0.93,
     "sps_x_ftrl": 0.71, "source": "..."}
  ],
  "cbua_stage": ["C0", "C1", "C2", "B1", "U1", "A1", "A3"],
  "fieldread_compression_pct": 0.42,
  "source_session": "<session id, anonymized>"
}
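
A consumer of the jsonl (for example the W3 loop) might want to validate each line before using it. The sketch below checks the top-level fields of the schema above. The field names come from this README; the Python types are inferred from the example values, and `validate_record` is a hypothetical helper rather than an API of the package.

```python
import json

# Field names come from the schema above; the expected types are inferred
# from the example values and are an assumption.
REQUIRED_FIELDS = {
    "instruction": str,
    "response": str,
    "ziq_trajectory": list,
    "cbua_stage": list,
    "fieldread_compression_pct": (int, float),
    "source_session": str,
}


def validate_record(line: str) -> dict:
    """Parse one jsonl line and check its top-level fields (hypothetical helper)."""
    record = json.loads(line)
    for name, expected in REQUIRED_FIELDS.items():
        if name not in record:
            raise ValueError(f"missing field: {name}")
        if not isinstance(record[name], expected):
            raise ValueError(f"unexpected type for field: {name}")
    for step in record["ziq_trajectory"]:
        if not isinstance(step, dict) or "step" not in step:
            raise ValueError("each ziq_trajectory entry needs a 'step' key")
    return record


# Placeholder values for illustration only; this is not real curated data.
example = json.dumps({
    "instruction": "placeholder instruction",
    "response": "placeholder response",
    "ziq_trajectory": [{"step": 0, "tunable": "placeholder", "value": 1,
                        "reward": 0.5, "sps_x_ftrl": 0.5, "source": "placeholder"}],
    "cbua_stage": ["C0", "C1"],
    "fieldread_compression_pct": 0.42,
    "source_session": "placeholder-session",
})
record = validate_record(example)
print(record["cbua_stage"])  # ['C0', 'C1']
```

Because the 0.0.x schema may change between releases, a check like this is best treated as a guard that fails loudly, not as a stable contract.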

Hard constraints

  1. $0 CPU — pure stdlib + heuristic. No embedding model, no LLM call.
  2. No personal paths — every input root is parameterized via CLI / env.
  3. No GPU/API call: test_curator_no_external_api_call asserts the relevant env vars are unset during curation.
  4. Idempotent — same inputs deterministically produce the same output.
  5. AGPL — per W1 design doc, parallel-track Cigito tracks AGPL like Concinno mainline.
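
Constraints 3 and 4 could be enforced along the following lines. This is a sketch under stated assumptions, not the curator's actual code: `assert_no_api_env` and `select_pairs` are hypothetical names, the environment variable names are placeholders (the README does not list which variables the test checks), and content-hash ordering is just one way to make selection independent of input order.

```python
import hashlib
import json
import os

# Placeholder names; the README does not say which variables are checked.
API_ENV_VARS = ("EXAMPLE_LLM_API_KEY", "EXAMPLE_GPU_API_TOKEN")


def assert_no_api_env(environ=None) -> None:
    """Raise if any credential-style variable is set (hypothetical guard)."""
    env = os.environ if environ is None else environ
    leaked = [name for name in API_ENV_VARS if env.get(name)]
    if leaked:
        raise RuntimeError(f"API credentials present during curation: {leaked}")


def _content_key(candidate: dict) -> str:
    # Canonical JSON (sorted keys) so the hash depends only on content.
    blob = json.dumps(candidate, sort_keys=True, ensure_ascii=False)
    return hashlib.sha256(blob.encode("utf-8")).hexdigest()


def select_pairs(candidates: list[dict], target: int) -> list[dict]:
    """Pick `target` candidates in an order that depends only on their content."""
    return sorted(candidates, key=_content_key)[:target]


pool = [{"instruction": f"task {i}", "response": f"answer {i}"} for i in range(10)]
first = select_pairs(pool, 3)
second = select_pairs(list(reversed(pool)), 3)
print(first == second)  # True
```

Sorting by a content hash avoids both random sampling and any dependence on filesystem traversal order, which is what makes repeated runs over the same inputs produce the same output.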

Not a GO/NO-GO Gate

W2 ships a curated pairs jsonl. It is not a Phase 0 PASS milestone. Per the post-exec sediment warning, "Phase 0 PASS" framing is misleading: this milestone is one input to the W3 distill loop, while the GO/NO-GO Gate lives at the W4 evaluation stage, with held-out test pairs and capability deltas vs the DeepSeek V4 base.

The string Phase 0 PASS is deliberately absent from outputs (regression test: test_no_phase0_pass_string_in_outputs).
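
A regression check of this kind might scan every output file for the forbidden string. The sketch below is an assumption about how such a test could work, not the body of test_no_phase0_pass_string_in_outputs; `files_containing` is a hypothetical helper and the temporary files are made up for the demonstration.

```python
import tempfile
from pathlib import Path

FORBIDDEN = "Phase 0 PASS"


def files_containing(root: Path, needle: str = FORBIDDEN) -> list[Path]:
    """Return the .jsonl files under `root` whose text contains `needle`."""
    hits = []
    for path in sorted(root.rglob("*.jsonl")):
        if needle in path.read_text(encoding="utf-8", errors="replace"):
            hits.append(path)
    return hits


# Demonstration on throwaway files; the contents are illustrative only.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "clean.jsonl").write_text('{"response": "ok"}\n', encoding="utf-8")
    (root / "bad.jsonl").write_text('{"response": "Phase 0 PASS"}\n', encoding="utf-8")
    hit_names = [p.name for p in files_containing(root)]

print(hit_names)  # ['bad.jsonl']
```

In a real test the assertion would be that the returned list is empty for the curated output directory.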

License

AGPL-3.0-only (per W1 design doc — Cigito follows Concinno mainline AGPL).
