Cigito v3 distill data curator — DeepSeek V4 + Cogito 671B teacher fusion (W2 milestone, $0 CPU)
Cigito v3 — Distill Data Curator (W2 milestone)
Standalone parallel-track POC. Not wired into Concinno main / Sancio.
Purpose
W2 milestone of the Cigito v3 plan: curate 5,000 instruction/response pairs, each annotated with its ZIQ trajectory, CBUA stage, and FieldRead compression %, from existing Concinno trajectories. Curation runs at $0 CPU with a pure stdlib heuristic and makes no GPU or API call.
The curated jsonl is the seed corpus for the W3 distill loop (DeepSeek V4 student + Cogito v2.1 671B teacher, IDA composite loss, USPTO patent skeleton — see W1 design doc).
Status
| Phase | Status |
|---|---|
| W1 design doc | shipped 2026-04-28 (_AI_BRAIN/05_Planning/cigito-v3-...) |
| W2 data curation (this pkg, 0.0.1 first publish) | shipped 2026-04-28 |
| W3 distill loop POC | gated on W2; runs on RunPod when triggered |
| W4 GO/NO-GO Gate | gates production wire-in to Concinno main / Sancio runtime |
This W2 milestone is data curation only — it does not run the distill
loop, does not start RunPod, does not invoke any LLM API. The output
data/pairs_v0.jsonl is the input to W3.
PyPI publication note (0.0.1 first publish)
The 0.0.1 release intentionally has zero callers in the Concinno ecosystem. Wiring is gated on the W4 GO/NO-GO Gate per the Plan v3 parallel-track spec, not on this package's PyPI presence.
If you are a third-party adopter:
- Treat 0.0.x as Pre-Alpha; the Development Status :: 2 - Pre-Alpha classifier in pyproject.toml reflects exactly that. Public APIs (curator schema, output jsonl shape, classes under cigito_v3.distill) may change incompatibly between 0.0.x releases as Phase 1 POC findings come back from RunPod runs.
- The package is standalone and pure-stdlib by design: it does not import concinno, concinno-skills-*, or sancio-runtime. Installing it next to those packages does not affect their behaviour.
- A production wire-in (e.g. a concinno cigito-distill subcommand or a Concinno main pkg Distill skill) will land only after the W4 GO/NO-GO Gate evaluates the Phase 1 POC. Until then this package is a staged research artefact, not a turn-key feature.
Layout
cigito-v3/
├── pyproject.toml (placeholder, version 0.0.1, AGPL)
├── src/cigito_v3/
│ ├── distill/
│ │ ├── curator.py — 5k pair selector (heuristic, $0 CPU)
│ │ ├── trajectory.py — ZIQ outcome bus replay → decision sequence
│ │ ├── stage_tagger.py — CBUA stage transition labeler
│ │ └── compression.py — FieldRead pre/post compression demo
│ └── data/
│ └── pairs_v0.jsonl — curated output (or pointer)
└── tests/
└── test_distill_v2.py — ≥12 tests
CLI
python -m cigito_v3.distill.curator \
--concinno-home ~/.concinno \
--handoff-root _AI_BRAIN/06_Handoffs \
--target-pairs 5000 \
--output data/pairs_v0.jsonl
The curator is read-only — it never writes outside the explicit
--output path. It refuses to call any external service (asserted in
test_curator_no_external_api_call).
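The four flags above map naturally onto the standard-library argparse module. The following is a minimal sketch under that assumption, not the package's actual parser: build_parser, the defaults, and the help strings are hypothetical, and only the flag names are taken from the command shown above.

```python
import argparse
from pathlib import Path


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the four documented flags."""
    parser = argparse.ArgumentParser(prog="cigito_v3.distill.curator")
    parser.add_argument("--concinno-home", type=Path, required=True,
                        help="Concinno data root (only read, never written)")
    parser.add_argument("--handoff-root", type=Path, required=True,
                        help="handoff directory (only read, never written)")
    parser.add_argument("--target-pairs", type=int, default=5000,
                        help="number of pairs to curate (default is an assumption)")
    parser.add_argument("--output", type=Path, required=True,
                        help="the only path the curator is allowed to write")
    return parser


# Parse the same arguments as the documented invocation.
args = build_parser().parse_args([
    "--concinno-home", "~/.concinno",
    "--handoff-root", "_AI_BRAIN/06_Handoffs",
    "--target-pairs", "5000",
    "--output", "data/pairs_v0.jsonl",
])
print(args.target_pairs, args.output)
```

Because every input root arrives as an argument, nothing in such a parser needs a hard-coded personal path. Note that argparse does not expand `~`; when the shell has not already done so, the caller would need `Path.expanduser()`.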
Output schema
Each line of pairs_v0.jsonl is a JSON object:
{
"instruction": "<task framing extracted from the source session>",
"response": "<canonical response derived from the trajectory>",
"ziq_trajectory": [
{"step": 0, "tunable": "...", "value": ..., "reward": 0.93,
"sps_x_ftrl": 0.71, "source": "..."}
],
"cbua_stage": ["C0", "C1", "C2", "B1", "U1", "A1", "A3"],
"fieldread_compression_pct": 0.42,
"source_session": "<session id, anonymized>"
}
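A consumer of the file may want to check each line against this shape before feeding it to W3. The sketch below is one way to do that with the standard library only. The field names come from the schema above; the type choices, the validate_record helper, and the sample values are assumptions made for illustration, not part of the package.

```python
import json

# Field names follow the documented schema; the expected types are assumptions.
REQUIRED_FIELDS = {
    "instruction": str,
    "response": str,
    "ziq_trajectory": list,
    "cbua_stage": list,
    "fieldread_compression_pct": (int, float),
    "source_session": str,
}


def validate_record(line: str) -> dict:
    """Parse one jsonl line and check that every documented field is present."""
    record = json.loads(line)
    if not isinstance(record, dict):
        raise ValueError("each line must be a JSON object")
    for name, expected in REQUIRED_FIELDS.items():
        if name not in record:
            raise ValueError(f"missing field: {name}")
        if not isinstance(record[name], expected):
            raise ValueError(f"wrong type for field: {name}")
    return record


# Made-up sample record; string values and "value" are placeholders.
sample_line = json.dumps({
    "instruction": "example instruction",
    "response": "example response",
    "ziq_trajectory": [{"step": 0, "tunable": "example", "value": 1,
                        "reward": 0.93, "sps_x_ftrl": 0.71, "source": "example"}],
    "cbua_stage": ["C0", "C1", "C2", "B1", "U1", "A1", "A3"],
    "fieldread_compression_pct": 0.42,
    "source_session": "example-session",
})
record = validate_record(sample_line)
print(record["cbua_stage"][0], record["fieldread_compression_pct"])
```

Since 0.0.x releases may change the jsonl shape, a check like this fails loudly on a schema change instead of silently passing malformed records downstream.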
Hard constraints
- $0 CPU: pure stdlib + heuristic. No embedding model, no LLM call.
- No personal paths: every input root is parameterized via CLI / env.
- No GPU API call: test_curator_no_external_api_call asserts the relevant env vars are unset during curation.
- Idempotent: same inputs deterministically produce the same output.
- AGPL: per W1 design doc, parallel-track Cigito tracks AGPL like Concinno mainline.
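The idempotency constraint can be illustrated with a selection step whose order depends only on the content of its inputs. This is a sketch of the general technique, not the curator's actual heuristic: stable_key, select_pairs, and the use of source_session as the sort key are hypothetical. It relies on hashlib rather than the built-in hash(), which is randomized per process for strings.

```python
import hashlib


def stable_key(session_id: str) -> str:
    """Content-derived sort key that is identical across runs and machines."""
    return hashlib.sha256(session_id.encode("utf-8")).hexdigest()


def select_pairs(candidates: list[dict], target: int) -> list[dict]:
    """Pick up to `target` candidates in an order independent of input order."""
    ordered = sorted(candidates, key=lambda c: stable_key(c["source_session"]))
    return ordered[:target]


# Made-up candidates with unique session ids.
candidates = [{"source_session": f"s{i}"} for i in range(10)]
first = select_pairs(candidates, 3)
second = select_pairs(list(reversed(candidates)), 3)
print(first == second)
```

With unique keys, the same candidate set yields the same selection however the inputs were enumerated, for example when directory listing order differs between machines.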
Not a GO/NO-GO Gate
W2 ships a curated pairs jsonl. It is not a Phase 0 PASS milestone. Per the post-exec sediment warning, "Phase 0 PASS" framing is misleading: this milestone is one input to the W3 distill loop, while the GO/NO-GO Gate lives at the W4 evaluation stage, with held-out test pairs and capability deltas vs the DeepSeek V4 base.
The string Phase 0 PASS is deliberately absent from outputs (regression
test: test_no_phase0_pass_string_in_outputs).
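A regression check of this kind can be as simple as scanning the output file for the forbidden string. The sketch below shows the idea with the standard library. It is not the body of test_no_phase0_pass_string_in_outputs; find_forbidden and the temporary file contents are hypothetical.

```python
import tempfile
from pathlib import Path

FORBIDDEN = "Phase 0 PASS"


def find_forbidden(path: Path, needle: str = FORBIDDEN) -> list[int]:
    """Return the 1-based numbers of lines in `path` that contain `needle`."""
    with path.open(encoding="utf-8") as handle:
        return [n for n, line in enumerate(handle, start=1) if needle in line]


with tempfile.TemporaryDirectory() as tmp:
    out = Path(tmp) / "pairs_v0.jsonl"
    # A clean output file produces no hits.
    out.write_text('{"instruction": "a"}\n{"instruction": "b"}\n', encoding="utf-8")
    clean_hits = find_forbidden(out)
    # A file containing the string is flagged with the offending line number.
    out.write_text('{"instruction": "a"}\n{"response": "Phase 0 PASS"}\n', encoding="utf-8")
    dirty_hits = find_forbidden(out)

print(clean_hits, dirty_hits)
```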
License
AGPL-3.0-only (per W1 design doc — Cigito follows Concinno mainline AGPL).