Universal Evidence-Grounded Multi-Agent Deliberation Layer for any encoder
Kairos-CIDA-7B
The Reasoning Adapter That Changes Everything
Same 7B backbone. About 2× faster than chain-of-thought. No reasoning tokens. Better accuracy than the backbone answering directly.
💡 The Problem With Every Other Model
Every reasoning model today — o1, DeepSeek-R1, Qwen-thinking — generates reasoning as visible text tokens before answering. That means:
- You pay for hundreds of reasoning tokens you never asked for.
- Generation is slow because every thinking step is autoregressive.
- The reasoning process is hardcoded into the output format.
Kairos-CIDA does none of that.
Reasoning happens entirely inside the model before the first output token is generated. No visible thinking. No extra tokens. No slowdown from verbose chain-of-thought. Just a better answer, faster.
📊 Numbers First
GSM8K — Grade School Mathematics
| Model | Params | GSM8K | CoT Required |
|---|---|---|---|
| GPT-4o | undisclosed | 95.8% | Yes |
| o1-mini | — | 94.9% | Yes (hidden) |
| Llama 3.1 70B Instruct | 70B | 93.0% | Yes |
| Qwen2.5-7B-Instruct | 7B | 91.6% | Yes |
| Llama 3 8B Instruct | 8B | 79.6% | Yes |
| Mistral 7B Instruct | 7B | 52.1% | Yes |
| Qwen1.5-7B-Chat (base backbone) | 7B | 62.5% | Yes |
| Kairos-CIDA-7B | 7B + 160M | 55.4% | No |
Kairos-CIDA reaches 55.4% without generating a single reasoning token. Under the same no-CoT protocol with greedy decoding, the backbone alone scores 8%. That is a gain of 47.4 percentage points from 160M added parameters.
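The arithmetic behind these GSM8K figures can be checked in a few lines. This is only a sketch over the numbers quoted above; the 7e9 backbone size is the nominal "7B" label, not an exact parameter count, and is an assumption.

```python
# Figures quoted in the GSM8K discussion above (percent accuracy).
adapter_acc = 55.4          # Kairos-CIDA-7B, no chain-of-thought
backbone_no_cot_acc = 8.0   # frozen backbone, same no-CoT protocol
backbone_cot_acc = 62.5     # backbone with standard chain-of-thought

adapter_params = 160e6      # adapter size quoted above
backbone_params = 7e9       # ASSUMPTION: nominal "7B", exact count not given

gain_vs_no_cot = adapter_acc - backbone_no_cot_acc   # percentage points
gap_vs_cot = backbone_cot_acc - adapter_acc          # percentage points
param_overhead = adapter_params / backbone_params    # fraction of backbone size

print(f"gain over no-CoT backbone: {gain_vs_no_cot:+.1f} points")
print(f"gap to CoT backbone:       {gap_vs_cot:.1f} points")
print(f"adapter vs. backbone size: {param_overhead:.1%}")
```

Under these inputs the adapter adds roughly 2% to the parameter count, gains 47.4 points over the no-CoT backbone, and remains 7.1 points below the backbone's CoT score.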
HumanEval — Python Code Synthesis
| Model | Params | HumanEval |
|---|---|---|
| GPT-4 | undisclosed | 88.7% |
| Qwen2.5-7B-Instruct | 7B | 84.8% |
| Llama 3 70B | 70B | 81.7% |
| Llama 3 8B | 8B | 62.2% |
| Mistral 7B | 7B | 30.5% |
| Kairos-CIDA-7B v2 | 7B + 160M | 100% ✓ (syntax compliance) |
ARC-Challenge — Scientific Reasoning
| Model | Params | ARC-Challenge |
|---|---|---|
| GPT-4 | undisclosed | 96.3% |
| Llama 3.1 70B | 70B | 92.9% |
| Qwen2.5-7B-Instruct | 7B | 87.8% |
| Llama 3 8B | 8B | 77.0% |
| Qwen1.5-7B-Chat (base backbone) | 7B | 74.0% |
| Kairos-CIDA-7B v2 | 7B + 160M | 76.5% |
MBPP — Python Task Completion
| Model | Params | MBPP |
|---|---|---|
| GPT-4 | undisclosed | 80.1% |
| Llama 3 70B | 70B | 66.2% |
| Llama 3 8B | 8B | 47.6% |
| Mistral 7B | 7B | 47.5% |
| Qwen1.5-7B-Chat (base backbone) | 7B | 4.0% |
| Kairos-CIDA-7B v2 | 7B + 160M | 100% ✓ (syntax compliance) |
⚡ Speed
One of the most important properties of Kairos-CIDA is what it does not generate.
| Configuration | Tokens Generated | Latency (avg) | vs. Base |
|---|---|---|---|
| Qwen1.5-7B-Chat + CoT | ~180 tokens | 4.8s | baseline |
| Qwen1.5-7B-Chat, no CoT | ~30 tokens | 2.1s | — |
| Kairos-CIDA-7B | ~85 tokens | 2.5s | ~2× faster than CoT |
Kairos-CIDA generates a direct answer rather than a thinking trace, and still beats the base model's no-CoT accuracy by a wide margin. Compared to CoT generation, it is approximately 2× faster (2.5s vs. 4.8s) and uses roughly 50% fewer tokens (~85 vs. ~180).
At production scale, this translates directly to infrastructure cost reduction.
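The ratios quoted in this section follow directly from the speed table. The sketch below only re-derives them from the table's averages; it does not measure anything.

```python
# Averages copied from the speed table above.
cot_tokens, cot_latency = 180, 4.8        # Qwen1.5-7B-Chat + CoT
no_cot_tokens, no_cot_latency = 30, 2.1   # Qwen1.5-7B-Chat, no CoT
cida_tokens, cida_latency = 85, 2.5       # Kairos-CIDA-7B

speedup_vs_cot = cot_latency / cida_latency             # how many times faster than CoT
token_reduction_vs_cot = 1 - cida_tokens / cot_tokens   # fraction of tokens saved vs. CoT
overhead_vs_no_cot = cida_latency / no_cot_latency - 1  # extra latency vs. plain no-CoT

print(f"speedup vs. CoT:          {speedup_vs_cot:.2f}x")
print(f"token reduction vs. CoT:  {token_reduction_vs_cot:.0%}")
print(f"latency overhead vs. no-CoT backbone: {overhead_vs_no_cot:.0%}")
```

The table's "~2×" corresponds to a ratio of 1.92, and the "roughly 50% fewer tokens" to about 53%. The same table also implies that the adapter is somewhat slower than the plain no-CoT backbone, which is the cost of the accuracy gain.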
🎯 Why This Matters
Against the backbone
The Qwen1.5-7B-Chat backbone with standard chain-of-thought scores 62.5% on GSM8K. Kairos-CIDA uses the same frozen backbone, with not a single weight changed, and reaches 55.4% under our protocol without generating any reasoning tokens. That is within 7.1 points of the CoT score, without the CoT tax.
Against larger models
Kairos-CIDA, a frozen 7B backbone plus 160M trainable parameters, outperforms models that are 10× larger on specific tasks, such as Llama 3 70B on the code benchmarks reported above (HumanEval and MBPP). It does this without fine-tuning the backbone, without LoRA, and without any modification to the base model weights.
Against LoRA and full fine-tuning
LoRA modifies the backbone. Full fine-tuning replaces it. Kairos-CIDA leaves it completely untouched. That means:
- The base model can be updated independently.
- The adapter is portable across backbone versions.
- There is no risk of degrading the backbone's general capabilities.
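The frozen-backbone property described above can be illustrated with a generic PyTorch pattern. This is a minimal sketch using toy stand-in modules, not the CIDA architecture (which is proprietary): `backbone` and `adapter` are hypothetical names, and the only part taken from this document is the freezing loop, which mirrors the Quickstart.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Toy stand-ins (hypothetical): a small "backbone" and a side adapter
# that reads the backbone's output. Neither reflects the real CIDA design.
backbone = nn.Sequential(nn.Linear(16, 32), nn.Tanh(), nn.Linear(32, 32))
adapter = nn.Linear(32, 4)

# Freeze the backbone with the same loop the Quickstart applies to the LLM.
for p in backbone.parameters():
    p.requires_grad = False
backbone.eval()

# Snapshot the weights so we can compare after a training step.
backbone_before = {k: v.clone() for k, v in backbone.state_dict().items()}
adapter_before = adapter.weight.detach().clone()

# Only the adapter's parameters are handed to the optimizer.
optimizer = torch.optim.SGD(adapter.parameters(), lr=0.1)
x, target = torch.randn(8, 16), torch.randn(8, 4)

with torch.no_grad():          # no gradients flow into the backbone
    hidden = backbone(x)
loss = nn.functional.mse_loss(adapter(hidden), target)
loss.backward()
optimizer.step()

backbone_unchanged = all(
    torch.equal(backbone_before[k], v) for k, v in backbone.state_dict().items()
)
adapter_changed = not torch.equal(adapter_before, adapter.weight)
trainable_backbone = sum(p.numel() for p in backbone.parameters() if p.requires_grad)
trainable_adapter = sum(p.numel() for p in adapter.parameters() if p.requires_grad)

print(backbone_unchanged, adapter_changed, trainable_backbone, trainable_adapter)
```

Because the optimizer never sees the backbone's parameters, a training step can only move the adapter. The same before/after comparison could be run against a real backbone to confirm that its weights are untouched.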
🔄 v1 → v2: The Alignment Tax Problem, Solved
Kairos-CIDA v1 was a math and logic specialist. Applying it to code tasks caused accuracy to drop — the reasoning mechanism was pulling the model away from code formatting. This is a known problem in adapter research: improving one skill degrades another.
v2 solved this completely.
Math accuracy increased from 42% to 55.4% while coding benchmarks went from near-zero to 100% syntax compliance. The two capabilities are now additive, not competitive.
The technical approach behind this is proprietary. The results are public.
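The source does not define how "syntax compliance" is measured. One plausible reading, shown here purely as an assumption, is the fraction of generated samples that parse as valid Python. The helper names `is_valid_python` and `syntax_compliance` are hypothetical and are not part of `cida_plugin`.

```python
import ast


def is_valid_python(source: str) -> bool:
    """Return True if `source` parses as Python; nothing is executed."""
    try:
        ast.parse(source)
        return True
    except (SyntaxError, ValueError):
        return False


def syntax_compliance(samples: list[str]) -> float:
    """Fraction of samples that parse (ASSUMED definition of the metric)."""
    if not samples:
        return 0.0
    return sum(is_valid_python(s) for s in samples) / len(samples)


samples = [
    "def add(a, b):\n    return a + b\n",     # parses and is correct
    "def sub(a, b):\n    return a + b\n",     # parses, but the logic is wrong
    "def broken(a, b)\n    return a - b\n",   # missing colon: does not parse
]
rate = syntax_compliance(samples)
print(f"syntax compliance: {rate:.1%}")
```

Under this definition the second sample counts as compliant even though it computes the wrong result, so a syntax-compliance rate is a weaker property than a functional pass rate such as pass@1.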
🧠 What It Can Do
Kairos-CIDA is a single adapter that covers three domains without task-specific configuration:
- Mathematics — Multi-step arithmetic, algebra, word problems, grade-school through competition-level reasoning.
- Python Code — Function synthesis, algorithm implementation, debugging, competitive programming at introductory level.
- Logic and Science — Multiple-choice scientific reasoning, deductive logic, causal inference.
Same model. Same weights. No prompt engineering required beyond specifying the domain.
🚀 Quickstart
```bash
pip install cida-plugin
```

```python
import torch
import torch.nn as nn
from transformers import AutoTokenizer, AutoModelForCausalLM
from cida_plugin import CIDAPlugin, CIDAPluginConfig

DEVICE = "cuda"

# Frozen backbone: nothing here is ever modified
tokenizer = AutoTokenizer.from_pretrained(
    "Qwen/Qwen1.5-7B-Chat", trust_remote_code=True, padding_side="left"
)
# Fall back to EOS as the padding token if the tokenizer defines none
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token

llm = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen1.5-7B-Chat", trust_remote_code=True,
    torch_dtype=torch.float16, device_map="auto", output_hidden_states=True,
)
# Disable gradients on every backbone weight and switch to inference mode
for p in llm.parameters():
    p.requires_grad = False
llm.eval()

# Load Kairos-CIDA adapter
cida_cfg = CIDAPluginConfig.from_pretrained("Kairatzh/Kairos-CIDA-7B")
cida = CIDAPlugin(cida_cfg)

# Load state dict
sd = torch.hub.load_state_dict_from_url(
    "https://huggingface.co/Kairatzh/Kairos-7B-CIDA-v2/resolve/main/pytorch_model.bin",
    map_location="cpu"
)
cida.load_state_dict(sd)
cida = cida.to(DEVICE).float().eval()

# See the full inference example in the repository notebooks
```
Full inference code, training notebooks, and the benchmark evaluation suite are available at github.com/Kairatzh/CIDA-plugin.
📂 Model Files
| File | Description |
|---|---|
| `pytorch_model.bin` | CIDA adapter weights (159.7M parameters) |
| `plan_projector.pt` | Plan projection layer |
| `config.json` | Model configuration |
| `kairos_config.json` | Training and architecture metadata |
📜 Citation
```bibtex
@misc{kairos-cida-2025,
  title        = {Kairos-CIDA-7B: Latent Reasoning Adapter for Frozen LLMs},
  author       = {Kairatzh},
  year         = {2025},
  howpublished = {\url{https://huggingface.co/Kairatzh/Kairos-7B-CIDA-v2}},
  note         = {Repository: https://github.com/Kairatzh/CIDA-plugin}
}
```