
Universal Evidence-Grounded Multi-Agent Deliberation Layer for any encoder

Project description

Kairos-CIDA-7B

The Reasoning Adapter That Changes Everything

Same 7B backbone. ~2× faster than chain-of-thought. No chain-of-thought. Far higher accuracy than the backbone answering directly.


Model on Hugging Face · GitHub Repository · License: Apache 2.0


💡 The Problem With Every Other Model

Most reasoning models today, such as o1, DeepSeek-R1, and Qwen-thinking, generate their reasoning as text tokens before answering, either visibly or, as with o1, hidden from the user but still generated and billed. That means:

  • You pay for hundreds of reasoning tokens you never asked for.
  • Generation is slow because every thinking step is autoregressive.
  • The reasoning process is hardcoded into the output format.

Kairos-CIDA does none of that.

Reasoning happens entirely inside the model before the first output token is generated. No visible thinking. No extra tokens. No slowdown from verbose chain-of-thought. Just a better answer, faster.
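Why chain-of-thought is slow comes down to autoregressive decoding: every generated token, reasoning or answer, costs one more sequential forward pass. The toy loop below is a minimal sketch with a stub in place of a real model; `greedy_decode` and `make_stub` are hypothetical names, not part of cida-plugin or transformers. The token counts (~30 for a direct answer, ~180 with CoT) are taken from the Speed table. Real decoders reuse a KV cache so each step is cheaper than re-encoding the whole sequence, but the number of sequential steps still equals the number of generated tokens.

```python
from typing import Callable, List, Tuple


def greedy_decode(
    next_token: Callable[[List[int]], int],
    prompt: List[int],
    max_new_tokens: int,
    eos_id: int,
) -> Tuple[List[int], int]:
    """Toy autoregressive loop: exactly one model call per generated token.

    `next_token` stands in for a forward pass that returns the argmax token id
    given the sequence so far (hypothetical interface).
    Returns the generated ids and the number of forward passes made.
    """
    seq = list(prompt)
    generated: List[int] = []
    calls = 0
    for _ in range(max_new_tokens):
        tok = next_token(seq)  # each step must wait for all previous tokens
        calls += 1
        generated.append(tok)
        seq.append(tok)
        if tok == eos_id:
            break
    return generated, calls


def make_stub(n_new: int, eos_id: int = 0) -> Callable[[List[int]], int]:
    """Stub 'model' that emits token 7 until n_new tokens exist, then EOS."""
    emitted = 0

    def step(seq: List[int]) -> int:  # seq is ignored by this stub
        nonlocal emitted
        emitted += 1
        return eos_id if emitted >= n_new else 7

    return step


prompt = [101, 102, 103]
_, direct_calls = greedy_decode(make_stub(30), prompt, max_new_tokens=512, eos_id=0)
_, cot_calls = greedy_decode(make_stub(180), prompt, max_new_tokens=512, eos_id=0)
print(direct_calls, cot_calls)  # 30 180
```

The loop makes the cost linear in output length: a 180-token reasoning trace needs six times as many sequential steps as a 30-token direct answer, regardless of how fast each individual step is.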


📊 Numbers First

GSM8K — Grade School Mathematics

| Model | Params | GSM8K | CoT required |
|---|---|---|---|
| GPT-4o | ~1T | 95.8% | Yes |
| o1-mini | undisclosed | 94.9% | Yes (hidden) |
| Llama 3.1 70B Instruct | 70B | 93.0% | Yes |
| Qwen2.5-7B-Instruct | 7B | 91.6% | Yes |
| Llama 3 8B Instruct | 8B | 79.6% | Yes |
| Mistral 7B Instruct | 7B | 52.1% | Yes |
| Qwen1.5-7B-Chat (base backbone) | 7B | 62.5% | Yes |
| Kairos-CIDA-7B | 7B + 160M | 55.4% | No |

Kairos-CIDA achieves 55.4% without generating a single reasoning token. The backbone alone, using the same greedy generation, scores 8% under the same no-CoT protocol. That is a +47.4 percentage point gain from 160M parameters.
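The exact no-CoT scoring protocol is not spelled out in this README. As a hedged illustration only, the sketch below assumes a common scheme: take the last number in the model's direct answer and compare it with the number after `####`, which is how GSM8K reference solutions mark their final answer. The helper names are hypothetical, and the last line simply reproduces the percentage-point arithmetic from the paragraph above.

```python
import re
from typing import List, Optional

# Integers or decimals, optionally negative, with thousands separators.
NUM_RE = re.compile(r"-?\d[\d,]*(?:\.\d+)?")


def gold_answer(solution: str) -> str:
    """GSM8K reference solutions end with '#### <answer>'."""
    return solution.split("####")[-1].strip().replace(",", "")


def last_number(completion: str) -> Optional[str]:
    """Take the last number in a direct-answer completion (assumed protocol)."""
    matches = NUM_RE.findall(completion)
    return matches[-1].replace(",", "") if matches else None


def exact_match_accuracy(completions: List[str], solutions: List[str]) -> float:
    """Percentage of completions whose final number equals the gold answer."""
    if not solutions:
        return 0.0
    hits = 0
    for pred, sol in zip(completions, solutions):
        p = last_number(pred)
        if p is not None and float(p) == float(gold_answer(sol)):
            hits += 1
    return 100.0 * hits / len(solutions)


completions = ["The answer is 72.", "She has 1,200 apples", "I am not sure"]
solutions = ["... #### 72", "... #### 1,200", "... #### 5"]
print(exact_match_accuracy(completions, solutions))  # ≈ 66.7 (2 of 3 correct)

gain_pp = round(55.4 - 8.0, 1)  # adapter vs. backbone, no-CoT: 47.4 points
```

Because the model emits no reasoning trace, the scorer only has to find one number; a stricter protocol (for example, requiring the answer on the first line) would change the absolute scores but not the shape of the comparison.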

HumanEval — Python Code Synthesis

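| Model | Params | HumanEval |
|---|---|---|
| GPT-4 | ~1T | 88.7% |
| Qwen2.5-7B-Instruct | 7B | 84.8% |
| Llama 3 70B | 70B | 81.7% |
| Llama 3 8B | 8B | 62.2% |
| Mistral 7B | 7B | 30.5% |
| Kairos-CIDA-7B v2 | 7B + 160M | 100% |

Note: for Kairos-CIDA-7B v2, the 100% figures on HumanEval and MBPP are syntax-compliance rates (see the v1 → v2 section), not the functional-correctness (pass@k) scores these benchmarks normally measure, so those rows are not directly comparable with the others.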

ARC-Challenge — Scientific Reasoning

| Model | Params | ARC-Challenge |
|---|---|---|
| GPT-4 | ~1T | 96.3% |
| Llama 3.1 70B | 70B | 92.9% |
| Qwen2.5-7B-Instruct | 7B | 87.8% |
| Llama 3 8B | 8B | 77.0% |
| Qwen1.5-7B-Chat (base backbone) | 7B | 74.0% |
| Kairos-CIDA-7B v2 | 7B + 160M | 76.5% |

MBPP — Python Task Completion

| Model | Params | MBPP |
|---|---|---|
| GPT-4 | ~1T | 80.1% |
| Llama 3 70B | 70B | 66.2% |
| Llama 3 8B | 8B | 47.6% |
| Mistral 7B | 7B | 47.5% |
| Qwen1.5-7B-Chat (base backbone) | 7B | 4.0% |
| Kairos-CIDA-7B v2 | 7B + 160M | 100% |

⚡ Speed

One of the most important properties of Kairos-CIDA is what it does not generate.

| Configuration | Tokens generated | Avg. latency | vs. CoT baseline |
|---|---|---|---|
| Qwen1.5-7B-Chat + CoT | ~180 | 4.8 s | baseline |
| Qwen1.5-7B-Chat, no CoT | ~30 | 2.1 s | ~2.3× faster |
| Kairos-CIDA-7B | ~85 | 2.5 s | ~2× faster |

Kairos-CIDA generates a direct answer rather than a thinking trace, and on GSM8K it still beats the no-CoT base model by a wide margin (55.4% vs. 8%). Compared with CoT generation, it is approximately 2× faster (2.5 s vs. 4.8 s) and generates roughly half as many tokens (~85 vs. ~180).

At production scale, fewer generated tokens and lower latency translate directly into lower infrastructure cost.
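The "~2× faster" and "roughly half the tokens" figures are rounded ratios of the averages in the Speed table. The snippet below recomputes them from those reported numbers only; it performs no new measurement.

```python
# Averages copied from the Speed table (as reported by the authors).
cot_tokens, cot_latency_s = 180, 4.8        # Qwen1.5-7B-Chat + CoT
kairos_tokens, kairos_latency_s = 85, 2.5   # Kairos-CIDA-7B

speedup = cot_latency_s / kairos_latency_s           # 1.92x
token_reduction = 1 - kairos_tokens / cot_tokens     # ~0.53 -> 53% fewer tokens

print(f"speedup vs CoT: {speedup:.2f}x")                 # speedup vs CoT: 1.92x
print(f"token reduction vs CoT: {token_reduction:.0%}")  # token reduction vs CoT: 53%
```

If serving cost scales roughly with generated tokens, the same ratio (about 0.47 of the CoT token volume) is a first-order estimate of the per-request saving; actual savings depend on batching, prompt length, and hardware.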


🎯 Why This Matters

Against the backbone

The Qwen1.5-7B-Chat backbone with standard chain-of-thought scores 62.5% on GSM8K. Kairos-CIDA uses the same frozen backbone, with not a single weight changed, and without generating any reasoning tokens reaches 55.4% under our no-CoT protocol: within 7.1 percentage points of the CoT result, without paying the CoT token cost.

Against larger models

Kairos-CIDA at 7B + 160M trainable parameters posts higher HumanEval and MBPP figures than models 10× its size, such as Llama 3 70B (its code figures measure syntax compliance; see the note under the HumanEval table). It does this without fine-tuning the backbone, without LoRA, and without any modification to the base model weights.

Against LoRA and full fine-tuning

LoRA modifies the backbone. Full fine-tuning replaces it. Kairos-CIDA leaves it completely untouched. That means:

  • The base model can be updated independently.
  • The adapter is portable across backbone versions.
  • There is no risk of degrading the backbone's own capabilities: with the adapter detached, the model behaves exactly as before.
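Kairos-CIDA's own architecture is not described here (the v2 approach is stated to be proprietary), so the PyTorch sketch below shows only the generic frozen-backbone pattern these bullets rely on: the backbone is frozen with `requires_grad_(False)`, the optimizer receives only adapter parameters, and after training the backbone's state dict is bit-for-bit unchanged. `ToyBackbone` and `ToyAdapter` are hypothetical stand-ins, and the residual bottleneck adapter is a common design swapped in for illustration, not CIDA's.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)


class ToyBackbone(nn.Module):
    """Hypothetical stand-in for a frozen LLM: token ids -> hidden states."""

    def __init__(self, vocab: int = 100, hidden: int = 32):
        super().__init__()
        self.embed = nn.Embedding(vocab, hidden)
        self.block = nn.Linear(hidden, hidden)

    def forward(self, ids: torch.Tensor) -> torch.Tensor:
        return torch.tanh(self.block(self.embed(ids)))


class ToyAdapter(nn.Module):
    """Hypothetical residual bottleneck adapter that refines hidden states."""

    def __init__(self, hidden: int = 32, bottleneck: int = 8):
        super().__init__()
        self.down = nn.Linear(hidden, bottleneck)
        self.up = nn.Linear(bottleneck, hidden)

    def forward(self, h: torch.Tensor) -> torch.Tensor:
        return h + self.up(torch.relu(self.down(h)))


backbone, adapter = ToyBackbone(), ToyAdapter()
backbone.requires_grad_(False)  # freeze every backbone weight
backbone.eval()

snapshot = {k: v.clone() for k, v in backbone.state_dict().items()}
opt = torch.optim.AdamW(adapter.parameters(), lr=1e-2)  # adapter params only

ids = torch.randint(0, 100, (4, 6))
target = torch.randn(4, 6, 32)
for _ in range(5):
    with torch.no_grad():
        h = backbone(ids)  # no autograd graph through the frozen model
    loss = nn.functional.mse_loss(adapter(h), target)
    opt.zero_grad()
    loss.backward()
    opt.step()

trainable = sum(p.numel() for p in adapter.parameters() if p.requires_grad)
frozen = sum(p.numel() for p in backbone.parameters())
unchanged = all(torch.equal(snapshot[k], v) for k, v in backbone.state_dict().items())
print(trainable, frozen, unchanged)  # 552 4256 True
```

Running the backbone under `torch.no_grad()` also means no activations are stored for it during training, which is a large part of why adapter training on a frozen 7B model is cheaper than fine-tuning it.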

🔄 v1 → v2: The Alignment Tax Problem, Solved

Kairos-CIDA v1 was a math and logic specialist. Applying it to code tasks caused accuracy to drop — the reasoning mechanism was pulling the model away from code formatting. This is a known problem in adapter research: improving one skill degrades another.

v2 solved this completely.

Math accuracy increased from 42% to 55.4% while coding benchmarks went from near-zero to 100% syntax compliance. The two capabilities are now additive, not competitive.

The technical approach behind this is proprietary. The results are public.


🧠 What It Can Do

Kairos-CIDA is a single adapter that covers three domains without task-specific weights:

  • Mathematics — Multi-step arithmetic, algebra, word problems, grade-school through competition-level reasoning.
  • Python Code — Function synthesis, algorithm implementation, debugging, competitive programming at introductory level.
  • Logic and Science — Multiple-choice scientific reasoning, deductive logic, causal inference.

Same model. Same weights. No prompt engineering required beyond specifying the domain.


🚀 Quickstart

pip install cida-plugin

import torch
from transformers import AutoTokenizer, AutoModelForCausalLM
from cida_plugin import CIDAPlugin, CIDAPluginConfig

DEVICE = "cuda"

# Frozen backbone — nothing here is ever modified
tokenizer = AutoTokenizer.from_pretrained(
    "Qwen/Qwen1.5-7B-Chat", trust_remote_code=True, padding_side="left"
)
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token

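# device_map="auto" requires the accelerate package;
# output_hidden_states=True makes every forward pass return per-layer hidden states
llm = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen1.5-7B-Chat", trust_remote_code=True,
    torch_dtype=torch.float16, device_map="auto", output_hidden_states=True,
)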
for p in llm.parameters():
    p.requires_grad = False
llm.eval()

# Load Kairos-CIDA adapter
cida_cfg = CIDAPluginConfig.from_pretrained("Kairatzh/Kairos-CIDA-7B")
cida     = CIDAPlugin(cida_cfg)

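# Download the v2 adapter weights (torch.hub caches the file locally) and load them
sd = torch.hub.load_state_dict_from_url(
    "https://huggingface.co/Kairatzh/Kairos-7B-CIDA-v2/resolve/main/pytorch_model.bin",
    map_location="cpu",
)
cida.load_state_dict(sd)
# The adapter runs in float32 on the GPU; the backbone stays in float16
cida = cida.to(DEVICE).float().eval()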

# See the full inference example in the repository notebooks

Full inference code, training notebooks, and the benchmark evaluation suite are available at github.com/Kairatzh/CIDA-plugin.


📂 Model Files

| File | Description |
|---|---|
| pytorch_model.bin | CIDA adapter weights (159.7M parameters) |
| plan_projector.pt | Plan projection layer |
| config.json | Model configuration |
| kairos_config.json | Training and architecture metadata |

📜 Citation

@misc{kairos-cida-2025,
  title        = {Kairos-CIDA-7B: Latent Reasoning Adapter for Frozen LLMs},
  author       = {Kairatzh},
  year         = {2025},
  howpublished = {\url{https://huggingface.co/Kairatzh/Kairos-7B-CIDA-v2}},
  note         = {Repository: https://github.com/Kairatzh/CIDA-plugin}
}



Download files


Source Distribution

cida_plugin-1.1.1.tar.gz (88.6 kB)

Uploaded Source

Built Distribution


cida_plugin-1.1.1-py3-none-any.whl (90.7 kB)

Uploaded Python 3

File details

Details for the file cida_plugin-1.1.1.tar.gz.

File metadata

  • Download URL: cida_plugin-1.1.1.tar.gz
  • Upload date:
  • Size: 88.6 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.10.0

File hashes

Hashes for cida_plugin-1.1.1.tar.gz
Algorithm Hash digest
SHA256 a0639dfcd8f2e39f008a95d4a90f9887192e364e8fb5e338e6599c789e898be5
MD5 e754e9674fd99b308bc84e7da2ec3d79
BLAKE2b-256 141d028e879c0ea9254b9f987ec6deb14a7628a605d7db91e133162e23b614a7
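A downloaded archive can be checked against the published SHA256 digest before installing it. This is a minimal sketch using only the standard library; `sha256_of` and `verify` are illustrative helper names, not part of cida-plugin.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash the file in 1 MiB chunks so large archives need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path: Path, expected_hex: str) -> bool:
    """True if the file's SHA256 matches the published hex digest."""
    return sha256_of(path) == expected_hex.strip().lower()


# Published SHA256 for cida_plugin-1.1.1.tar.gz (from the table above).
EXPECTED_SDIST = "a0639dfcd8f2e39f008a95d4a90f9887192e364e8fb5e338e6599c789e898be5"
```

For example, `verify(Path("cida_plugin-1.1.1.tar.gz"), EXPECTED_SDIST)` should return `True` for an intact download. pip can enforce the same check at install time with a requirements line such as `cida-plugin==1.1.1 --hash=sha256:<digest>`; note that in hash-checking mode pip requires every installed package, including dependencies, to be pinned and hashed.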


File details

Details for the file cida_plugin-1.1.1-py3-none-any.whl.

File metadata

  • Download URL: cida_plugin-1.1.1-py3-none-any.whl
  • Upload date:
  • Size: 90.7 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.10.0

File hashes

Hashes for cida_plugin-1.1.1-py3-none-any.whl
Algorithm Hash digest
SHA256 2a2359c945029584fc52bf6a6f955b848fae80502a323c0a8a7a655e1e7b4f4c
MD5 4af76408153f56df569244ecb49ad589
BLAKE2b-256 e428e058a7605e80b1f3de5e0ac1da77ad2affe4f62b6f1f7333fb22cf3143c7

