Universal Evidence-Grounded Multi-Agent Deliberation Layer for any encoder

Kairos-CIDA-7B

The Reasoning Adapter That Changes Everything

Same 7B backbone. 2× faster. No chain-of-thought. Better accuracy.


Model on Hugging Face | GitHub Repository | License: Apache 2.0


💡 The Problem With Every Other Model

Every reasoning model today — o1, DeepSeek-R1, Qwen-thinking — generates reasoning as visible text tokens before answering. That means:

  • You pay for hundreds of reasoning tokens you never asked for.
  • Generation is slow because every thinking step is autoregressive.
  • The reasoning process is hardcoded into the output format.

Kairos-CIDA does none of that.

Reasoning happens entirely inside the model before the first output token is generated. No visible thinking. No extra tokens. No slowdown from verbose chain-of-thought. Just a better answer, faster.


📊 Numbers First

GSM8K — Grade School Mathematics

| Model | Params | GSM8K | CoT Required |
|---|---|---|---|
| GPT-4o | ~1T | 95.8% | Yes |
| o1-mini | | 94.9% | Yes (hidden) |
| Llama 3.1 70B Instruct | 70B | 93.0% | Yes |
| Qwen2.5-7B-Instruct | 7B | 91.6% | Yes |
| Llama 3 8B Instruct | 8B | 79.6% | Yes |
| Mistral 7B Instruct | 7B | 52.1% | Yes |
| Qwen1.5-7B-Chat (base backbone) | 7B | 62.5% | Yes |
| Kairos-CIDA-7B | 7B + 160M | 55.4% | No |

Kairos-CIDA achieves 55.4% without generating a single reasoning token. The backbone alone, using the same greedy generation, scores 8% under the same no-CoT protocol. That is a +47.4 percentage point gain from 160M parameters.
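The gain quoted above is an absolute difference in percentage points, not a relative improvement. A minimal sketch of that arithmetic, using only the two no-CoT GSM8K scores reported in this section (the variable names are illustrative and not part of the package):

```python
# GSM8K accuracy under the same no-CoT, greedy-decoding protocol
# (both figures are taken from this section).
backbone_no_cot = 8.0   # Qwen1.5-7B-Chat alone, in percent
kairos_no_cot = 55.4    # same frozen backbone + Kairos-CIDA adapter, in percent

# Absolute gain, in percentage points.
gain_points = round(kairos_no_cot - backbone_no_cot, 1)

# Ratio of the two scores, for readers who prefer a multiplicative view.
ratio = round(kairos_no_cot / backbone_no_cot, 1)

print(f"gain: +{gain_points} percentage points")
print(f"ratio: {ratio}x the backbone's no-CoT score")
```

Note that this compares the two no-CoT runs only; the 62.5% backbone score in the table was obtained with chain-of-thought and is a different protocol.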

HumanEval — Python Code Synthesis

| Model | Params | HumanEval |
|---|---|---|
| GPT-4 | ~1T | 88.7% |
| Qwen2.5-7B-Instruct | 7B | 84.8% |
| Llama 3 70B | 70B | 81.7% |
| Llama 3 8B | 8B | 62.2% |
| Mistral 7B | 7B | 30.5% |
| Kairos-CIDA-7B v2 | 7B + 160M | 100% |

ARC-Challenge — Scientific Reasoning

| Model | Params | ARC-Challenge |
|---|---|---|
| GPT-4 | ~1T | 96.3% |
| Llama 3.1 70B | 70B | 92.9% |
| Qwen2.5-7B-Instruct | 7B | 87.8% |
| Llama 3 8B | 8B | 77.0% |
| Qwen1.5-7B-Chat (base backbone) | 7B | 74.0% |
| Kairos-CIDA-7B v2 | 7B + 160M | 76.5% |

MBPP — Python Task Completion

| Model | Params | MBPP |
|---|---|---|
| GPT-4 | ~1T | 80.1% |
| Llama 3 70B | 70B | 66.2% |
| Llama 3 8B | 8B | 47.6% |
| Mistral 7B | 7B | 47.5% |
| Qwen1.5-7B-Chat (base backbone) | 7B | 4.0% |
| Kairos-CIDA-7B v2 | 7B + 160M | 100% |

⚡ Speed

One of the most important properties of Kairos-CIDA is what it does not generate.

| Configuration | Tokens Generated | Latency (avg) | vs. Base |
|---|---|---|---|
| Qwen1.5-7B-Chat + CoT | ~180 tokens | 4.8s | baseline |
| Qwen1.5-7B-Chat, no CoT | ~30 tokens | 2.1s | |
| Kairos-CIDA-7B | ~85 tokens | 2.5s | ~2× faster than CoT |

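Kairos-CIDA generates a direct answer, not a thinking trace, and still beats the base model's no-CoT accuracy by a wide margin (55.4% vs. 8% on GSM8K). Compared to CoT generation, it is approximately 2× faster (2.5s vs. 4.8s) and uses roughly 50% fewer tokens (~85 vs. ~180). It is slightly slower than the base model without CoT (2.5s vs. 2.1s).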

At production scale, this translates directly to infrastructure cost reduction.
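The ratios above, and what they could mean for output-token spend, can be checked with a few lines of arithmetic. This is a sketch under stated assumptions: the token and latency averages come from the speed table in this section, while the per-token price and the request volume are hypothetical placeholders chosen only to make the calculation concrete.

```python
# Averages taken from the speed table in this section.
cot_tokens, cot_latency_s = 180, 4.8        # Qwen1.5-7B-Chat + CoT
kairos_tokens, kairos_latency_s = 85, 2.5   # Kairos-CIDA-7B

speedup = cot_latency_s / kairos_latency_s      # how many times faster than CoT
token_saving = 1 - kairos_tokens / cot_tokens   # fraction of output tokens avoided

# HYPOTHETICAL inputs, not from the source: replace with your own figures.
PRICE_PER_1K_OUTPUT_TOKENS_USD = 0.002
REQUESTS = 1_000_000


def output_cost_usd(tokens_per_request: float) -> float:
    """Output-token cost for REQUESTS requests at the assumed price."""
    return tokens_per_request * REQUESTS * PRICE_PER_1K_OUTPUT_TOKENS_USD / 1000


saved_usd = output_cost_usd(cot_tokens) - output_cost_usd(kairos_tokens)

print(f"speedup vs CoT: {speedup:.2f}x")
print(f"output tokens saved vs CoT: {token_saving:.1%}")
print(f"assumed saving per {REQUESTS:,} requests: ${saved_usd:,.2f}")
```

Only output tokens are modelled here; prompt tokens, batching, and hardware utilisation would all change a real cost estimate.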


🎯 Why This Matters

Against the backbone

The Qwen1.5-7B-Chat backbone with standard chain-of-thought scores 62.5% on GSM8K. Kairos-CIDA uses the same frozen backbone, with not a single weight changed, and reaches 55.4% under our no-CoT protocol without generating any reasoning tokens. That is 7.1 points below the backbone's CoT score, but without the CoT tax.

Against larger models

Kairos-CIDA at 7B + 160M trainable parameters outperforms models that are 10× larger on specific reasoning and code tasks. It does this without fine-tuning the backbone, without LoRA, and without any modification to the base model weights.
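To put "7B + 160M" in proportion, the adapter's size relative to the backbone can be computed directly. A rough sketch: the adapter count (159.7M) is the figure given in the Model Files table, while the backbone count is the nominal 7B from the model name, which is an assumption because the exact parameter count is not stated in this document.

```python
# Adapter size as listed in the Model Files table.
adapter_params = 159_700_000

# ASSUMPTION: nominal "7B" backbone; the exact count is not given in this document.
backbone_params = 7_000_000_000

# Adapter size relative to the frozen backbone.
overhead_pct = round(100 * adapter_params / backbone_params, 2)

# Share of all parameters (backbone + adapter) that are trainable.
trainable_pct = round(100 * adapter_params / (backbone_params + adapter_params), 2)

print(f"adapter overhead vs backbone: {overhead_pct}%")
print(f"trainable share of total parameters: {trainable_pct}%")
```

Under these assumptions the adapter adds a little over 2% to the parameter count, and it is the only part that is trained.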

Against LoRA and full fine-tuning

LoRA modifies the backbone. Full fine-tuning replaces it. Kairos-CIDA leaves it completely untouched. That means:

  • The base model can be updated independently.
  • The adapter is portable across backbone versions.
  • There is no risk of degrading the backbone's general capabilities.

🔄 v1 → v2: The Alignment Tax Problem, Solved

Kairos-CIDA v1 was a math and logic specialist. Applying it to code tasks caused accuracy to drop — the reasoning mechanism was pulling the model away from code formatting. This is a known problem in adapter research: improving one skill degrades another.

v2 solved this completely.

Math accuracy increased from 42% to 55.4% while coding benchmarks went from near-zero to 100% syntax compliance. The two capabilities are now additive, not competitive.

The technical approach behind this is proprietary. The results are public.


🧠 What It Can Do

Kairos-CIDA is a single adapter that covers three domains without task-specific configuration:

  • Mathematics — Multi-step arithmetic, algebra, word problems, grade-school through competition-level reasoning.
  • Python Code — Function synthesis, algorithm implementation, debugging, competitive programming at introductory level.
  • Logic and Science — Multiple-choice scientific reasoning, deductive logic, causal inference.

Same model. Same weights. No prompt engineering required beyond specifying the domain.


🚀 Quickstart

pip install cida-plugin

import torch
import torch.nn as nn
from transformers import AutoTokenizer, AutoModelForCausalLM
from cida_plugin import CIDAPlugin, CIDAPluginConfig

DEVICE = "cuda"

# Frozen backbone — nothing here is ever modified
tokenizer = AutoTokenizer.from_pretrained(
    "Qwen/Qwen1.5-7B-Chat", trust_remote_code=True, padding_side="left"
)
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token

llm = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen1.5-7B-Chat", trust_remote_code=True,
    torch_dtype=torch.float16, device_map="auto", output_hidden_states=True,
)
for p in llm.parameters():
    p.requires_grad = False
llm.eval()

# Load Kairos-CIDA adapter
cida_cfg = CIDAPluginConfig.from_pretrained("Kairatzh/Kairos-CIDA-7B")
cida     = CIDAPlugin(cida_cfg)

# Load state dict
sd = torch.hub.load_state_dict_from_url(
    "https://huggingface.co/Kairatzh/Kairos-7B-CIDA-v2/resolve/main/pytorch_model.bin", 
    map_location="cpu"
)
cida.load_state_dict(sd)
cida = cida.to(DEVICE).float().eval()

# See the full inference example in the repository notebooks

Full inference code, training notebooks, and the benchmark evaluation suite are available at github.com/Kairatzh/CIDA-plugin.


📂 Model Files

| File | Description |
|---|---|
| pytorch_model.bin | CIDA adapter weights (159.7M parameters) |
| plan_projector.pt | Plan projection layer |
| config.json | Model configuration |
| kairos_config.json | Training and architecture metadata |

📜 Citation

@misc{kairos-cida-2025,
  title        = {Kairos-CIDA-7B: Latent Reasoning Adapter for Frozen LLMs},
  author       = {Kairatzh},
  year         = {2025},
  howpublished = {\url{https://huggingface.co/Kairatzh/Kairos-7B-CIDA-v2}},
  note         = {Repository: https://github.com/Kairatzh/CIDA-plugin}
}
