AI Research Operating System — skill testing, synthetic data, evaluation, multi-agent orchestration

Cortex Research Suite

AI Research Operating System: 26 autonomous skills, a self-evolving Skill Organism, a browser-based experiment arena, and a Python evaluation framework. Covers research workflows, MLOps enforcement, security auditing, agent orchestration, intelligence analysis, and developer tooling. Works natively with Claude Code and integrates with MCP, LangChain, CrewAI, and OpenAI through generated adapters.

Quickstart — 3 entry points

1. Browser (zero install)

Open dashboards/skill_arena_demo.html in any browser. Paste your Anthropic API key. Pick a skill. Click Run Experiment. Watch the organism evolve in real time.

2. Terminal

git clone https://github.com/TECHKNOWMAD-LABS/cortex-research-suite.git
cd cortex-research-suite
pip install -e ".[dev]"

# Run the test suite
pytest

# Generate evaluation datasets for all 26 skills
python datasets/generators/skill_dataset_generator.py --all-skills --n 50

# Run evaluation on any skill
python skills/skill-test-harness/scripts/eval_judge.py \
  --skill security-audit --dataset datasets/synthetic/security-audit/shard_000.json

# Run the multi-agent debate engine
python skills/agent-orchestrator/scripts/debate_engine.py \
  --topic "AI safety in healthcare" --rounds 3

# Start overnight evolution (runs in background)
python skill-organism/enterprise_runner.py --overnight --generations 10

3. Python API

from cortex.synthetic.reasoning_generator import ReasoningGenerator
from cortex.evaluation.judge import LLMJudge
from cortex.agents.orchestrator import AgentOrchestrator

# Generate evaluation prompts
gen = ReasoningGenerator(seed=42)
prompts = gen.generate(100)

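# Run multi-agent research pipeline
# NOTE: `provider` is assumed to be a model provider instance from cortex.models;
# its construction is not shown in this README.
orchestrator = AgentOrchestrator(provider)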
result = orchestrator.run("Analyze the impact of transformer architectures on NLP")

# Evaluate output quality
judge = LLMJudge(provider)
score = judge.score(prompt="...", response=result.final_output)
print(f"Quality: {score.normalized:.0%}")  # e.g., Quality: 87%

Cortex in the intelligence ecosystem

Layer	Component	What it does	Integration
5. Simulation	MiroFish-inspired scenario simulator	Swarm-based what-if analysis with counterfactual injection	`skills/scenario-simulator/`
4. Intelligence	BettaFish-inspired analysis engine	Multi-source intelligence queries, forum analysis, multimodal	`skills/intelligence-query/`, `skills/forum-intelligence/`, `skills/multimodal-analyst/`
3. Data	MindSpider connector	Live topic feeds from social listening deployments	`skills/mindspider-connector/`
2. Evolution	Skill Organism + ARENA.md	Autonomous overnight evolution with fitness tracking	`skill-organism/`, `skills/*/ARENA.md`
1. Foundation	21 core skills + evaluation lab	Research, security, MLOps, orchestration, quality	`skills/`, `cortex/`

Skills (26)

Skill	Category	Description
`agent-orchestrator`	Agents	Multi-agent coordination with DAG task graphs
`agent-output-validator`	Validation	Automated validation of agent outputs against quality gates
`code-review-engine`	Engineering	Automated code review with security checks
`context-engineer`	Engineering	Context window optimization and prompt management
`de-slop`	Quality	AI-generated writing pattern detection and removal
`design-system-forge`	Design	Design system generation and component library scaffolding
`dev-lifecycle-engine`	DevOps	Development lifecycle management
`diff-generator`	Engineering	Structured diff generation for code and document changes
`forum-intelligence`	Intelligence	Forum thread analysis with coordination detection
`github-mcp`	Integration	GitHub API via Model Context Protocol
`intelligence-query`	Intelligence	Multi-source intelligence analysis engine
`meta-skill-evolver`	Meta	Evolutionary skill improvement and mutation engine
`mindspider-connector`	Data	Live social listening feed connector
`mlops-standards`	MLOps	ML operations best practices enforcement
`multimodal-analyst`	Intelligence	Cross-modal content analysis (text + image + video)
`persistent-memory`	Infrastructure	SQLite-backed memory with FTS5 search
`pre-package-pipeline`	Packaging	Skill validation and packaging pipeline
`prompt-architect`	Engineering	Prompt engineering and optimization
`repo-publisher`	DevOps	Pre-publish pipeline with security scanning
`research-workflow`	Research	Experiment design and methodology
`scenario-simulator`	Simulation	MiroFish-inspired swarm scenario simulation
`security-audit`	Security	Bandit + semgrep + secret scanning pipeline
`session-memory`	Infrastructure	Session-scoped memory persistence
`skill-test-harness`	Testing	Automated skill testing framework with LLM-as-Judge
`skill-validator`	Validation	Skill structure and manifest validation
`tdd-enforcer`	Testing	Test-driven development enforcement

See AGENTS.md for the full agent manifest with platform-specific integration guides.
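
The `persistent-memory` skill is described above as SQLite-backed with FTS5 search. As a rough illustration of that idea (not the skill's actual schema), the sketch below builds a full-text index with Python's standard `sqlite3` module. The table name `memory`, the columns `title` and `body`, and both helper functions are hypothetical, and the sketch assumes the local SQLite build ships with the FTS5 extension.

```python
import sqlite3


def build_memory(notes):
    """Create an in-memory FTS5 index over (title, body) notes.

    The schema here is illustrative only, not the skill's real layout.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE VIRTUAL TABLE memory USING fts5(title, body)")
    conn.executemany("INSERT INTO memory (title, body) VALUES (?, ?)", notes)
    conn.commit()
    return conn


def search(conn, query, limit=5):
    """Return titles of notes matching the full-text query, best match first."""
    rows = conn.execute(
        "SELECT title FROM memory WHERE memory MATCH ? ORDER BY rank LIMIT ?",
        (query, limit),
    )
    return [row[0] for row in rows]


conn = build_memory([
    ("exp-1", "baseline evaluation of the security audit skill"),
    ("exp-2", "mutation improved prompt architect fitness"),
])
hits = search(conn, "mutation")
```

Ordering by the built-in `rank` column keeps the most relevant notes first without any scoring code in Python.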

Cortex Python Framework

Module	Purpose
`cortex.synthetic`	Synthetic data generation (reasoning, research, strategy, domain, adversarial)
`cortex.evaluation`	LLM-as-Judge scoring, benchmark suites, regression detection
`cortex.agents`	Multi-agent orchestrator, debate arena, DAG task graphs
`cortex.models`	Model provider abstraction (Anthropic SDK + CLI fallback)
`cortex.telemetry`	Structured logging, SQLite metrics collector
`cortex.config`	YAML + env var configuration with thread-safe singleton
`cortex.utils`	Atomic I/O, input sanitization, prompt injection detection
`cortex.experiments`	Experiment tracking with comparison and best-run queries
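
The `cortex.agents` row mentions DAG task graphs. One way to picture this, as a sketch under assumptions rather than the framework's own API, is to run tasks in dependency order with the standard library's `graphlib.TopologicalSorter`. The function `run_task_graph` and the task names are hypothetical.

```python
from graphlib import TopologicalSorter


def run_task_graph(tasks, deps):
    """Run callables in dependency order.

    tasks: mapping of task name -> callable taking the results dict so far
    deps:  mapping of task name -> set of prerequisite task names
    """
    graph = {name: deps.get(name, set()) for name in tasks}
    results = {}
    order = []
    # static_order() yields each task only after all of its prerequisites.
    for name in TopologicalSorter(graph).static_order():
        results[name] = tasks[name](results)
        order.append(name)
    return order, results


# Hypothetical three-step research pipeline.
tasks = {
    "search": lambda r: ["paper A", "paper B"],
    "summarize": lambda r: f"{len(r['search'])} sources summarized",
    "critique": lambda r: "critique of: " + r["summarize"],
}
deps = {"summarize": {"search"}, "critique": {"summarize"}}
order, results = run_task_graph(tasks, deps)
```

`TopologicalSorter` raises `graphlib.CycleError` if the dependencies contain a cycle, which is a convenient guard for agent graphs that are assembled dynamically.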

Skill Organism

The skill-organism/ directory contains the evolution engine. Skills are automatically tested and scored: underperformers are mutated, top performers are crossbred to produce new variants, and if the population collapses the system recovers by restoring previously successful versions.
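
The loop described above can be sketched as a single generation step. This is a toy illustration, not the engine's implementation: skills are modeled as tuples of integers, and `next_generation`, `fitness`, `mutate`, `crossbreed`, and `archive` are hypothetical names chosen for the example.

```python
def next_generation(population, fitness, mutate, crossbreed, archive, keep=2):
    """Produce the next population from scored skills.

    Top performers survive and are crossbred, underperformers are mutated,
    and an empty population is rebuilt from the archive of past winners.
    """
    if not population:
        # Recover from population loss by restoring archived versions.
        population = list(archive)
    ranked = sorted(population, key=fitness, reverse=True)
    top, rest = ranked[:keep], ranked[keep:]
    archive[:] = top  # remember the current best for later recovery
    next_pop = list(top)
    for i, skill in enumerate(rest):
        is_weakest = i == len(rest) - 1
        if is_weakest and len(top) >= 2:
            # Replace the weakest skill with a child of the two best.
            next_pop.append(crossbreed(top[0], top[1]))
        else:
            next_pop.append(mutate(skill))
    return next_pop


# Toy stand-ins: fitness is the sum, mutation bumps the last gene,
# crossbreeding takes the head of one parent and the tail of the other.
fitness = sum
mutate = lambda s: s[:-1] + (s[-1] + 1,)
crossbreed = lambda a, b: a[:1] + b[1:]

archive = []
population = [(1, 1), (5, 2), (3, 3), (0, 0)]
evolved = next_generation(population, fitness, mutate, crossbreed, archive)
restored = next_generation([], fitness, mutate, crossbreed, archive)
```

In a real run the fitness function would come from evaluation scores rather than a sum, and mutation would edit skill content rather than integers.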

Key features:

ARENA.md per skill, playing the role of "program.md" in Karpathy's autoresearch pattern
ArenaConfig parser with trilogy integration fields
EvalBudget context manager for time-bounded evaluation
Git-per-experiment branching (branch, mutate, evaluate, merge or discard)
Overnight runner that evaluates generations in parallel, capped at four at a time by asyncio.Semaphore(4)
Crash-safe JSONL evolution log for dashboard consumption
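
To make the time-bounded evaluation item above more concrete, here is a minimal sketch of a budget context manager. It is not the suite's `EvalBudget`, whose interface is not shown in this README. The names `TimeBudget`, `BudgetExceeded`, and `run_cases` are assumptions, and the budget is cooperative: the loop has to call `check()` between units of work.

```python
import time


class BudgetExceeded(Exception):
    """Raised by TimeBudget.check() once the deadline has passed."""


class TimeBudget:
    """Cooperative time budget; hypothetical stand-in for an eval budget."""

    def __init__(self, seconds, clock=time.monotonic):
        self.seconds = seconds
        self._clock = clock
        self.exceeded = False

    def __enter__(self):
        self._deadline = self._clock() + self.seconds
        return self

    def check(self):
        if self._clock() >= self._deadline:
            raise BudgetExceeded("evaluation budget exhausted")

    def __exit__(self, exc_type, exc, tb):
        if exc_type is not None and issubclass(exc_type, BudgetExceeded):
            self.exceeded = True
            return True  # swallow the timeout, keep partial results
        return False  # let any other exception propagate


def run_cases(cases, budget):
    """Process cases until they run out or the budget is exhausted."""
    done = []
    with budget:
        for case in cases:
            budget.check()
            done.append(case)
    return done
```

Passing the clock in as a parameter keeps the class testable with a fake clock. A budget that must interrupt a blocked call would need a different mechanism, such as `asyncio.wait_for` or a subprocess timeout.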

See OVERNIGHT_USAGE.md for autonomous evolution setup.
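
The overnight runner's bounded parallelism can be sketched with `asyncio.Semaphore`. This is an assumption-laden illustration rather than the runner's code: `run_generations` and the `evaluate` coroutine are hypothetical names, and the sleep stands in for a real evaluation.

```python
import asyncio


async def run_generations(n_generations, evaluate, max_parallel=4):
    """Evaluate generations concurrently, at most max_parallel at a time."""
    sem = asyncio.Semaphore(max_parallel)

    async def guarded(gen):
        async with sem:  # blocks while max_parallel evaluations are running
            return await evaluate(gen)

    # gather() returns results in submission order, not completion order.
    return await asyncio.gather(*(guarded(g) for g in range(n_generations)))


async def _demo():
    active = 0
    peak = 0

    async def evaluate(gen):
        nonlocal active, peak
        active += 1
        peak = max(peak, active)
        await asyncio.sleep(0.01)  # placeholder for real evaluation work
        active -= 1
        return gen * gen

    results = await run_generations(10, evaluate)
    return results, peak


results, peak = asyncio.run(_demo())
```

The `peak` counter in the demo records how many evaluations overlapped, which makes the cap easy to verify.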

Cross-Platform Support

Platform	Adapter Type	Status
Claude Code	Native Skills	Primary
MCP (Model Context Protocol)	FastMCP Servers	Generated
LangChain	Tool Classes	Generated
CrewAI	Tool Wrappers	Generated
OpenAI GPT Actions	Action Schemas	Generated
VS Code / Copilot / Cursor / Windsurf / JetBrains	MCP via Extension	Compatible

Project Structure

cortex-research-suite/
├── cortex/                    # Python framework (pip install -e .)
├── skills/                    # 26 autonomous skills (SKILL.md + ARENA.md + scripts/)
├── skill-organism/            # Skill evolution engine
├── knowledge/                 # Knowledge store (FTS5 + GraphRAG)
├── experiments/               # Experiment tracker (SQLite)
├── datasets/                  # Synthetic datasets + MindSpider feed
├── benchmarks/                # Baselines for all skills
├── dashboards/                # Browser dashboards (evolution, benchmark, arena)
├── cross-platform/            # Generated adapters (MCP, LangChain, CrewAI, OpenAI)
├── packages/                  # Standalone packages (de-slop-cli)
├── docs/                      # Documentation site (GitHub Pages)
├── scripts/                   # CLI entry points and utilities
├── tests/                     # Test suite
└── .github/workflows/         # CI/CD (lint, test, security, eval, release)

Security

All code passes automated security scanning on every push:

Bandit Python SAST with zero HIGH/MEDIUM findings
CodeQL semantic code analysis
Secret scanning with push protection enabled
Dependabot automated dependency updates
Prompt injection detection (7 compiled regex patterns)
Path traversal protection across all I/O operations
Browser arena: CSP, sessionStorage key isolation, rate limiting, input sanitization
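
As an illustration of the regex-based prompt injection check listed above, the sketch below screens text against a few precompiled patterns. The three patterns are examples invented for this sketch. The suite's actual seven patterns are not listed in this README, and `find_injections` and `is_suspicious` are hypothetical names.

```python
import re

# Illustrative patterns only; not the suite's real pattern set.
INJECTION_PATTERNS = [
    re.compile(r"ignore\s+(all\s+)?(previous|prior|above)\s+instructions", re.IGNORECASE),
    re.compile(r"disregard\s+(the\s+)?system\s+prompt", re.IGNORECASE),
    re.compile(r"you\s+are\s+now\s+in\s+developer\s+mode", re.IGNORECASE),
]


def find_injections(text):
    """Return the source of every pattern that matches the text."""
    return [p.pattern for p in INJECTION_PATTERNS if p.search(text)]


def is_suspicious(text):
    """True if any known injection pattern appears in the text."""
    return bool(find_injections(text))
```

Compiling the patterns once at import time avoids recompiling them on every call. Pattern matching of this kind catches known phrasings only, so it works best as one layer alongside the other controls in the list.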

Report vulnerabilities to admin@techknowmad.ai. See SECURITY.md.

Legal

Cortex Research Suite is MIT licensed. Trilogy integration skills are inspired by the architectural patterns of MindSpider, BettaFish, and MiroFish. No code has been copied. See LEGAL_NOTES.md.

Contributing

See CONTRIBUTING.md for the full guide including how to add new skills. All PRs require:

Passing CI checks (bandit, lint, tests)
One approving review
No leaked secrets or credentials

License

MIT — see LICENSE.

Version 1.1.0, released Mar 16, 2026

Download files

Source Distribution

cortex_research_suite-1.1.0.tar.gz (2.1 MB)

Uploaded Mar 16, 2026 Source

Built Distribution

cortex_research_suite-1.1.0-py3-none-any.whl (2.5 MB)

Uploaded Mar 16, 2026 Python 3

File details

Details for the file cortex_research_suite-1.1.0.tar.gz.

File metadata

Download URL: cortex_research_suite-1.1.0.tar.gz
Upload date: Mar 16, 2026
Size: 2.1 MB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.9.6

File hashes

Hashes for cortex_research_suite-1.1.0.tar.gz
Algorithm	Hash digest
SHA256	`24b8318a8ac2dd017f96f57f8227950517cef87800469283d872a5bb5cc66429`
MD5	`3dfaf9e478420a6f51a3148f28b15d12`
BLAKE2b-256	`814717c66a7ad48fcb1f8220a82a48ecaadf5b4bb0dfcfd0cce4eb180ea8f2e4`

File details

Details for the file cortex_research_suite-1.1.0-py3-none-any.whl.

File metadata

Download URL: cortex_research_suite-1.1.0-py3-none-any.whl
Upload date: Mar 16, 2026
Size: 2.5 MB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.9.6

File hashes

Hashes for cortex_research_suite-1.1.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`c209abbd76e26602b98f6355c2b4cdcbbc34da916b130254b3e22c060b7d81c4`
MD5	`37bb113ab1583e85881789e21d711c84`
BLAKE2b-256	`9279e3bfbbdba82c9bfbba5d96b49279b58df680570399db6b20f45ad96aa0d8`
