Skip to main content

A self-improving code reasoning engine with persistent semantic memory

Project description

Neo Banner: Imagery from The Matrix film series


Neo

A self-improving code reasoning engine that learns from experience using persistent semantic memory. Neo uses multi-agent reasoning to analyze code, generate solutions, and continuously improve through feedback loops.

  • Fact-Based Memory: Learns from every solution attempt using a scoped, supersession-based fact store
  • Semantic Retrieval: Vector search finds relevant facts via Jina Code embeddings
  • Code-First Generation: No diff parsing failures
  • Local Storage: Privacy-first JSON storage in ~/.neo/facts/ directory
  • Model-Agnostic: Works with any LM provider
  • Three integration surfaces — each on equal footing:
    • Run as an Agent (CAR / A2A) — host Neo as an Agent2Agent v1.0 endpoint other agents (or orchestrators) can call directly. Real inference path, not a CLI wrapper.
    • Claude Code Plugin — six slash commands + a specialized agent inside Anthropic's Claude Code CLI.
    • Codex Plugin — same six skills, packaged for OpenAI Codex CLI.

PyPI version Python Versions License: Apache 2.0

Why Neo? Why Care?

If you've been Vibe Coding, then Vibe Planning, then Context Engineering, and on and on, you have likely hit walls where the models are both powerful and limited, brilliant and incompetent, wise and ignorant, humble yet overconfident.

Worse, your speedy AI Code Assistant sometimes goes rogue and overwrites key code in a project, or writes redundant code even after just reading documentation and the source code, or violates your project's patterns and design philosophy.... It can be infuriating. Why doesn't the model remember? Why doesn't it learn? Why can't it keep the context of the code patterns and tech stack? ... -> This is what Neo is designed to solve.

Neo is the missing context layer for AI Code Assistants. It learns from every solution attempt, using vector embeddings to retrieve relevant patterns for new problems. It then applies the learned patterns to generate solutions, and continuously improves through feedback loops.

Table of Contents

Design Philosophy

Fact-Based Learning: Neo builds a semantic memory of facts — constraints, architectural decisions, patterns, review learnings, decisions, known unknowns, and failures — using vector embeddings for retrieval.

Code-First Output: Instead of generating diffs that need parsing, Neo outputs executable code blocks directly, eliminating extraction failures.

Scoped Storage: Facts are scoped to global, organization, or project level, stored locally in ~/.neo/facts/ for privacy and offline access.

Model-Agnostic: Works with OpenAI, Anthropic, Google, local models, or Ollama via a simple adapter interface.

How It Works

User Problem → Neo CLI → Semantic Retrieval → Reasoning → Code Generation
                           ↓
                    [Vector Search]
                    [Pattern Matching]
                    [Confidence Scoring]
                           ↓
                    Executable Code + Memory Update

Neo retrieves relevant facts using Jina Code embeddings (768-dimensional vectors), applies learned patterns, generates solutions, and stores new facts for continuous improvement.

  1. Jina's embeddings model (open source) is downloaded automatically when you first run Neo. This model runs locally on your machine to generate vector embeddings.

The Construct

Neo includes The Construct - a curated library of architecture and design patterns with semantic search capabilities. Think of it as your personal reference library for common engineering patterns, indexed and searchable using the same embedding technology that powers Neo's reasoning memory.

What is The Construct?

The Construct is a collection of vendor-agnostic design patterns covering:

  • Rate Limiting: Token bucket, sliding window, distributed rate limiting
  • Caching: Cache-aside, write-through, invalidation strategies
  • More domains: Additional patterns contributed by the community

Each pattern follows a structured format inspired by the Gang of Four:

  • Intent: What problem does this solve?
  • Forces: Key constraints and tradeoffs
  • Solution: Conceptual structure (no framework-specific code)
  • Consequences: Benefits, risks, and observability signals
  • References: Links to real-world implementations

Using The Construct

# List all patterns
neo construct list

# Filter by domain
neo construct list --domain rate-limiting

# Show a specific pattern
neo construct show rate-limiting/token-bucket

# Semantic search across patterns
neo construct search "how to prevent api abuse"

# Build the search index
neo construct index

Pattern Quality Standards

All patterns must:

  • Include author attribution
  • Be under 300 lines
  • Remain vendor-agnostic (no AWS/GCP/Azure-specific solutions)
  • Include concrete consequences and observability guidance

See /construct/README.md for contribution guidelines.

  1. When you ask Neo for help:
    • Your query is embedded locally using the Jina model
    • Neo searches the fact store for relevant knowledge (using cosine similarity)
    • Retrieved facts are organized into layers: constraints, relevant knowledge, recent changes, known unknowns
    • This combined context is sent to your chosen LLM API (OpenAI/Anthropic/Google)
    • The LLM generates a solution informed by both your query and past facts
    • The result is stored back as a new fact in local memory for future use

Local storage: ~/.neo/facts/facts_global.json ← Global-scoped facts ~/.neo/facts/facts_org_{id}.json ← Organization-scoped facts ~/.neo/facts/facts_project_{id}.json ← Project-scoped facts

Privacy:

  • Your code never leaves your machine during embedding/search
  • Only your prompt + retrieved facts are sent to the LLM API
  • This is the same as using the LLM directly, but with added context from something akin to memory.
  Your Prompt
     ↓
 Local Jina Embedding (768-dim vector)
     ↓
 Cosine Similarity Search (finds relevant facts)
     ↓
 Retrieve Facts from ~/.neo/facts/
     ↓
 Assemble Context: Constraints → Knowledge → Recent Changes → Known Unknowns
     ↓
 →→→ NETWORK CALL →→→ LLM API (OpenAI/Anthropic/etc.)
     ↓
 Solution Generated
     ↓
 Store as New Fact in Local Memory

Quick Start

# Install from PyPI (recommended)
pip install neo-reasoner

# Or install with specific LM provider
pip install neo-reasoner[openai]     # For GPT (same provider as the default)
pip install neo-reasoner[anthropic]  # For Claude
pip install neo-reasoner[google]     # For Gemini
pip install neo-reasoner[all]        # All providers

# Set API key
export OPENAI_API_KEY=sk-...

# Test Neo
neo --version

See QUICKSTART.md for 5-minute setup guide

Run as an Agent (CAR / A2A)

Neo integrates with Parslee's Common Agent Runtime (CAR) as a first-class peer of the CLI and the plugins. The integration runs both directions:

  • Inbound (host)neo serve exposes Neo as an Agent2Agent v1.0 endpoint. Other agents and orchestrators call Neo's neo.process tool over A2A directly. No CLI shell-out, no subprocess parsing.
  • Outbound (inference) — set provider="car" to route Neo's own LLM calls through CAR's unified inference layer. CAR's adaptive router picks local backends (Candle + MLX for Qwen3, Gemma 4) or remote providers (OpenAI, Anthropic, Google) per call based on task complexity, context-window headroom, and per-model latency/cost. Rust-enforced policies, deterministic eventlog/replay, and semantic conversation compaction all come for free.

A single CarRuntime is shared per process — if neo serve is running and the same process makes outbound calls, both surfaces see the same state, policies, tool registry, and eventlog.

Install the CAR extras

# CAR-backed serving and inference both require the car-runtime Python bindings
pip install "neo-reasoner[car]"

car-runtime ships as a sealed binary under a separate license (the rest of Neo stays Apache-2.0). Skip this extra if you only need the plugins and the CLI with direct provider SDKs.

Inbound: host Neo as an A2A endpoint

# 1. Start the CAR daemon (default ws://127.0.0.1:9100)
python -m car_runtime.server
# or, if installed standalone:
car-server

# 2. In another terminal, host Neo as an A2A endpoint
neo serve

neo serve boots a CarRuntime, registers Neo as the neo.process tool with its full schema (src/neo/car_tool_schema.py), installs the Python tools.execute handler, and binds the A2A HTTP listener. It blocks until SIGINT/SIGTERM.

Outbound: use CAR as Neo's inference layer

# Switch Neo's default provider
neo --config set --config-key provider --config-value car

# Let CAR's adaptive router pick the backend per call (recommended)
neo --config set --config-key model --config-value ""

# Or pin a specific model — local or remote
neo --config set --config-key model --config-value qwen3-32b
neo --config set --config-key model --config-value gpt-5

The CAR daemon must be running (car-server / python -m car_runtime.server). From Python:

from neo.adapters import create_adapter
adapter = create_adapter("car")                                    # router picks a code-capable model
adapter = create_adapter("car", model="Qwen3-4B")                  # pin a specific backend
adapter = create_adapter(
    "car",
    intent_hint={"task": "reasoning", "prefer_local": True},       # override the default
)

Default intent: CarAdapter sends intent_json={"task": "code"} on every call unless you supply your own intent_hint. Neo's workload is overwhelmingly code reasoning (review, optimization, debugging, generation), so the router gets to pick a code-capable model rather than the chat default. CAR's task enum is chat | classify | reasoning | code. The rest of IntentHint (prefer_local, prefer_fast, require: ModelCapability[]) is how you express what else you need without pinning a model ID.

Known limitation upstream: CAR's route_model currently scores prompts as "simple" by heuristic length and picks the cheap chat-tier model (e.g. gpt-4.1-mini) even when models like gpt-5.3-codex and o3 are registered and ranked as fallbacks. Tracked at Parslee-ai/car-releases#52. The task=code default is Neo's local workaround — substantive prompts do escalate; trivial ones don't.

Discover what's installed

# Detects native CLI, car-server, Python bindings, and the default daemon port
neo car status

# Also surfaced in --version output
neo --version

If the CLI/daemon are present but the Python bindings aren't, Neo reports that state cleanly. CAR install options live at Parslee-ai/car-releases.

Why use the CAR surfaces

  • Real inference path both ways — inbound, callers see Neo as a typed A2A tool; outbound, Neo gets local-first inference with automatic remote fallback through one provider-agnostic protocol
  • One runtime per host — session state, tool registry, policies, and the eventlog stay consistent across A2A inbound and inference outbound in the same process
  • Local-first inference, free fallback — Qwen3 / Gemma 4 on-device via Candle + MLX; remote OpenAI / Anthropic / Google when the router decides the task needs it
  • Policies enforced in Rust — deny rules and capability requirements run before any side-effecting call
  • Memory is shared across all surfaces~/.neo/facts/ and per-project indexes are the same whether you invoke via CLI, plugin, neo serve, or CAR inference

Claude Code Plugin

Neo ships as a Claude Code plugin with a specialized agent and six slash commands. Anthropic's Claude Code CLI installs it from Parslee's plugin marketplace:

# Add the marketplace
/plugin marketplace add Parslee-ai/claude-code-plugins

# Install Neo
/plugin install neo

Once installed:

  • Slash commands: /neo, /neo-review, /neo-optimize, /neo-architect, /neo-debug, /neo-pattern
  • Specialized agent: invoke with Use the Neo agent to ... for delegated semantic reasoning
  • Shared memory: same ~/.neo/facts/ store used by the CLI and the Codex plugin

Examples:

/neo-review src/api/handlers.py
/neo-optimize process_large_dataset function
/neo-architect Should I use microservices or monolith?
/neo-debug Race condition in task processor

The plugin wraps the local neo CLI, so the binary must be installed first (pip install neo-reasoner[openai] and OPENAI_API_KEY set, or your provider of choice).

Plugin sources live under .claude-plugin/plugin.json is the manifest, agents/neo.md defines the agent, and commands/*.md defines each slash command.

Codex Plugin

Neo ships as a Codex plugin with the same six skills, packaged for OpenAI Codex CLI. Add the marketplace and install Neo from Codex's plugin directory:

# Add Parslee's hosted marketplace
codex plugin marketplace add Parslee-ai/neo

# Or, from a local checkout, point Codex at the in-tree marketplace
codex plugin marketplace add ./

Once installed:

  • Skills: $neo, $neo-review, $neo-optimize, $neo-architect, $neo-debug, $neo-pattern
  • Shared memory: same ~/.neo/facts/ store used by the CLI and the Claude Code plugin

Examples:

$neo-review src/api/handlers.py
$neo-optimize process_large_dataset function
$neo-architect Should I use microservices or monolith?
$neo-debug Race condition in task processor

The plugin wraps the local neo CLI, so the binary must be installed first (pip install neo-reasoner[openai] and OPENAI_API_KEY set, or your provider of choice). Anything you teach Neo from Codex is immediately available in the Claude Code plugin and the CAR endpoint, and vice versa — there is one fact store per host.

Plugin sources live under plugins/neo/ — see the manifest and skill definitions.

Works Alongside Your AI Tools

Neo automatically reads project-local agent instruction docs from a wide range of ecosystems and folds them into its reasoning context — no configuration needed. If you've already invested in writing a CLAUDE.md, an AGENTS.md, .cursor/rules/, .github/copilot-instructions.md, or a Spec Kit project, neo respects that work.

Tool Files / dirs neo discovers
Claude / Claude Code CLAUDE.md, .claude/CLAUDE.md, .claude/agents/*.md, .claude/commands/*.md
Codex / AGENTS.md spec AGENTS.md, .github/AGENTS.md, .codex/**/*.md
Cursor .cursorrules, .cursor/rules/**/*.md, .cursor/rules/**/*.mdc
GitHub Copilot .github/copilot-instructions.md
Windsurf .windsurfrules
Continue .continue/**/*.md
Augment .augment/**/*.md
Spec Kit .specify/**/*.md
Aider .aider/*.md
Codeium .codeium/*.md

Discovered docs surface in neo's prompt under PROJECT-LOCAL AGENT CONTEXT, included unconditionally — independent of relevance ranking — because their value is global to the project. Per-file cap of 6KB and total cap of 32KB keep prompt growth bounded.

This means neo composes well with whichever AI coding workflow you already use:

  • Claude Code users get the deepest integration via the Claude Code Plugin, but neo runs standalone too.
  • Codex CLI users get parity via the Codex Plugin — same six skills, packaged for Codex. Neo also automatically picks up AGENTS.md (the cross-tool standard Codex co-led) plus anything under .codex/.
  • Cursor / Windsurf / Aider / Continue / Augment users — the rules dirs you've curated land in every neo session's context.
  • GitHub Copilot users — .github/copilot-instructions.md is read on every invocation.
  • Spec Kit projects — your specs are folded into neo's reasoning context, no manual paste.

Adding a new tool is a one-liner: extend the discovery rules in src/neo/agent_context.py. The list is the load-bearing surface for keeping this current as new agent ecosystems emerge.

Installation

From PyPI (Recommended)

# Install Neo
pip install neo-reasoner

# With specific LM provider
pip install neo-reasoner[openai]     # GPT (recommended)
pip install neo-reasoner[anthropic]  # Claude
pip install neo-reasoner[google]     # Gemini
pip install neo-reasoner[all]        # All providers

# Verify installation
neo --version

Updating Neo

Neo supports both manual and fully automatic updates:

Manual Updates

# Option 1: Use neo's built-in update command (simplest)
neo update

# Option 2: Update with pip
pip install --upgrade neo-reasoner

# Option 3: Use pipx for isolated installation (recommended for end users)
pipx install neo-reasoner          # First-time install
pipx upgrade neo-reasoner           # Update to latest version
pipx upgrade-all                    # Update all pipx packages

Fully Automatic Updates

Automatic update installation is enabled by default for pipx and virtualenv installs. You can set it explicitly with:

# Enable auto-install (persisted in ~/.neo/config.json)
neo --config set --config-key auto_install_updates --config-value true

# Or use environment variable
export NEO_AUTO_INSTALL_UPDATES=1

When enabled, Neo will:

  • Check for updates once every hour using a stale-while-revalidate cache
  • Automatically download and install new versions in the background
  • Notify you when updates complete
  • Log all auto-update activity to ~/.neo/auto_update.log

Example output when auto-install is enabled:

$ neo "your query" Auto-installing neo update: 0.18.0  0.18.1
   This happens in the background. Please wait...

✓ Auto-update completed: 0.18.1
   Restart neo to use the new version.

[Neo] Processing your query...

Update Notifications (Default)

By default, Neo checks for updates once every hour and displays a notification when a new version is available. This check happens in the background and will not interrupt your workflow.

To disable update checks entirely:

export NEO_SKIP_UPDATE_CHECK=1

From Source (Development)

# Clone repository
git clone https://github.com/Parslee-ai/neo.git
cd neo

# Install in development mode with all dependencies
pip install -e ".[dev,all]"

# Verify installation
neo --version

Dependencies

Core dependencies are automatically installed via pyproject.toml:

  • numpy >= 1.24.0
  • scikit-learn >= 1.3.0
  • datasketch >= 1.6.0
  • fastembed >= 0.3.0
  • faiss-cpu >= 1.7.0
  • jsonschema >= 4.0.0
  • pyyaml >= 6.0
  • openai >= 1.0.0 (default provider; base install is runnable with just OPENAI_API_KEY)
  • tree-sitter >= 0.23, < 0.26
  • tree-sitter-language-pack >= 0.13.0, < 1.0

Optional: Additional LM Providers

OpenAI is bundled in the base install. Add others as needed:

pip install anthropic               # Claude
pip install google-genai>=0.2.0     # Gemini (requires Python 3.10+)
pip install requests                # Ollama

See INSTALL.md for detailed installation instructions

Usage

CLI Interface

# Ask Neo a question
neo "how do I fix the authentication bug?"

# With working directory context
neo --cwd /path/to/project "optimize this function"

# Build the per-project semantic index (powers smart file selection)
neo --index

# Incrementally refresh the index after meaningful changes (re-embeds only changed files)
neo --update

# Preview the assembled context without making an LLM call
neo --dry-run "your query"

# Check version and memory stats
neo --version

# Inspect detected local CAR runtime surfaces
neo car status

Memory Maintenance

# Compact fact files by dropping old invalid tombstones (default: > 30 days since last access)
neo memory prune

# Across every local project Neo has touched
neo memory prune --all

# Preview without writing
neo memory prune --dry-run --max-invalid-age-days 14

Use prune when a ~/.neo/facts/facts_project_*.json file grows much larger than its 500-valid-fact cap — that gap is tombstone bloat from supersession. Defaults are conservative; raising --max-invalid-age-days is safe, lowering it past ~7 may evict tombstones still referenced by recent supersession chains.

Timeout Requirements

Neo makes blocking LLM API calls that typically take 30-120 seconds. When calling Neo from scripts or automation, use appropriate timeouts:

# From shell (10 minute timeout)
timeout 600 neo "your query"

# From Python subprocess
subprocess.run(["neo", query], timeout=600)

Insufficient timeouts will cause failures during LLM inference, not context gathering.

Output Format

Neo outputs executable code blocks with confidence scores:

def solution():
    # Neo's generated code
    pass

Personality System

Neo responds with personality (Matrix-inspired quotes) when displaying version info:

$ neo --version
"What is real? How do you define 'real'?"

neo 0.18.1
Provider: openai | Model: gpt-5.5
Stage: Sleeper | Memory: 0.0%
░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░
0 facts | 0.00 avg confidence

Load Program - Training Neo's Memory

"The Operator uploads a program into Neo's head."

Neo can bootstrap its memory by importing facts from HuggingFace datasets. This is NOT model fine-tuning - it's retrieval learning that expands local semantic memory with reusable code knowledge.

# Install datasets library
pip install datasets

# Load patterns from MBPP (recommended starter - 1000 Python problems)
neo --load-program mbpp --split train --limit 1000

# Load from OpenAI HumanEval (164 hand-written coding problems)
neo --load-program openai_humaneval --split test

# Load from BigCode HumanEvalPack (multi-language variants)
neo --load-program bigcode/humanevalpack --split test --limit 500

# Dry run to preview
neo --load-program mbpp --dry-run

# Custom column mapping
neo --load-program my_dataset \
    --columns '{"text":"pattern","code":"solution"}'

Output (Matrix-style):

"I know kung fu."

Loaded: 847 facts
Deduped: 153 duplicates
Index rebuilt: 1.2s
Memory: 1247 total facts

How it works:

  1. Acquire: Pull dataset from HuggingFace
  2. Normalize: Map rows to fact schema
  3. Dedupe: Hash-based deduplication against existing memory
  4. Embed: Generate local embeddings (Jina Code v2)
  5. Store: Add as facts to the fact store
  6. Report: Matrix quote + counts

Key points:

  • NOT fine-tuning - just expanding retrieval memory
  • Facts start at 0.3 confidence (trainable via real-world usage)
  • Automatic deduplication prevents memory bloat
  • Uses local embeddings (no data leaves your machine)
  • Stored in ~/.neo/facts/ alongside learned facts

See docs/LOAD_PROGRAM.md for detailed documentation

Architecture

Fact-Based Memory

Neo uses a scoped, supersession-based fact store with Jina Code v2 embeddings (768 dimensions) for semantic retrieval:

  1. Typed Facts: Eight kinds — CONSTRAINT, ARCHITECTURE, DECISION, PATTERN, REVIEW, FAILURE, KNOWN_UNKNOWN, and EPISODE (instance-specific events with {when, where, why, with_whom} context).
  2. Scoped Organization: Facts are scoped to global, organization, or project level, with per-scope valid-fact caps (200 / 100 / 500 / 50). Org and project are auto-detected from git remotes.
  3. Supersession & Pre-Write Dedup: New facts with cosine similarity ≥ 0.85 to an existing fact short-circuit (bump the existing fact's access count) or supersede it. The pre-write canonical-signature check uses entity abstraction + verb-synonym folding to catch near-duplicates before they hit the store.
  4. Confidence + Effectiveness Ranking: rank_score = recall_decay(sim)·confidence + success_bonus·effectiveness_f + provenance_bonus. The Ebbinghaus recall-probability transform gives frequently-recalled facts slower decay; LessonL-style effectiveness (c/n over reuse outcomes) multiplies the success bonus. Curated facts (CONSTRAINT/ARCHITECTURE/DECISION and seed/community/synthesized-tagged facts) bypass decay.
  5. Hybrid Retrieval: 0.7·dense (Jina) + 0.3·BM25. Half the result slots ranked by full rank_score, half by raw cosine — novel-but-relevant facts aren't crowded out by validated winners.
  6. Triple-Trigger Consolidation: REVIEW facts cluster into PATTERN / FAILURE archetypes when ANY of count-delta ≥10, elapsed ≥1h, or confidence-decile entropy >0.9 fires. Clusters of ≥3 get an NREM-style Hebbian confidence bump; non-curated facts decay 3% globally after each pass.
  7. Dual-Buffer Probation: New non-curated facts enter with a probation tag and a 3-day stale window (vs 7/14 normal); promoted automatically on access_count ≥ 2 or success_count > 0 — quietly evicts noise while keeping real signal.
  8. Four-Layer Context: Retrieved facts are organized into constraints, relevant knowledge, recent changes, and known unknowns. The four-layer state model is from Beyond Conversation: A State-Based Context Architecture for Enterprise AI Agents (Liotta, 2025) — see papers/state-based-context-architecture.pdf. The token-budget enforcement in memory/context.py is ported from the engine described in Memgine: A Deterministic Memory Engine for Stateful AI Agents (Liotta, 2026) — see papers/memgine-deterministic-memory-engine.pdf. Both are evaluated by StateBench; the 95.8% decision-accuracy result on the v1.0 development split is what drove Neo's move from a separate "Recently Changed" section to inline (changed from: X) annotations.

Output Schemas

Neo generates structured outputs with executable code and planning artifacts:

CodeSuggestion - Executable code with actionable metadata:

@dataclass
class CodeSuggestion:
    # Core fields
    file_path: str
    unified_diff: str           # Legacy: backward compatibility
    code_block: str = ""        # Primary: executable Python code
    description: str
    confidence: float
    tradeoffs: list[str]

    # Executable artifacts (v0.8.0+)
    patch_content: str = ""            # Full unified diff content
    apply_command: str = ""            # Shell command to apply (advisory)
    rollback_command: str = ""         # Shell command to undo (advisory)
    test_command: str = ""             # Shell command to verify (advisory)
    dependencies: list[str] = []       # Other suggestion IDs this depends on
    estimated_risk: str = ""           # "low", "medium", or "high"
    blast_radius: float = 0.0          # 0.0-100.0 percentage of codebase affected

PlanStep - Incremental planning with step-level metadata:

@dataclass
class PlanStep:
    # Core fields
    description: str
    rationale: str
    dependencies: list[int] = []

    # Incremental planning (v0.8.0+)
    preconditions: list[str] = []      # Conditions before execution
    actions: list[str] = []            # Concrete actions to perform
    exit_criteria: list[str] = []      # Success verification criteria
    risk: str = "low"                  # "low", "medium", "high"
    retrieval_keys: list[str] = []     # Step-scoped memory retrieval
    failure_signatures: list[str] = [] # Known failure patterns
    verifier_checks: list[str] = []    # Validation checks (Solver-Critic-Verifier)
    expanded: bool = False             # Tracks seed → expansion

These schemas enable:

  • Actionable Output: Commands and patches ready for execution
  • Incremental Planning: Seed plans expand only when blocked (as-needed decomposition)
  • Step-Level Learning: Failure signatures attach to specific steps for ReasoningBank
  • Multi-Agent Reasoning: Verifier checks support MapCoder's Solver-Critic-Verifier pattern

Code Smell Detection in Context Assembly

Neo scans the relevance-ranked file set during context assembly and surfaces known issues to the model under KNOWN ISSUES IN NEARBY CODE. Detectors are intentionally high-precision (false positives turn into prompt bloat that hurts more than it helps):

  • TODO / FIXME / HACK / XXX markers (any text file)
  • Python stubs: pass-only / ...-only / raise NotImplementedError
  • Python bare except: and swallowed exceptions (except ...: pass)
  • Hardcoded credentials matching well-known prefixed shapes (OpenAI sk-, AWS AKIA, GitHub ghp_, Slack xox*-)

Per-file cap of 8 + global cap of 20 findings keeps the prompt bounded. Magic numbers and generic high-entropy secret detection are intentionally out of scope — they'd add more noise than signal at this stage.

Smart File Selection

The context gatherer picks files using three signals:

  • ProjectIndex semantic boost: when .neo/index.json exists (run neo --index once per repo), per-project FAISS over tree-sitter chunks projects top-k chunk hits back to per-file boosts up to +1.0 cosine. Chunks embed symbols + imports + first ~600 chars of body, so prompt keywords match what a file is, not assertion strings inside tests. Test-file matches are demoted 0.4× unless the prompt mentions test/spec.
  • Tree-sitter symbol overlap: the parser extracts function/class names + imports from top candidates and adds up to +1.2 for substring matches against prompt tokens (length-3 floor).
  • EPISODE-history feedback loop: each Neo run stashes touched file paths as file:<rel> tags on EPISODE facts. On the next run, the gatherer queries for similar past prompts and gives those files up to +0.5 boost — past file selections measurably influence future ones.

A per-file chunk cap of 2 prevents large files from eating the budget; a one-time first-run hint fires if the index is missing.

Learning Feedback Loop

After each Neo run, the next invocation diffs your repo against the suggestions it made and classifies the result. All confidence deltas are modulated by ±arch_mod (∈ {−0.1, 0, +0.1}) from the architectural-quality snapshot — see Architectural Quality Feedback Loop below.

Outcome Trigger Effect
ACCEPTED Code-block overlap ≥ 0.8 (modern path) or unified-diff overlap > 0.3 (legacy path) linked fact conf +0.2 ± arch_mod, success_count +1, effectiveness "better"
MODIFIED User changed the file differently linked fact conf −0.2 ± arch_mod (floored at 0.1) + new REVIEW at conf 0.4
UNVERIFIED File touched but suggestion had no diff to compare linked fact conf +0.1 ± arch_mod, success_count +1 (no REVIEW)
INDEPENDENT File touched, never suggested by Neo new REVIEW at conf 0.2; capped 5/session, 50/project

Storage Architecture

  • Scoped JSON Files: Facts stored in ~/.neo/facts/ — separate files per scope (global, org, project), with inline embeddings (no separate FAISS index for memory).
  • Bi-Temporal Supersession: similar facts are soft-deleted by stamping event_time_end rather than dropped. Tombstones persist until purge_dead_facts runs on the next cold start.
  • Constraint Auto-Ingestion: CLAUDE.md and similar files are automatically scanned and ingested as CONSTRAINT facts.
  • Sessions & Metrics: ~/.neo/sessions/ holds session manifests + replay logs; ~/.neo/metrics.jsonl logs every retrieve / add_fact / lm_call / overseer_tick (disable with NEO_METRICS=off).
  • Project Index (separate system): Tree-sitter code indexing uses FAISS for per-repository semantic search in .neo/.

Performance

Neo improves over time as it learns from experience. Initial performance depends on available facts. Performance grows as the semantic memory builds up successful solutions, failure learnings, and architectural decisions.

Memory-Driven Reasoning Effort (gpt-5* models)

Neo monetizes its learning into inference cost. Each query's reasoning.effort parameter is sized from the strength of the memory hit:

Memory + difficulty Effort
≥3 patterns, avg confidence ≥ 0.8 low
Some patterns, avg confidence 0.5–0.8 medium (API default)
No relevant patterns OR avg confidence < 0.5 high
No patterns AND difficulty == "hard" xhigh

Familiar queries get cheap thinking; novel-and-hard queries get max thinking. Cap with NEO_REASONING_EFFORT={none,low,medium,high,xhigh} for cost control.

Model note: the effort vocabulary differs by model. gpt-5.5 (the default) accepts the full none / low / medium / high / xhigh range. Older gpt-5-codex only accepts low / medium / high — if you switch back to that model, set NEO_REASONING_EFFORT=high to cap the auto-selector.

Architectural Quality Feedback Loop

When a session ends, neo snapshots three structural metrics — import cycles, god files (LOC + function-count thresholds), and max nesting depth — and diffs against the previous snapshot at the next outcome detection. A regression weakens the accept/boost or strengthens the modify/penalty by 0.1; an improvement does the reverse. Confidence becomes a signal of "helped the codebase," not just "got accepted."

Configuration

CLI Configuration Management

Neo provides a simple CLI for managing persistent configuration:

# List all configuration values
neo --config list

# Get a specific value
neo --config get --config-key provider

# Set a value
neo --config set --config-key provider --config-value anthropic
neo --config set --config-key model --config-value claude-sonnet-4-5-20250929
neo --config set --config-key api_key --config-value sk-ant-...

# Reset to defaults
neo --config reset

Exposed Configuration Fields:

  • provider - LM provider (openai, anthropic, google, azure, ollama, local)
  • model - Model name (e.g., gpt-5.5, claude-sonnet-4-5-20250929)
  • api_key - API key for the chosen provider
  • base_url - Base URL for local/Ollama endpoints
  • memory_backend - Memory backend: "fact_store" (default) or "legacy"
  • auto_install_updates - Automatically install updates in background (true/false)
  • constraint_auto_scan - Auto-scan CLAUDE.md for constraints (true/false, default: true)
  • log_level - Logging level: DEBUG, INFO, WARNING, or ERROR
  • reasoning_effort_cap - Optional cap for OpenAI gpt-5 reasoning effort

Configuration is stored in ~/.neo/config.json. Environment variables override stored config values for the current process.

Secure API Key Storage

On macOS, Neo stores API keys in Keychain rather than config.json. Run:

# Securely prompt for and store an API key in Keychain
neo --config set --config-key api_key

NeoConfig.load() reads the Keychain entry automatically.

Linux / Windows: this command currently raises — Keychain support is macOS-only. Either set the provider env var directly (OPENAI_API_KEY, ANTHROPIC_API_KEY, etc.) or export NEO_ALLOW_PLAINTEXT_API_KEY=1 first so the key is persisted in config.json.

Environment Variables

Credentials

# Provider-specific (read by NeoConfig.load() when set)
export OPENAI_API_KEY=sk-...
export ANTHROPIC_API_KEY=sk-ant-...
export GOOGLE_API_KEY=...

# Neo-generic override (takes precedence over provider-specific keys)
export NEO_PROVIDER=openai
export NEO_MODEL=gpt-5.5
export NEO_API_KEY=sk-...
export NEO_BASE_URL=http://localhost:11434       # for Ollama/local endpoints

Behavior

export NEO_REASONING_EFFORT=high                  # cap auto-effort selection
export NEO_AUTO_INSTALL_UPDATES=1                 # auto-install background updates
export NEO_SKIP_UPDATE_CHECK=1                    # disable update checks entirely
export NEO_LOG_LEVEL=INFO                         # DEBUG/INFO/WARNING/ERROR
export NEO_TEMPERATURE=0.7                        # generation temperature
export NEO_MAX_TOKENS=4096                        # per-call max output tokens

Install Sanity

If you have multiple local installs, make sure the neo command and your test interpreter import the same package:

which neo
neo --version
python3 -c "import neo; print(neo.__file__)"

Observability

export NEO_METRICS=off                            # disable ~/.neo/metrics.jsonl writes

Neo writes structured per-operation events (retrieve / add_fact / lm_call / overseer_tick) to ~/.neo/metrics.jsonl and per-session manifests + JSONL outcome logs to ~/.neo/sessions/.

LM Adapters

OpenAI (Default)

from neo.adapters import OpenAIAdapter
adapter = OpenAIAdapter(model="gpt-5.5", api_key="sk-...")

Default model: gpt-5.5. GPT-5/Codex models use the /v1/responses endpoint automatically.

Anthropic

from neo.adapters import AnthropicAdapter
adapter = AnthropicAdapter(model="claude-sonnet-4-5-20250929")

Default model: claude-sonnet-4-5-20250929

Google

Note: Requires Python 3.10+ and google-genai>=0.2.0

from neo.adapters import GoogleAdapter
adapter = GoogleAdapter(model="gemini-2.0-flash")

Default model: gemini-2.0-flash. Uses the google-genai SDK.

Ollama

from neo.adapters import OllamaAdapter
adapter = OllamaAdapter(model="llama2")

CAR (Common Agent Runtime)

from neo.adapters import CarAdapter
# Default: router-picked model with intent_json={"task": "code"} so the
# router selects a code-capable backend rather than the chat default.
adapter = CarAdapter()
# pin a specific backend if you need to:
adapter = CarAdapter(model="Qwen3-4B")
# override the default intent (CAR task enum: chat | classify | reasoning | code):
adapter = CarAdapter(intent_hint={"task": "reasoning", "prefer_local": True})

Requires the [car] extra (pip install neo-reasoner[car]) and a running car-server. See Run as an Agent (CAR / A2A) for the full setup.

Extending Neo

Add a New LM Provider

from neo.cli import LMAdapter

class CustomAdapter(LMAdapter):
    def generate(self, messages, stop=None, max_tokens=4096, temperature=0.7):
        # Your implementation
        return response_text

    def name(self):
        return "custom/model-name"

Key Features

  • Three integration surfaces on equal footing:
    • Run as an Agent (CAR / A2A) — host Neo as an Agent2Agent v1.0 endpoint via neo serve; other agents call neo.process directly
    • Claude Code Plugin — six slash commands + a specialized agent inside Claude Code
    • Codex Plugin — the same six skills, packaged for OpenAI Codex CLI
  • Fact-Based Memory: Learns from every solution attempt using a scoped, supersession-based fact store
  • Semantic Retrieval: Vector search finds relevant facts via Jina Code embeddings
  • Code-First Generation: No diff parsing failures
  • Scoped Storage: Privacy-first JSON storage in ~/.neo/facts/ with global, org, and project scopes
  • Model-Agnostic: Works with any LM provider
  • The Construct: Curated library of architecture patterns with semantic search
  • Project Indexing: Tree-sitter based multi-language code indexing with FAISS
  • Prompt Enhancement: Analyze and improve prompt effectiveness

Development

Running Tests

# Run all tests
pytest

# Run specific test
pytest tests/test_neo.py

# Run with coverage
pytest --cov=neo

Research & References

The 0.18 memory architecture lands deterministic techniques from a focused reading of recent work on long-horizon agent memory and code generation. Citations below are anchored to the file where the technique is actually implemented — the full PDFs are checked into papers/ for reproducibility.

Academic Papers

Memory architecture & lifecycle

  1. SCM Sleep Memory: Sleep-Consolidation in Continual Memory Paper 2604.20943

    • 4-D ValueTagger composite (novelty, validation, task, repetition); adaptive forgetting threshold; NREM Hebbian strengthening + global downscale; triple-trigger consolidation gate.
    • Implementation: src/neo/memory/value_score.py, store.synthesize_reviews.
  2. Memory Systems Survey (1) Paper 2603.07670

    • Provenance taxonomy (STRUCTURAL > OBSERVED > INFERRED); dual-buffer / probation consolidation; Layer-1/2/3 observability split.
    • Implementation: src/neo/memory/models.py:42, store.py (probation tag), memory/metrics.py.
  3. Memory Survey 2 — Zep / AriGraph bi-temporal pattern Paper 2512.13564 §5.2.2

    • Bi-temporal stamps (event_time / event_time_end / ingest_time); supersession via soft-delete.
    • Implementation: src/neo/memory/models.py:241.
  4. Trajectory Memory — Canonical-signature dedup Paper 2603.10600 §7

    • Entity abstraction + verb-synonym folding + context strip as a pre-write dedup signature.
    • Implementation: src/neo/memory/generalize.py.
  5. Memori — Hybrid dense+BM25 retrieval Paper 2603.19935 §3.3

    • Sparse BM25 channel (k1=1.5, b=0.75) min-max-normalized and weighted with the dense channel at 0.7/0.3.
    • Implementation: src/neo/memory/bm25.py (sparse channel), store._fuse_dense_sparse (0.7/0.3 fusion).
  6. MemMachine — Query-shape routing & nucleus episode expansion Paper 2604.04853 §4.6, §5.3, §8.4.1

    • DIRECT / CHAIN / SPLIT prompt classification with per-branch retrieval; episode-peer expansion at retrieval time; k=20–30 sweet spot.
    • Implementation: src/neo/memory/query_routing.py, store.py nucleus expansion.
  7. LessonL — Effectiveness multiplier on reuse outcomes Paper 2505.23946

    • Per-fact c/n effectiveness as a success-bonus multiplier; half-by-rank / half-by-cosine slot allocation (Algorithm 1).
    • Implementation: src/neo/memory/models.py:130, 233; store.retrieve_relevant.
  8. Ebbinghaus Recall — Spaced-repetition decay for retrieval Hou et al., paper 2404.00573

    • Recall-probability transform p_n(t) = (1 − exp(−r·exp(−t/g_n))) / (1 − e⁻¹) applied to similarity scores for fluid facts.
    • Implementation: src/neo/math_utils.py:40, models.rank_score.
  9. Episodic Memory — Five-property episodic context Paper 2502.06975 Table 1

    • {when, where, why, with_whom} instance-specific event context.
    • Implementation: src/neo/memory/models.py:320 EpisodeContext.
  10. Multiple Memory Systems — Retrieval / context unit split Paper 2508.15294 §3

    • Embed concise keywords (retrieval_text); inject full narrative (context_text) — same fact, two surfaces.
    • Implementation: src/neo/memory/models.py:373.

Engine & multi-agent reasoning

  1. MapCoder — Solver–Critic–Verifier multi-agent collaboration Islam et al., paper 2405.11403 | GitHub

    • Per-step confidence, multi-plan iteration scaffolding.
    • Implementation: PlanStep.confidence in src/neo/models.py.
  2. CodeSim — MODIFY / NO_MODIFY decision token Hou et al., paper 2502.05664

    • Simulator emits an explicit "no modification needed" token; planner uses it as an override on the agreement-of-outputs heuristic. (Distinct from the 2023 ACM CodeSim paper of the same name.)
    • Implementation: src/neo/engine.py:427.
  3. SICA — Asynchronous structured-output watchdog & cache-hit observability Paper 2504.15228 §A.2, Table 1

    • Daemon-thread tick loop emitting overseer_tick events; loop detection via 5-identical-actions-in-a-row; LM-call cache-hit-rate tracking.
    • Implementation: src/neo/overseer.py, src/neo/adapters.py:237.

In-house papers (Parslee) — the foundational research behind Neo's context architecture

  • Beyond Conversation: A State-Based Context Architecture for Enterprise AI Agents Liotta, 2025 | PDF

    • The theoretical foundations for the four-layer state model (constraints / valid facts / invalidated facts / known unknowns), supersession semantics, and the six classes of state failure (resurrection, hallucination, scope leak, stale reasoning, authority violation, temporal decay).
    • Implementation: src/neo/memory/context.py, ContextResult in src/neo/memory/models.py.
  • Memgine: A Deterministic Memory Engine for Stateful AI Agents Liotta, 2026 | PDF

    • The production engine implementing the full specification: query-relevance sorting, engine-level access control, adaptive inline repair, layered token-budget enforcement (2/3 constraint cap, greedy first-fit accumulation with at_least_one, Fact.size_hint() heuristic). Achieves 95.8% decision accuracy on the StateBench v1.0 development split with GPT-5.2 (97.3% with Opus 4.6).
    • Implementation: _accumulate_within_budget in src/neo/memory/context.py; Fact.size_hint() in src/neo/memory/models.py; design notes in docs/solutions/token-budget-enforcement.md.
  • StateBenchgithub.com/parslee-ai/statebench · parslee-ai.github.io/statebench

    • The conformance test suite that evaluates the two papers above. PyPI / HuggingFace Dataset / Space. Reference baselines (state_based, rolling_summary, fact_extraction_with_supersession, etc.) on the v1.0 test split set the bar Neo's port is measured against.

Background reading (in papers/ but not directly cited in code)

The following papers shaped the design vocabulary but aren't wired into a specific implementation today: 2506.18902 (Jina v4 — Neo currently uses Jina v2), 2508.21290 (Jina Code Embeddings), 2509.17489 (MapCoder-Lite), 2511.20857 (Evo-Memory).

Historical influences (cited in legacy modules under deprecation): ReasoningBank (2509.25140) informed the original src/neo/persistent_reasoning.py; the 0.18 fact store supersedes it.

Technologies & Libraries

Embedding & Search:

  • Jina Embeddings v2 (Code) HuggingFace | GitHub

    • 768-dimensional embeddings optimized for code similarity
    • Local inference (no API calls)
    • Used in: Neo's semantic memory and pattern retrieval
  • FAISS (Facebook AI Similarity Search) GitHub | Docs

    • Efficient vector similarity search and clustering
    • Billion-scale index support
    • Used in: Neo's fast pattern matching (<13ms avg)
  • FastEmbed GitHub | Docs

    • Lightweight local embedding generation
    • ONNX Runtime backend
    • Used in: Neo's local embedding pipeline

Datasets (for Load Program):

  • MBPP (Mostly Basic Programming Problems) HuggingFace | Paper

    • 1,000 crowd-sourced Python programming problems
    • Used for: Bootstrapping Neo's semantic memory
  • HumanEval HuggingFace | Paper

    • 164 hand-written programming problems
    • Used for: Quality pattern seeding

Citation

If you use Neo in academic research, please cite:

@software{neo2025,
  title={Neo: Self-Improving Code Reasoning Engine with Persistent Semantic Memory},
  author={Parslee AI},
  year={2025},
  url={https://github.com/Parslee-ai/neo},
  note={Memory architecture draws on SCM Sleep Memory (2604.20943), MemMachine (2604.04853), LessonL (2505.23946), and the bi-temporal/Ebbinghaus/dual-buffer techniques cataloged in the README's Research \& References section}
}

License

Apache License 2.0 - See LICENSE for details.

Contributing

See CONTRIBUTING.md for contribution guidelines.

Changelog

See CHANGELOG.md for version history.

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

neo_reasoner-0.21.0.tar.gz (615.2 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

neo_reasoner-0.21.0-py3-none-any.whl (379.5 kB view details)

Uploaded Python 3

File details

Details for the file neo_reasoner-0.21.0.tar.gz.

File metadata

  • Download URL: neo_reasoner-0.21.0.tar.gz
  • Upload date:
  • Size: 615.2 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for neo_reasoner-0.21.0.tar.gz
Algorithm Hash digest
SHA256 8bf295599c42b42f65ab8df01ffc5fbb0e48a824cbcc5d2e019b607ed6adf1d6
MD5 cfbd69afdd86ab53735e2017b2e69fa3
BLAKE2b-256 6f75db71de860008c31a4206ea9a089ff4613f6151d7adc0598a856a39be519a

See more details on using hashes here.

Provenance

The following attestation bundles were made for neo_reasoner-0.21.0.tar.gz:

Publisher: publish.yml on Parslee-ai/neo

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file neo_reasoner-0.21.0-py3-none-any.whl.

File metadata

  • Download URL: neo_reasoner-0.21.0-py3-none-any.whl
  • Upload date:
  • Size: 379.5 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for neo_reasoner-0.21.0-py3-none-any.whl
Algorithm Hash digest
SHA256 2c60e8829638993ace3f465575cdd725b7685840169366e898e93c8cfda29678
MD5 dc844d4d6d4329cf7a75792f2eecd545
BLAKE2b-256 4d6a50b17eac398eff852e0b3d87448047331af6ebaaea9b6578befb40c9d2d9

See more details on using hashes here.

Provenance

The following attestation bundles were made for neo_reasoner-0.21.0-py3-none-any.whl:

Publisher: publish.yml on Parslee-ai/neo

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page