
Self-Evolving AI Employee Framework: gets smarter the more you use it.


Leaper Agent

Self-Evolving AI Agent Framework — An AI colleague that actually learns from experience.

Most AI agents are stateless. Every conversation starts from zero. Leaper is different: it extracts structured experiences from every interaction, clusters them into reusable skills, builds a mental model of you, and validates its own knowledge for consistency. Six layers of evolution, running continuously.

Real Memory, Not a Notepad

Every conversation is analyzed along four dimensions (task/strategy/outcome/insight), filtered through 4-gate quality control, deduplicated at cosine > 0.85, and stored in a local SQLite database. BM25 + vector hybrid recall with RRF fusion — not "dump everything into context".

Gets Smarter Over Time

L1 extracts experiences → L2 clusters them into reusable skills → L3 merges/deprecates/promotes across skills → L4 builds a user mental model → L5 validates consistency and runs regression checks. Each layer has quality gates and measurable metrics.

Local-First LLM Strategy

High-frequency operations (embedding, extraction, validation) run on local Ollama models. Only infrequent reasoning tasks (skill synthesis, user modeling) call cloud LLMs. Deterministic operations use rules, not an LLM.

Product-Grade Install

pip install leaper-agent && leaper init && leaper run — interactive wizard, no manual config editing. Pure Python, zero native compilation dependencies.

15+ Platforms Out of the Box

Telegram · Discord · Slack · WhatsApp · Signal · Feishu · DingTalk · Matrix · Email · Home Assistant · API · CLI — a single gateway process connects all platforms.

Zero-Config Search

The DuckDuckGo fallback needs no API key. Add Firecrawl / Tavily keys for deep search.

Template System

leaper init --template ceo-coach creates a professional AI employee in one command. The CEO Coach template combines Socratic coaching, 40 business frameworks, and six-layer memory.

DeepBrain: Six-Layer Self-Evolution Engine

This is what makes Leaper fundamentally different from every other agent framework.

Why Not RAG?

Existing agent "memory" falls into two patterns:

  1. Full-context injection — Write everything to MEMORY.md, stuff it all into the system prompt every turn. Problem: context window grows linearly, costs become uncontrollable after 50 conversations.
  2. RAG vector retrieval — Embed memories, retrieve Top-K each turn. Problem: pure vector search is poor at exact term matching ("MST state machine" won't match "Mission State Transition"), and cannot handle knowledge decay or conflicts.

DeepBrain's approach: BM25 exact matching + vector semantic search, RRF fusion ranking, six-layer evolution loop. Not RAG — a cognitive evolution system.

Data Flow

User message → Core engine processes → Agent replies
                                          ↓
                                    sync_turn() triggers
                                          ↓
                            ┌─────────────────────────────┐
                            │  L1 Experience Extract       │
                            │  4D analysis → 4Gate → store │
                            └──────────────┬───────────────┘
                                          ↓ (5+ entries accumulated)
                            ┌─────────────────────────────┐
                            │  L2 Skill Generate           │
                            │  cluster → synthesize → test │
                            └──────────────┬───────────────┘
                                          ↓ (periodic)
                            ┌─────────────────────────────┐
                            │  L3 Cross-Skill Evolution    │
                            │  MERGE / DEPRECATE / PROMOTE │
                            └──────────────┬───────────────┘
                                          ↓ (sufficient data)
                            ┌─────────────────────────────┐
                            │  L4 User Model               │
                            │  Multi-dimensional profiling │
                            └──────────────┬───────────────┘
                                          ↓ (after every write)
                            ┌─────────────────────────────┐
                            │  L5 Adversarial Validation   │
                            │  consistency + regression    │
                            │  + decay                     │
                            └─────────────────────────────┘

Recall path (independent of evolution):
User message → L0 Hybrid Recall → RRF fusion → Top-K injected into context

L0 — Hybrid Recall

Problem: Pure vector search is bad at exact term matching. Pure keyword search can't understand that "evolution" and "gets smarter" mean the same thing.

Solution: BM25 + vector search, RRF fusion.

RRF_score(d) = Σ 1 / (k + rank_i(d))    where k = 60

Two rankers:

  1. BM25: SQLite keyword search — exact matches for terms, abbreviations, codenames
  2. 768-dim vectors: nomic-embed-text (274MB) local embedding — semantic understanding

Why RRF over weighted average? RRF is scale-invariant. BM25 scores range 0–25, cosine scores range 0–1. Weighted averaging requires normalization with hard-to-tune parameters. RRF only looks at rank positions — automatically handles quality differences between rankers.
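As a concrete illustration of the formula and the scale-invariance argument, here is a minimal RRF fusion sketch. This is not Leaper's actual implementation; the function and argument names are illustrative:

```python
def rrf_fuse(bm25_ranked: list[str], vector_ranked: list[str],
             k: int = 60, top_k: int = 10) -> list[str]:
    """RRF_score(d) = sum over rankers of 1 / (k + rank_i(d)).

    Only rank positions matter, so BM25's 0-25 scores and cosine's
    0-1 scores never need to be normalized against each other.
    """
    scores: dict[str, float] = {}
    for ranked in (bm25_ranked, vector_ranked):
        for rank, doc_id in enumerate(ranked, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first
    return sorted(scores, key=scores.get, reverse=True)[:top_k]
```

A document ranked well by both rankers beats one ranked first by only one of them, which is the behavior the fusion is after.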

Graceful degradation:

  • Ollama + nomic-embed-text available → full RRF
  • Ollama without embedding model → BM25 only
  • No Ollama → SQLite LIKE keyword search

def hybrid_recall(self, query: str, top_k: int = 10) -> list[dict]:
    bm25_results = self._bm25_search(query, top_k=50)
    vector_results = self._vector_search(query, top_k=50)
    return self._rrf_fuse(bm25_results, vector_results, top_k=top_k)

L1 — Experience Extract

Problem: Most agents let the LLM freely decide what to remember. Result: the same thing stored 5 times, trivial info taking up 40%+ of memory, no quality filtering.

Solution: Structured 4-dimensional extraction + 4-gate quality control.

Four Dimensions

After each conversation turn, the LLM returns structured JSON:

{
  "task": "What the user asked for",
  "strategy": "What approach was chosen",
  "outcome": "What happened, success or failure",
  "insight": "Reusable takeaway from this interaction"
}

Based on Kolb's Experiential Learning Cycle: concrete experience (task) → reflective observation (strategy) → abstract conceptualization (insight) → active experimentation (outcome).

4-Gate Quality Control

def _should_store(self, experience: dict) -> bool:
    # Gate 1: Task must succeed
    if not experience.get("task_success"):
        return False
    
    # Gate 2: Complexity filter
    complexity = self._estimate_complexity(experience)
    if complexity == "trivial":  # < 30 chars
        return False  # "hello", "thanks", "ok" not worth storing
    
    # Gate 3: Deduplication
    similar = self.brain.hybrid_recall(experience["task"], top_k=3)
    if similar and similar[0]["score"] > 0.85:
        return False
    
    # Gate 4: Completeness — all four dimensions must have substance
    for field in ["task", "strategy", "outcome", "insight"]:
        if not experience.get(field) or len(experience[field]) < 10:
            return False
    
    return True

Complexity detection: Character count, not LLM judgment — in practice, LLMs classify almost all Chinese-language turns as "moderate". < 30 chars = trivial, 30–200 = moderate, ≥ 200 = complex.
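A minimal sketch of that character-count rule, assuming the count is taken over the combined extracted text (the exact field counted isn't stated here, and the name estimate_complexity is illustrative):

```python
def estimate_complexity(experience: dict) -> str:
    # Assumption: count characters across the four extracted dimensions.
    text = " ".join(experience.get(f, "")
                    for f in ("task", "strategy", "outcome", "insight"))
    n = len(text)
    if n < 30:
        return "trivial"    # "hello", "thanks", "ok" — filtered by Gate 2
    if n < 200:
        return "moderate"
    return "complex"
```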

Measured in practice: ~27% of conversations are filtered out by the four gates, for a 73% effective storage rate.


L2 — Skill Generate

Problem: Experiences are one-off ("analyzed Dify competitor that time"). Skills are reusable ("how to do competitor analysis"). After 20 uses, the agent should have synthesized its own methodology.

Solution: Cluster similar experiences → LLM synthesizes skill → backtesting validation.

  • Clustering: Greedy cosine-similarity clustering, threshold 0.7, minimum 5 experiences per cluster
  • Synthesis: LLM receives a cluster, outputs a skill definition with title, content, applicable scenarios, confidence score
  • Quality gate: title > 5 chars, content > 50 chars
  • Backtesting: Apply the new skill to historical questions, compare quality against original answers

Why 0.7 threshold? 0.8 is too strict — variations in phrasing split the same topic into tiny clusters. 0.6 is too loose — unrelated experiences get grouped together. 0.7 is empirically tuned on CEO Coach conversations.
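The clustering step can be sketched as greedy centroid matching over embedding vectors. This is an illustration of the described algorithm (threshold 0.7, minimum cluster size 5), not Leaper's code:

```python
import numpy as np

def greedy_cluster(vectors: np.ndarray, threshold: float = 0.7,
                   min_size: int = 5) -> list[list[int]]:
    """Each experience joins the first cluster whose centroid similarity
    clears the threshold; otherwise it starts a new cluster."""
    clusters: list[list[int]] = []
    for i, v in enumerate(vectors):
        v = v / np.linalg.norm(v)
        for cluster in clusters:
            centroid = vectors[cluster].mean(axis=0)
            centroid = centroid / np.linalg.norm(centroid)
            if float(np.dot(v, centroid)) >= threshold:
                cluster.append(i)
                break
        else:
            clusters.append([i])
    # Only clusters with enough experiences are worth synthesizing
    return [c for c in clusters if len(c) >= min_size]
```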


L3 — Cross-Skill Evolution

Problem: Skills accumulate redundancy ("competitor analysis v1" and "competitor analysis v2" say the same thing) and become stale.

| Operation | Trigger | Action |
| --- | --- | --- |
| MERGE | Two skills cosine > 0.8 | Merge, keep higher-confidence structure |
| DEPRECATE | access_count = 0 and confidence < 0.5 | Mark deprecated, lower recall weight |
| PROMOTE | access_count > 10 and confidence > 0.8 | Mark core skill, boost recall weight |

Plus drift detection: if a skill hasn't been used in 90+ days and new experiences contradict it, trigger review.

L3 uses zero LLM calls. MERGE/DEPRECATE/PROMOTE are deterministic operations based on cosine similarity and access counts. This is core to the tiered design: deterministic operations use rules; only fuzzy reasoning uses LLM.


L4 — User Model

Problem: The agent doesn't know you. The same question should get completely different answers for a CEO vs. an engineer.

Solution: Automatically build a multi-dimensional user profile from conversations.

{
  "communication_style": "Direct, data-driven, dislikes fluff",
  "decision_patterns": "Data first, then judgment. Prefers MVP validation over perfect plans",
  "recurring_topics": ["AI trends", "competitor analysis", "product architecture"],
  "expertise_level": "expert",
  "confidence": 0.78
}

Strict validation: 4 required fields, minimum length checks (no single-word profiles like "friendly"), topic list must have ≥ 2 items, expertise must be an enum value, confidence must be 0–1.

Update strategy: Incremental merge, not overwrite. New observations fuse with existing profile, confidence updated via Bayesian update. One conversation won't overturn the entire user model.
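The exact update rule isn't given here; a simple evidence-weighted blend illustrates why one conversation can only nudge, not overturn, the profile's confidence (the weight 0.1 is an assumption for illustration):

```python
def update_confidence(prior: float, observation_agrees: bool,
                      evidence_weight: float = 0.1) -> float:
    """Move confidence a fraction of the way toward the new evidence.

    With a small evidence_weight, a single contradicting observation
    lowers confidence only slightly, so the model stays stable.
    """
    target = 1.0 if observation_agrees else 0.0
    return prior + evidence_weight * (target - prior)
```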


L5 — Adversarial Validation

The immune system for the memory engine. Triggers automatically after every write:

  • Consistency check: New entry vs. existing knowledge base. Contradictions are flagged, not silently overwritten.
  • Regression protection: After skill updates, backtest against top-5 historical questions. If quality drops, rollback.
  • Decay: Linear decay, access_count -= 1 every 7 days for untouched entries. An entry with access_count = 10 reaches zero in 70 days — predictable, unlike exponential decay that never reaches zero.

L5 is 100% rule-based. Zero LLM calls. The validation layer cannot depend on LLM — otherwise LLM hallucinations would infect validation results.
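The linear decay rule above fits in a few lines. A sketch under the 7-day window described (not Leaper's code; an entry with access_count = 10 and no accesses hits zero after 70 days):

```python
from datetime import datetime, timedelta

def decayed_access_count(access_count: int, last_accessed: datetime,
                         now: datetime) -> int:
    """Linear decay: subtract 1 per full 7-day window since last access."""
    weeks_idle = (now - last_accessed).days // 7
    return max(0, access_count - weeks_idle)
```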


LLM Tiered Degradation

| Layer | Operation | Compute | Model | Frequency |
| --- | --- | --- | --- | --- |
| L0 | Vector generation | Local | nomic-embed-text (274MB) | Every turn |
| L0 | BM25 search | Local | SQLite built-in | Every turn |
| L1 | Experience extraction | Local | qwen2.5:7b (4.7GB) | Every turn |
| L1 | Quality gating | Rules | | Every turn |
| L2 | Clustering | Local | Cosine similarity | Per 5+ experiences |
| L2 | Skill synthesis | Cloud | Primary model | Infrequent |
| L3 | MERGE/DEPRECATE/PROMOTE | Rules | | Periodic |
| L4 | Profile building | Cloud | Primary model | Infrequent |
| L5 | Consistency/regression/decay | Rules | | Every write |

High-frequency = local/rules, zero API cost. Low-frequency = cloud LLM.

Auto-detection at startup: Ollama available → local-first. No Ollama → everything goes through cloud.

LEAPER_LOCAL_URL=http://localhost:11434/v1   # Custom Ollama address
LEAPER_LOCAL_MODEL=qwen2.5:14b               # Use a larger local model

Platform Connectivity (15+ Channels)

Single gateway process connects all platforms. Conversations persist across platforms.

| Platform | Protocol | Config |
| --- | --- | --- |
| Telegram | Bot API | TELEGRAM_BOT_TOKEN |
| Discord | discord.js | DISCORD_BOT_TOKEN |
| Slack | Bolt SDK | SLACK_BOT_TOKEN + SLACK_APP_TOKEN |
| WhatsApp | Baileys (no Business API needed) | config.yaml |
| Signal | signal-cli | config.yaml |
| Feishu | Open API | config.yaml |
| DingTalk | Stream SDK | config.yaml |
| Matrix | matrix-nio | config.yaml |
| Email | IMAP/SMTP | config.yaml |
| LINE | Messaging API | config.yaml |
| Mattermost | WebSocket | config.yaml |
| IRC | irc-framework | config.yaml |
| Home Assistant | REST API | config.yaml |
| API | HTTP/WebSocket | Built-in |
| CLI | Local terminal | leaper chat |

Security: DM pairing codes for unknown senders, user whitelists, group ID whitelists, mention-only mode, tool execution approval.


40+ Built-in Tools

Terminal & Files

terminal · file_read · file_write · file_edit — shell execution with timeout, atomic writes, surgical edits.

Search & Web

web_search · web_fetch · web_scrape — degradation chain: Firecrawl → Tavily → DuckDuckGo. Zero-config.

Code & Dev

code_execute · git · github — sandboxed execution, full Git/GitHub API.

Media

image_generation · tts · vision · pdf_reader — fal.ai / DALL-E / ElevenLabs / multimodal analysis.

Automation

cron · delegation · webhook — scheduled tasks, parallel sub-agents, HTTP callbacks.

All tools support MCP protocol for extending with third-party tool servers.


TUI Terminal Interface

Full terminal UI, not just readline:

  • Multi-line editing for code and long text
  • Slash command auto-completion
  • Conversation history with cross-session persistence
  • Streaming tool call output
  • Ctrl+C interrupt and redirect
  • /new · /model · /compress session management

leaper chat                    # Start terminal conversation
leaper chat --model gpt-4o     # Specify model

Cron Scheduling

Built-in scheduler with cron expressions. Tasks run in isolated sessions, results delivered to any connected platform.

cron:
  daily-report:
    schedule: "0 8 * * *"
    task: "Generate today's action items summary"
    deliver_to: telegram

Sub-Agent Delegation

Complex tasks can be split across parallel sub-agents, each with independent context windows:

User: "Analyze these 5 competitors"
         ↓
Main Agent → Spawns 5 sub-agents (parallel)
         ↓
Main Agent ← Aggregates 5 reports → User

Workspace Files

Agent persona, memory, and rules are defined through Markdown files — no code needed:

| File | Purpose | Loading |
| --- | --- | --- |
| EGO.md | Core rules and behavioral boundaries | Always loaded |
| SOUL.md | Values, communication style, expertise | Always loaded |
| IDENTITY.md | One-line identity | Always loaded |
| USER.md | User profile | Always loaded |
| MEMORY.md | Persistent memory | Seed once, then recall on demand |
| AGENTS.md | Multi-agent collaboration rules | Always loaded |

MEMORY.md uses seed-and-recall: loaded fully on first conversation, then only relevant sections recalled via L0 Hybrid Recall. This solves the "more memory = more expensive context" problem.


Model Support

| Provider | Examples | Config |
| --- | --- | --- |
| OpenAI | GPT-4o, GPT-4.1, o3 | OPENAI_API_KEY |
| Anthropic | Claude Opus 4, Sonnet 4 | ANTHROPIC_API_KEY |
| OpenRouter | 200+ models | OPENROUTER_API_KEY |
| Ollama | qwen2.5, llama3, deepseek-r1 | Local, zero config |
| Custom | Any OpenAI-compatible API | base_url + api_key |

Failover: automatic fallback when primary model is unavailable.
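A failover chain of this kind typically tries providers in order and returns the first success. A minimal sketch — the provider callables and names are assumptions for illustration, not Leaper's API:

```python
from collections.abc import Callable

def complete_with_failover(prompt: str,
                           providers: list[Callable[[str], str]]) -> str:
    """Try each provider in priority order; fall through on failure."""
    last_err: Exception | None = None
    for provider in providers:
        try:
            return provider(prompt)
        except Exception as err:  # unavailable, rate-limited, etc.
            last_err = err
    raise RuntimeError("all providers failed") from last_err
```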


Prerequisites

Before installing, make sure you have:

| Tool | Version | Download | Notes |
| --- | --- | --- | --- |
| Python | ≥ 3.10 | python.org/downloads | ⚠️ Windows: check "Add python.exe to PATH" during install |
| Git | any | git-scm.com/download | Windows: default options, Next all the way |
| Ollama (optional) | any | ollama.com/download | For local models. Not required if using a cloud API |

💡 After installing Python and Git on Windows, close and reopen PowerShell for PATH changes to take effect.

Verify:

python --version   # Should show 3.10+
git --version      # Should show any version

Quick Install

pip install leaper-agent

No C++ compiler needed. No GPU required.

leaper init                          # Interactive wizard
leaper init --template ceo-coach     # Create from template
leaper chat                          # Terminal conversation
leaper run                           # Start gateway (Telegram, etc.)
leaper workshop                      # Browse available templates

Configuration

# leaper.yaml
name: 'CEO Coach'

model:
  provider: openai
  name: gpt-4o

channel:
  type: telegram

brain:
  enabled: true
  localModel: auto    # auto = detect Ollama | off = cloud only

Template System

leaper workshop                      # List templates
leaper init --template ceo-coach     # Install template

Templates are pre-configured file sets (YAML + Markdown), not hardcoded logic. Fork any template and modify freely.

CEO Coach

Socratic startup coach designed for CEOs and founders:

  • Coaching philosophy: Ask before answering, never decide for the user
  • 40 business frameworks: Porter's Five Forces, SWOT, Flywheel, TAM/SAM/SOM, Jobs-to-be-Done...
  • Six-layer memory: Remembers your strategic preferences, past decisions, recurring concerns
  • Strict behavioral boundaries: No internal tool exposure, no technical details leaked, no error messages shown to users

DB Schema

brain.db — local SQLite, typically < 10MB for 1000+ entries.

CREATE TABLE pages (
    slug TEXT PRIMARY KEY,
    title TEXT,
    namespace TEXT,          -- agent/{name} | desk/{name}/{seat} | role/{role} | org
    content TEXT,
    entry_type TEXT,         -- experience | skill | user_model | meta
    confidence REAL,         -- 0.0 - 1.0
    access_count INTEGER,    -- used for L5 decay
    last_accessed TEXT,
    metadata TEXT,           -- JSON extension field
    updated_at TEXT
);

CREATE TABLE chunks (
    id TEXT PRIMARY KEY,
    page_slug TEXT REFERENCES pages(slug),
    content TEXT             -- chunked content for vector search
);

Environment Variables

| Variable | Purpose | Default |
| --- | --- | --- |
| LEAPER_HOME | Working directory | ~/.leaper |
| LEAPER_LOCAL_URL | Local Ollama address | http://localhost:11434/v1 |
| LEAPER_LOCAL_MODEL | Local inference model | qwen2.5:7b |
| TELEGRAM_BOT_TOKEN | Telegram Bot Token | |
| OPENAI_API_KEY | OpenAI API Key | |
| ANTHROPIC_API_KEY | Anthropic API Key | |

Development

git clone https://github.com/Deepleaper/leaper-agent.git
cd leaper-agent
python -m venv venv && source venv/bin/activate
pip install -e ".[all,dev]"

DeepBrain Code Structure (~1951 lines)

agent/
  leaper_brain.py           # 564 lines — L0 hybrid recall, DB operations
  leaper_evolution.py       # 992 lines — L1-L5 evolution logic
  leaper_orchestrator.py    # 137 lines — evolution scheduler
  leaper_seed_loader.py     #           — workspace file loading + OS injection

plugins/memory/deepbrain/
  provider.py               # 258 lines — Memory Provider interface
  plugin.yaml

Extending the Memory Engine

Subclass the MemoryProvider base class:

  1. Create a directory under plugins/memory/
  2. Implement sync_turn() / recall() / store()
  3. Declare in plugin.yaml
  4. Set memory.provider: your-provider in config
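A toy provider shows the shape of the three methods. The method names come from the steps above, but the signatures are assumptions — the MemoryProvider base class isn't shown here:

```python
class InMemoryProvider:
    """Minimal illustrative provider: stores turns in a list,
    recalls by naive substring match (no BM25, no vectors)."""

    def __init__(self) -> None:
        self._entries: list[str] = []

    def store(self, content: str) -> None:
        self._entries.append(content)

    def recall(self, query: str, top_k: int = 10) -> list[str]:
        hits = [e for e in self._entries if query.lower() in e.lower()]
        return hits[:top_k]

    def sync_turn(self, user_msg: str, agent_reply: str) -> None:
        # Called after each turn; a real provider would extract and
        # quality-gate here rather than store the raw text.
        self.store(f"user: {user_msg} | agent: {agent_reply}")
```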

License

Apache-2.0 © Deepleaper



Download files

Download the file for your platform.

Source Distribution

leaper_agent-1.0.1.tar.gz (5.9 MB)

Uploaded Source

Built Distribution


leaper_agent-1.0.1-py3-none-any.whl (2.8 MB)

Uploaded Python 3

File details

Details for the file leaper_agent-1.0.1.tar.gz.

File metadata

  • Download URL: leaper_agent-1.0.1.tar.gz
  • Upload date:
  • Size: 5.9 MB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.12.10

File hashes

Hashes for leaper_agent-1.0.1.tar.gz
Algorithm Hash digest
SHA256 fc1a70d33c7058251dec0a7889374fb5e0fad33ad6ef9dc1a715485317b443d6
MD5 c07c252dea962de9fb4eb69d7be65b8c
BLAKE2b-256 c8a2ac67f19068de4827c5a8a6fa6ab33172d980811d10d6baaf2d34a9e76d80


File details

Details for the file leaper_agent-1.0.1-py3-none-any.whl.

File metadata

  • Download URL: leaper_agent-1.0.1-py3-none-any.whl
  • Upload date:
  • Size: 2.8 MB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.12.10

File hashes

Hashes for leaper_agent-1.0.1-py3-none-any.whl
Algorithm Hash digest
SHA256 b2320f0de9c4b7cf26cf106b6236f3062b1dc7033f909955c74cc370c3975c82
MD5 3916085ebfc74ab73807923baf580edf
BLAKE2b-256 146857a3c853cac81569ca112d0d0b7b5c4a42d920e14df8639f263fd528b9fa

