Knowledge governance plugin for OpenViking — coverage assessment, external search, review, ingest, dedup, and feedback-driven ranking.

These details have not been verified by PyPI

Project links

Project description

OpenViking Curator

English / 中文

Knowledge governance plugin for OpenViking. Curator sits on top of your OV knowledge base — it decides when local knowledge is enough, when to search externally, reviews what comes back, and ingests the good stuff. Your knowledge base grows with every question.

How it works

flowchart TD
    Q[Query] --> R[Route]
    R --> OV[Retrieve from OV]
    OV --> L[Load L0 → L1 → L2]
    L --> COV{Coverage OK?}
    COV -- yes --> OUT1[Return local context]
    COV -- no --> EXT[External search]
    EXT --> F{Need fresh?}
    F -- yes --> CV[Cross-validate]
    F -- no --> J
    CV --> J[Judge + conflict]
    J --> P{Pass?}
    P -- no --> OUT1
    P -- yes --> C{Conflict?}
    C -- blocked --> OUT1
    C -- ok --> ING[Ingest + verify]
    ING --> OUT2[Return merged context]

LLM call strategy:

Coverage sufficient → 0 LLM calls, return immediately
External search triggered → 1 LLM call (judge + conflict combined)
Need freshness validation → 2 LLM calls (+ cross-validate)

Features

Feature	Description	Module
Rule-based routing	Domain, keywords, freshness detection. No LLM needed. JSON-configurable.	`router.py`
Dual-path retrieval	`find` (vector) + `search` (LLM intent). URI dedup.	`retrieval_v2.py`
On-demand loading	L0 (abstract) → L1 (overview) → L2 (full). Only deeper when needed.	`retrieval_v2.py`
Coverage assessment	Score gap + keyword overlap signals. 0 LLM calls.	`retrieval_v2.py`
External search	Pluggable providers: Grok, DuckDuckGo, Tavily. Fallback chain or concurrent.	`search_providers.py`
Domain filtering	Whitelist / blacklist external search results by domain.	`domain_filter.py`
Cross-validation	When `need_fresh=true`. Flags risky/outdated claims.	`search.py`
Judge + conflict	Single LLM call: trust 0-10, freshness, pass/fail, contradiction. Pydantic validated.	`review.py`
Conflict resolution	Configurable: `auto` / `local` / `external` / `human`. Bidirectional scoring.	`pipeline_v2.py`
Ingest	Writes to OV with metadata (source URLs, version, TTL, quality feedback).	`review.py`
Async ingest	Fire-and-forget background thread. Observable via async job tracker.	`pipeline_v2.py` + `async_jobs.py`
Auto-summarize	Generates L0/L1 summaries on ingest for new resources.	`pipeline_v2.py`
Dedup scanning	URL hash (exact match) + Jaccard word similarity. Reports only.	`dedup.py`
Freshness scoring	URI timestamp → decay score. Configurable thresholds.	`freshness.py`
Usage-based TTL	Hot / warm / cold tiers. Frequently used → longer TTL.	`usage_ttl.py`
Feedback reranking	`up`/`down`/`adopt` per URI. Score-weighted, rank-aware. OV score stays dominant.	`feedback_store.py`
Decision report	ASCII box, single-line, JSON, HTML export. Always present in `run()` output.	`decision_report.py`
Session tracking	Records queries + used URIs. Commits to extract long-term memory.	`session_manager.py`
Query logging	Every query → `query_log.jsonl` with coverage, reasons, LLM calls.	`pipeline_v2.py`
Circuit breaker	3-state breaker wrapping LLM + search calls. Auto-recovery.	`circuit_breaker.py`
Search cache	LRU + dual TTL. File-locked JSON persistence.	`search_cache.py`
Automated governance	Weekly cycle: audit, flag, proactive search, report. Hybrid async. No auto-deletion.	`governance.py`
Interest analysis	Extract user interests from query log + feedback. Generate proactive search queries.	`interest_analyzer.py`
Background scheduler	APScheduler: periodic freshness scan + weak topic strengthening + governance.	`scheduler.py`
Structured logging	structlog with JSON mode. Per-run context binding (run_id, query).	`logging_setup.py`

What Curator does NOT do

Vector search / indexing → OpenViking handles this
Answer generation → your LLM; Curator returns structured context, not answers

Quick Start

Prerequisites

Python 3.10+
A working OpenViking setup (embedded or HTTP mode)
An OpenAI-compatible API endpoint (for LLM review)
A search API (Grok recommended, or DuckDuckGo/Tavily)

Install

git clone https://github.com/ponsde/OpenViking_Curator.git
cd OpenViking_Curator

# Recommended: uv (fast, reproducible)
uv sync
source .venv/bin/activate

# Alternative: pip
python3 -m venv .venv && source .venv/bin/activate
pip install -r requirements.txt

cp .env.example .env   # then fill in your keys

Configure

Edit .env:

# OpenViking config (embedded mode)
OPENVIKING_CONFIG_FILE=/path/to/your/ov.conf

# LLM for review & routing (any OpenAI-compatible endpoint)
CURATOR_OAI_BASE=https://your-llm-api.com/v1
CURATOR_OAI_KEY=sk-your-key

# External search (Grok recommended)
CURATOR_GROK_BASE=https://your-grok-endpoint/v1
CURATOR_GROK_KEY=your-grok-key

Three endpoints, three keys. That's it.

Run

python3 curator_query.py --status                         # Health check
python3 curator_query.py "How to deploy Redis in Docker?" # Query
python3 curator_query.py --review "sensitive topic"       # Review mode (no auto-ingest)
python3 mcp_server.py                                     # MCP server (stdio JSON-RPC)

Docker

# Embedded mode: OV runs in-process
cp ov.conf.example ov.conf   # fill in your embedding API keys
cp .env.example .env         # fill in LLM + search keys
docker compose build
docker compose run --rm curator curator_query.py --status
docker compose run --rm curator curator_query.py "your question"

# HTTP mode: OV runs as a separate service
cp ov.conf.example ov.conf   # keep as-is (not read in HTTP mode)
echo "OV_BASE_URL=http://your-ov-host:8080" >> .env
docker compose build
docker compose run --rm curator curator_query.py --status

Python API

from curator.pipeline_v2 import run

result = run("Nginx reverse proxy with SSL?")
print(result["context_text"])         # local context
print(result["external_text"])        # external (if any)
print(result["coverage"])             # 0.0 ~ 1.0
print(result["meta"]["ingested"])     # True if new content stored
print(result["conflict"])             # conflict detection
print(result["decision_report"])      # ASCII decision report

# Reusable pipeline instance (shares session + backend)
from curator.pipeline_v2 import CuratorPipeline
pipeline = CuratorPipeline()
r1 = pipeline.run("how to deploy?")
r2 = pipeline.run("what is RAG?")

# Decision report in other formats
from curator.decision_report import format_report_json, format_report_html
print(format_report_json(result))
print(format_report_html(result))

# Feedback (boosts retrieval ranking next time)
from curator.feedback_store import apply
apply("viking://resources/doc-id", "up")    # mark helpful
apply("viking://resources/doc-id", "down")  # mark unhelpful

Configuration

All via .env (git-ignored). See .env.example for a full template.

Core (required)

Variable	Description
`OPENVIKING_CONFIG_FILE`	Path to `ov.conf` (embedded mode)
`CURATOR_OAI_BASE`	OpenAI-compatible API base URL
`CURATOR_OAI_KEY`	API key for above
`CURATOR_GROK_KEY`	Grok API key (external search)

OV mode

Variable	Default	Description
`OV_BASE_URL`	(empty)	Set to use HTTP mode. Empty = embedded mode.
`OV_DATA_PATH`	`./data`	OV data directory (embedded mode)

Search

Variable	Default	Description
`CURATOR_SEARCH_PROVIDERS`	`grok`	Comma-separated: `grok,duckduckgo,tavily` (fallback chain)
`CURATOR_SEARCH_CONCURRENT`	`0`	`1` = fire all providers in parallel
`CURATOR_SEARCH_TIMEOUT`	`60`	Global search timeout (seconds)
`CURATOR_TAVILY_KEY`	(empty)	Tavily API key (if using tavily provider)
`CURATOR_ALLOWED_DOMAINS`	(empty)	Whitelist (comma-separated)
`CURATOR_BLOCKED_DOMAINS`	(empty)	Blacklist (comma-separated)

Thresholds

Variable	Default	Effect
`CURATOR_THRESHOLD_COV_SUFFICIENT`	`0.55`	Above = skip external search
`CURATOR_THRESHOLD_COV_MARGINAL`	`0.45`	Above = marginal (still searches)
`CURATOR_THRESHOLD_COV_LOW`	`0.35`	Below = definitely search
`CURATOR_THRESHOLD_L0_SUFFICIENT`	`0.62`	L0 score to skip L1
`CURATOR_THRESHOLD_L1_SUFFICIENT`	`0.50`	L1 score to skip L2
`CURATOR_MAX_L2_DEPTH`	`2`	Max full-text reads per run

Background scheduler

Variable	Default	Description
`CURATOR_SCHEDULER_ENABLED`	`0`	`1` to activate background jobs
`CURATOR_FRESHNESS_INTERVAL_HOURS`	`24`	Freshness scan interval
`CURATOR_STRENGTHEN_INTERVAL_HOURS`	`168`	Weak topic strengthening interval (7 days)
`CURATOR_STRENGTHEN_TOP_N`	`3`	Number of weak topics per run

Governance

Variable	Default	Description
`CURATOR_GOVERNANCE_ENABLED`	`0`	`1` to enable governance cycle
`CURATOR_GOVERNANCE_INTERVAL_HOURS`	`168`	Cycle interval (default 7 days)
`CURATOR_GOVERNANCE_MODE`	`normal`	`normal` or `team` (team adds full audit trail)
`CURATOR_GOVERNANCE_MAX_PROACTIVE`	`5`	Max proactive search queries per cycle
`CURATOR_GOVERNANCE_SYNC_BUDGET`	`0`	Sync queries before async (0 = fully async)
`CURATOR_GOVERNANCE_LOOKBACK_DAYS`	`30`	Query log analysis window
`CURATOR_GOVERNANCE_DRY_RUN`	`0`	`1` to skip writes (audit only)
`CURATOR_GOVERNANCE_REPLACES_STRENGTHEN`	`0`	`1` to skip standalone strengthen when governance is on

Other

Variable	Default	Description
`CURATOR_ASYNC_INGEST`	`0`	`1` = fire-and-forget background ingest
`CURATOR_CONFLICT_STRATEGY`	`auto`	`auto` / `local` / `external` / `human`
`CURATOR_CB_ENABLED`	`1`	Circuit breaker (`0` to disable)
`CURATOR_CACHE_ENABLED`	`0`	Search result cache
`CURATOR_FEEDBACK_WEIGHT`	`0.10`	Feedback score adjustment (max delta)
`CURATOR_JSON_LOGGING`	`0`	`1` = JSON structured log output
`CURATOR_CHAT_RETRY_MAX`	`3`	LLM retry attempts

Maintenance

# Weak topic analysis
python3 scripts/analyze_weak.py --top 10

# Proactive strengthening
python3 scripts/strengthen.py --top 5

# Freshness scan
python3 scripts/freshness_scan.py --limit 50       # URL reachability
python3 scripts/freshness_scan.py --act             # Auto-refresh stale

# TTL rebalance
python3 scripts/ttl_rebalance.py                    # Report
python3 scripts/ttl_rebalance.py --json             # JSON export

# Async job management
python3 scripts/async_job_cli.py list               # Overview
python3 scripts/async_job_cli.py list --failed      # Failed jobs
python3 scripts/async_job_cli.py replay <job_id>    # Re-queue a job

# Governance (automated knowledge maintenance)
python3 -m curator.governance_cli report             # View latest report
python3 -m curator.governance_cli report --format json
python3 -m curator.governance_cli report --format html > report.html
python3 -m curator.governance_cli flags              # Pending flags
python3 -m curator.governance_cli flags --all        # All flags
python3 -m curator.governance_cli show <flag_id>     # Flag details
python3 -m curator.governance_cli keep <flag_id>     # Mark: keep resource
python3 -m curator.governance_cli delete <flag_id>   # Mark: approve deletion
python3 -m curator.governance_cli adjust <flag_id>   # Mark: needs adjustment
python3 -m curator.governance_cli ignore <flag_id>   # Mark: ignore this flag
python3 -m curator.governance_cli run                # Trigger full cycle
python3 -m curator.governance_cli run --dry          # Dry run (no writes)
python3 -m curator.governance_cli run --mode team    # Team mode (full audit)

Or enable the background scheduler (CURATOR_SCHEDULER_ENABLED=1) to run freshness scans and strengthening automatically. Add CURATOR_GOVERNANCE_ENABLED=1 for automated governance cycles.

Project structure

curator/
  pipeline_v2.py       # Main pipeline orchestrator
  config.py            # Config + HTTP client with retry
  settings.py          # Pydantic Settings v2 (typed, validated)
  backend.py           # KnowledgeBackend ABC
  backend_ov.py        # OpenViking backend (embedded + HTTP)
  backend_memory.py    # In-memory backend (testing)
  session_manager.py   # Dual-mode OV client
  retrieval_v2.py      # L0→L1→L2 retrieval + coverage
  search.py            # External search + cross-validation
  search_providers.py  # Pluggable provider registry
  review.py            # LLM judge + ingest + conflict
  router.py            # Rule-based routing (JSON config)
  freshness.py         # URI time-decay scoring
  usage_ttl.py         # Usage-based TTL tiers
  dedup.py             # Duplicate scanning
  decision_report.py   # ASCII / JSON / HTML reports
  feedback_store.py    # Up/down/adopt feedback
  domain_filter.py     # Domain whitelist/blacklist
  circuit_breaker.py   # 3-state circuit breaker
  search_cache.py      # LRU + dual-TTL cache
  async_jobs.py        # Background job tracking
  governance.py        # Automated governance cycle (6 phases)
  governance_cli.py    # Governance CLI (report, flags, run)
  governance_report.py # Governance report (ASCII/JSON/HTML)
  interest_analyzer.py # User interest extraction + proactive queries
  nlp_utils.py         # Topic extraction + keyword utils
  scheduler.py         # APScheduler periodic jobs
  logging_setup.py     # structlog configuration
  file_lock.py         # Shared flock utilities
  legacy/              # Archived v1
curator_query.py       # CLI entry point
mcp_server.py          # MCP server (stdio JSON-RPC)
scripts/               # Maintenance scripts
tests/                 # 554 tests

Testing

# All tests (uses InMemoryBackend, no OV dependency)
uv run pytest tests/ -v

# Single file
uv run pytest tests/test_core.py -v

# Type checking
uv run mypy curator/ --ignore-missing-imports --exclude curator/legacy/

Troubleshooting

Problem	Cause	Fix
`Missing required env vars`	`.env` not configured	Fill in `CURATOR_OAI_BASE`, `CURATOR_OAI_KEY`, `CURATOR_GROK_KEY`
`OV not available`	OpenViking not reachable	Check `OPENVIKING_CONFIG_FILE` (embedded) or `OV_BASE_URL` (HTTP)
`401 Unauthorized`	Wrong API key	Check keys in `.env`
Timeout on search	Endpoint unreachable	Check URL and service status
Coverage always 0.0	OV is empty	Ingest some content first, or lower `CURATOR_THRESHOLD_COV_SUFFICIENT`
External always triggered	Thresholds too high	Lower coverage thresholds in `.env`
Judge returns low trust	Weak LLM model	Try a stronger model in `CURATOR_JUDGE_MODELS`

Roadmap

KnowledgeBackend abstraction (OV-agnostic interface)
Conflict detection + bidirectional resolution
Pydantic-validated judge output
Quality feedback loop (feedback → retrieval ranking)
Enhanced dedup (URL hash + Jaccard)
Decision report (ASCII + JSON + HTML)
Async ingest with job tracking + recovery CLI
Auto-generate L0/L1 summaries on ingest
Multi-provider search (Grok + DuckDuckGo + Tavily)
Domain filtering (whitelist / blacklist)
Usage-based TTL (hot / warm / cold tiers)
Circuit breaker + search cache
Structured logging (structlog + JSON mode)
Background scheduler (freshness + strengthen)
Docker support (embedded + HTTP OV modes)
mypy + pre-commit (ruff + ruff-format)
uv dependency management
Automated governance (audit, flag, proactive search, report)
Interest-based proactive search (query log + feedback analysis)
Hybrid async governance (sync budget + background thread + trace harvest)
Coverage auto-tuning (dynamic thresholds from query log)

License

MIT

Project details

These details have not been verified by PyPI

Project links

Release history Release notifications | RSS feed

This version

0.7.0

Feb 27, 2026

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

openviking_curator-0.7.0.tar.gz (197.4 kB view details)

Uploaded Feb 27, 2026 Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

The dropdown lists show the available interpreters, ABIs, and platforms. Enable javascript to be able to filter the list of wheel files.

openviking_curator-0.7.0-py3-none-any.whl (148.3 kB view details)

Uploaded Feb 27, 2026 Python 3

File details

Details for the file openviking_curator-0.7.0.tar.gz.

File metadata

Download URL: openviking_curator-0.7.0.tar.gz
Upload date: Feb 27, 2026
Size: 197.4 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.12.3

File hashes

Hashes for openviking_curator-0.7.0.tar.gz
Algorithm	Hash digest
SHA256	`887fd7a1fb3bdac771b18244052189a1d268ea76deff07b4e8f75b0a12cb851b`
MD5	`4b0dfd4758b6305110869dc94dc1c561`
BLAKE2b-256	`ffa19a95696b86f2a25a2e827ca316b0f6592e989346624f8c7c968931e466ab`

See more details on using hashes here.

File details

Details for the file openviking_curator-0.7.0-py3-none-any.whl.

File metadata

Download URL: openviking_curator-0.7.0-py3-none-any.whl
Upload date: Feb 27, 2026
Size: 148.3 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.12.3

File hashes

Hashes for openviking_curator-0.7.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`d82d0729b08edc424d8006875db0a3f3959e9952eaa6fd354643bd50d5c5b978`
MD5	`161918369725a66ed433df661e216ff0`
BLAKE2b-256	`2a5a8842db4e43e1142e1dfdece727fbd6a42dedb4bc1bbcc8cf9359df5f2a07`

See more details on using hashes here.

openviking-curator 0.7.0

Navigation

Verified details

Maintainers

Unverified details

Project links

Meta

Classifiers

Project description

OpenViking Curator

How it works

Features

What Curator does NOT do

Quick Start

Prerequisites

Install

Configure

Run

Docker

Python API

Configuration

Core (required)

OV mode

Search

Thresholds

Background scheduler

Governance

Other

Maintenance

Project structure

Testing

Troubleshooting

Roadmap

License

Project details

Verified details

Maintainers

Unverified details

Project links

Meta

Classifiers

Release history Release notifications | RSS feed

Download files

Source Distribution

Built Distribution

File details

File metadata

File hashes

File details

File metadata

File hashes