Fully Automated Obsidian Knowledge Management Pipeline - Production-Grade Knowledge Management Pipeline

Maintainers

fakechris


schema_version: "1.0.0"
note_id: readme_en-5d661efc
title: "Obsidian Vault Pipeline"
description: "An auditable knowledge state runtime for Obsidian"
date: 2026-04-07
type: meta

Obsidian Vault Pipeline

Auditable knowledge state runtime for Obsidian Vaults
Capture → Compile → Reuse

🇨🇳 Simplified Chinese

Current document version: v0.9.3

Primary docs:

What This Is

Obsidian Vault Pipeline is not a loose collection of scripts, and it is not only RAG over Markdown. It is a local knowledge state runtime built around an Obsidian vault:

Capture receives Pinboard, Clippings, raw Markdown, papers, GitHub repos, and web pages while keeping source lifecycle traceable.
Compile turns material into deep dives, candidates, claims, evidence, relations, contradictions, registry rows, and graph rows.
Reuse projects compiled knowledge into reader atlas pages, object pages, graph views, briefings, search, context packs, writing prompts, and the operator workbench.

Internally the engineering model still uses six layers: Ingest -> Interpret -> Absorb -> Refine -> Canonical -> Derived. The product narrative is Capture -> Compile -> Reuse.

The current release wires those layers into the actual runtime:

ovp --full runs through knowledge_index by default
ovp --incremental is the daily incremental entry point, including recent Pinboard + Clippings and downstream stages
ovp --full --with-refine inserts refine before the final derived refresh
ovp-autopilot runs real-time absorb -> moc -> knowledge_index
ovp-autopilot --with-refine adds refine to that path
ovp-ui provides a local UI. The root route (/) now opens a reader-first Knowledge Library, the operator dashboard lives under /ops, object pages expose source/backlink context, and /graph (also /map) renders a reader-facing knowledge map.

Why The Architecture Looks Like This

This repository started as a set of Obsidian automation scripts, but that model stopped scaling once the system grew:

the main runtime and individual scripts drifted apart
concepts, links, Atlas, graph, and retrieval indexes were tightly coupled without a clean truth boundary
new domains like media, medical, or engineering research could not be modeled safely with a concept-only core

The current architecture is the direct answer to those failures:

Capture -> Compile -> Reuse explains the product value
source -> observation -> claim -> evidence -> validity -> projection -> permission explains the long-term knowledge state
the six-layer runtime makes orchestration, canonical state, and derived state explicit
research-tech makes the current engineering research semantics explicit
default-knowledge is being reduced to a default compatibility layer instead of carrying every domain semantic
Pack API turns future domains into installable packs rather than more hardcoded branches inside the runtime

So the project is no longer just a Vault automation repo. It is now:

a reader-first, evidence-backed knowledge atlas over an auditable knowledge state runtime

with:

research-tech as the first explicit built-in standard pack
default-knowledge retained as the default compatibility pack
knowledge.db as a derived store, never Authority
vault markdown + registry + evidence chains as the long-term trust boundary

Current Roadmap

OVP is evolving from a personal Zettelkasten into a typed knowledge platform — reader-first for humans, programmable for agents, extensible through domain packs.

active backlog: BACKLOG.md
current milestone: MILESTONE.md
current merged roadmap rationale: docs/plans/2026-04-29-consolidated-product-roadmap.md
reader product-shape note: docs/plans/2026-04-29-reader-product-shape-and-backlog-reconciliation.md

Current milestone sequence:

Milestone	Status	Meaning
M0–M3	Done	Foundation, operator workbench, roadmap consolidation, reader-first atlas
M4 KSR Safety And Hot-Path Hardening	Done	projection labels, hot-path audit, wiring evals, evidence spans, candidate risk, JSONL streaming, projection lifecycle hardening
M5 Context Pack And Operational Runtime	Done	session snapshots, context budget, runtime state, runtime-state API, action queue health
M5a Quality And Dedup Hardening	Done	concept dedup pipeline integration, promote semantic guard, historical data cleanup
M8 Type Unification And Extraction Quality	Active	unified object kind taxonomy, Layer 1 entity_type, body-size-aware extraction, quote-grounding, single-pass LLM refactor
M9 Pack As Domain Ontology	Next	pack-defined object kind specs, typed relation constraints, schema registry
M10 Operational Knowledge Layer	Later	action types, permissions, cross-entity aggregation, decision memory

Recent major changes (PRs #98–#101):

JSONL streaming hardening, advisory file locks, runtime-state API fixes
Four-phase architecture refactor: module boundary cleanup, route hardening (CSP/CSRF), projection lifecycle
Concept dedup pipeline integration, scoped through a scope_slugs parameter
Promote semantic guard: trigram-Jaccard pre-check merges near-duplicate candidates into existing Evergreens
Historical Evergreen data cleanup (71→61 active Evergreens)
find_similar_slugs utility for similarity checking
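The promote semantic guard compares slugs with trigram-Jaccard similarity. A minimal sketch of that measure, assuming lowercase character trigrams with `-` and `_` treated as spaces; `similar_slugs` and its 0.5 default threshold are illustrative and are not the signature of the project's `find_similar_slugs`.

```python
def trigrams(text: str) -> set[str]:
    """Return the set of character trigrams of a normalized slug."""
    # Assumed normalization: lowercase, '-' and '_' become spaces, and the
    # string is padded with spaces so word boundaries form trigrams too.
    norm = " " + text.lower().replace("-", " ").replace("_", " ").strip() + " "
    return {norm[i:i + 3] for i in range(len(norm) - 2)}


def trigram_jaccard(a: str, b: str) -> float:
    """Jaccard similarity |A & B| / |A | B| over character trigrams."""
    ta, tb = trigrams(a), trigrams(b)
    if not ta and not tb:
        return 1.0
    return len(ta & tb) / len(ta | tb)


def similar_slugs(candidate: str, existing: list[str],
                  threshold: float = 0.5) -> list[tuple[str, float]]:
    """Existing slugs whose similarity to `candidate` meets `threshold`, best first."""
    scored = [(slug, trigram_jaccard(candidate, slug)) for slug in existing]
    hits = [(slug, score) for slug, score in scored if score >= threshold]
    return sorted(hits, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    evergreens = ["retrieval-augmented-generation", "vector-database", "prompt-caching"]
    print(similar_slugs("vector-databases", evergreens))
```

A near-duplicate such as `vector-databases` scores high against `vector-database`, so a guard like this can route it into the existing Evergreen instead of promoting a second note.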

Domain Packs

The core runtime is now being formalized as a pack-aware platform.

Built-in standard pack: research-tech
Default compatibility pack: default-knowledge
Runtime selection is exposed through --pack and --profile
Third-party packs can be discovered through the ovp.packs entry point group or the OVP_PACK_MANIFESTS manifest list

Examples:

ovp-packs
ovp-doctor --pack research-tech --json
ovp --pack research-tech --profile full
ovp-autopilot --pack research-tech --profile autopilot --yes
ovp --pack default-knowledge --profile full

Pack API documentation for third-party developers lives in:

docs/pack-api/README.md
docs/pack-api/manifest-and-hooks.md
docs/pack-api/dogfooding-with-media-pack.md

Platform Architecture

From a platform perspective, the system now has three layers:

Core Platform
Domain Pack
Workflow Profile

1. Core Platform

Core owns the cross-domain pieces that must remain stable:

runtime / vault layout
CLI orchestration
autopilot / queue / watcher
canonical identity helpers
registry framework
derived knowledge.db
graph / lint / audit infrastructure
plugin / pack loading
base evidence schema contracts

2. Domain Pack

A pack is not just a prompt bundle. It defines domain semantics:

object kinds
workflow profiles
discovery boundaries
absorb / refine / lint rules
schemas / templates / prompt resources

The built-in packs are:

research-tech: the explicit technical research pack and the default workflow pack
default-knowledge: the compatibility layer

Future domains such as media or medical should arrive as external pack projects.

3. Workflow Profile

A workflow profile is an executable DAG under a pack.

The built-in profiles currently shipped are:

research-tech/full
research-tech/autopilot
default-knowledge/full
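Since a profile is a DAG, execution order falls out of a topological sort. This sketch uses the standard library's `graphlib` (Python 3.9+); it reuses a subset of the stage names from the `ovp --full` step list, while the dependency edges and the dict-of-sets format are assumptions, not OVP's profile schema.

```python
from graphlib import TopologicalSorter

# Hypothetical profile: each stage maps to the set of stages it depends on.
FULL_PROFILE: dict[str, set[str]] = {
    "pinboard": set(),
    "pinboard_process": {"pinboard"},
    "clippings": set(),
    "articles": {"pinboard_process", "clippings"},
    "quality": {"articles"},
    "fix_links": {"quality"},
    "absorb": {"fix_links"},
    "moc": {"absorb"},
    "knowledge_index": {"moc"},
}


def execution_order(profile: dict[str, set[str]]) -> list[str]:
    """Return one valid execution order; raises graphlib.CycleError on a cycle."""
    return list(TopologicalSorter(profile).static_order())


if __name__ == "__main__":
    print(" -> ".join(execution_order(FULL_PROFILE)))
```

Independent stages (here `pinboard` and `clippings`) may come out in either order; only the declared dependencies are guaranteed.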

Research-Tech Operational Surface

research-tech is no longer only an internal pack. It now has a minimal operational surface:

ovp-doctor reports default workflow pack, pack roles, operator docs, recipes, and optional vault health
ovp-export exports minimal compiled artifacts:
- object-page
- topic-overview
- event-dossier
- contradictions
ovp-truth reads object / contradiction / neighborhood truth rows directly from knowledge.db
ovp-ui launches a local UI. The root route (/) opens the reader-first Knowledge Library; the operator dashboard lives under /ops.
docs/research-tech/RESEARCH_TECH_SKILLPACK.md
docs/research-tech/RESEARCH_TECH_VERIFY.md
docs/recipes/research-tech/*.md

Examples:

ovp-doctor --pack research-tech --json
ovp-truth objects --vault-dir /path/to/vault
ovp-ui --vault-dir /path/to/vault --port 8787
ovp-export --pack research-tech --target topic-overview --output-path /tmp/topic.md

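The built-in profiles also include default-knowledge/autopilot.

Because research-tech is the default workflow pack, the default workflow path runs it without an explicit --pack: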

ovp --full
ovp-autopilot --yes

You can still select packs explicitly:

ovp --pack research-tech --profile full
ovp-autopilot --pack research-tech --profile autopilot --yes
# compatibility path
ovp --pack default-knowledge --profile full

Plugin Design

The plugin / pack surface is no longer only a design memo. There is now a minimal working integration path.

Two discovery modes are supported:

Python entry point group: ovp.packs
Explicit manifest list: OVP_PACK_MANIFESTS=/path/a.yaml:/path/b.yaml

The minimum third-party loading chain is:

provide a manifest
declare entrypoints.pack
return a BaseDomainPack
pass api_version validation
select it through --pack/--profile

Hard boundaries currently enforced by core:

a pack cannot turn semantic retrieval into canonical identity
a pack cannot treat knowledge.db as Authority
a pack cannot bypass audit/logging
all derived state must remain rebuildable

Runtime Model

Authority Boundary

The system keeps a hard boundary:

Authority: vault markdown + concept registry
derived views: Atlas, MOC, graph, knowledge.db, lint, daily delta
not Authority: knowledge.db

knowledge.db is the GBrain-inspired derived index layer. It stores:

page FTS
structured links
mirrored raw sidecars
timeline / audit events
deterministic section embeddings
read-only query / serve surfaces

It is rebuildable and does not own canonical identity resolution.
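The Authority boundary means identity comes from the registry, never from knowledge.db or from similarity scores. A minimal sketch of exact alias resolution over `10-Knowledge/Atlas/concept-registry.jsonl`, assuming each row is a JSON object with `slug` and `aliases` keys (the real row schema is not documented here); all function names are hypothetical.

```python
import json
from pathlib import Path
from typing import Iterable, Optional


def build_alias_map(lines: Iterable[str]) -> dict[str, str]:
    """Map every lowercased slug and alias to its canonical slug."""
    alias_map: dict[str, str] = {}
    for line in lines:
        if not line.strip():
            continue
        row = json.loads(line)
        slug = row["slug"]
        for name in [slug, *row.get("aliases", [])]:
            key = name.strip().lower()
            owner = alias_map.setdefault(key, slug)
            if owner != slug:
                # Two concepts claiming one name is an identity conflict, not a tie to break.
                raise ValueError(f"alias {name!r} is claimed by both {owner!r} and {slug!r}")
    return alias_map


def load_alias_map(registry_path: Path) -> dict[str, str]:
    with registry_path.open(encoding="utf-8") as fh:
        return build_alias_map(fh)


def resolve(term: str, alias_map: dict[str, str]) -> Optional[str]:
    """Exact lookup only: a near match returns None instead of guessing an identity."""
    return alias_map.get(term.strip().lower())
```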

The Six Layers

Layer	Responsibility	Representative commands	Can the LLM make major decisions here?
Ingest	Normalize incoming material	`ovp --step pinboard` `ovp --step clippings` `ovp-article`	No
Interpret	Produce deep interpretations	`ovp-article` `ovp-github` `ovp-paper`	Yes, with constrained output
Absorb	Compile interpretations into lifecycle actions	`ovp-absorb` `ovp-evergreen`	Yes, but only through structured results
Refine	Clean up and break down existing notes	`ovp-cleanup` `ovp-breakdown`	Yes, but execution is controlled
Canonical	Maintain registry / aliases / Atlas / MOC	`ovp-rebuild-registry` `ovp-moc` `ovp-promote-candidates`	No
Derived	Build retrieval / graph / lint views	`ovp-knowledge-index` `ovp-graph` `ovp-lint`	No

What `ovp --full` Actually Runs

Default full pipeline:

pinboard
→ pinboard_process
→ clippings
→ articles
→ quality
→ fix_links
→ absorb
→ dedup
→ note_type_normalize
→ registry_sync
→ moc
→ knowledge_index

With refine enabled:

pinboard
→ pinboard_process
→ clippings
→ articles
→ quality
→ fix_links
→ absorb
→ dedup
→ note_type_normalize
→ registry_sync
→ moc
→ refine
→ knowledge_index

Important details:

absorb shells out to ovp_pipeline.commands.absorb and emits promoted_slugs for downstream steps
dedup runs post-absorb concept deduplication scoped to recently promoted slugs (trigram-Jaccard similarity)
note_type_normalize normalizes note_type metadata across Evergreen files
refine is a batch wrapper over cleanup + breakdown
knowledge_index always runs last so knowledge.db reflects final canonical state
--step evergreen and --from-step evergreen are still accepted and map to absorb
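The orderings above, including the refine insertion point and the `evergreen` alias, can be captured in a small planner. This is an illustrative sketch of the documented behavior, not OVP's orchestration code; `plan_steps` is a hypothetical name.

```python
from typing import Optional

# Default `ovp --full` order, copied from the list above.
FULL_STEPS = [
    "pinboard", "pinboard_process", "clippings", "articles", "quality",
    "fix_links", "absorb", "dedup", "note_type_normalize", "registry_sync",
    "moc", "knowledge_index",
]
# Legacy step names that are still accepted.
STEP_ALIASES = {"evergreen": "absorb"}


def plan_steps(with_refine: bool = False, step: Optional[str] = None,
               from_step: Optional[str] = None) -> list[str]:
    """Return the steps a run would execute, in order."""
    steps = list(FULL_STEPS)
    if with_refine:
        # refine runs after moc and before the final knowledge_index refresh
        steps.insert(steps.index("knowledge_index"), "refine")
    if step is not None:
        name = STEP_ALIASES.get(step, step)
        if name not in FULL_STEPS and name != "refine":  # --step refine works on its own
            raise ValueError(f"unknown step: {step}")
        return [name]
    if from_step is not None:
        name = STEP_ALIASES.get(from_step, from_step)
        if name not in steps:
            raise ValueError(f"unknown step: {from_step}")
        return steps[steps.index(name):]  # resume here; knowledge_index still runs last
    return steps


if __name__ == "__main__":
    print(" -> ".join(plan_steps(with_refine=True)))
```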

What `ovp-autopilot` Actually Runs

Default real-time path:

interpretation
→ quality
→ absorb
→ moc
→ knowledge_index
→ auto_commit(optional)

Enable refine explicitly:

ovp-autopilot --watch=inbox --with-refine --yes

That changes the path to:

interpretation
→ quality
→ absorb
→ moc
→ refine
→ knowledge_index
→ auto_commit(optional)

Refine is not hidden or missing. It is wired in but opt-in, so that autopilot never silently rewrites the structure of the whole knowledge base in real time.
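This README does not say how `--watch=inbox` detects new material. A polling snapshot is one simple way a watcher could find files to hand to the interpretation stage; this sketch assumes the inbox is `50-Inbox/01-Raw` from the directory layout, and `snapshot` and `changed_files` are hypothetical helpers.

```python
from pathlib import Path


def snapshot(folder: Path) -> dict[str, float]:
    """Map each Markdown file under `folder` to its modification time."""
    if not folder.is_dir():
        return {}
    return {str(p): p.stat().st_mtime for p in folder.rglob("*.md") if p.is_file()}


def changed_files(before: dict[str, float], after: dict[str, float]) -> list[str]:
    """Files that are new, or whose mtime changed, between two snapshots.

    Deleted files are not reported; a watcher feeding interpretation only needs arrivals.
    """
    return sorted(path for path, mtime in after.items() if before.get(path) != mtime)


if __name__ == "__main__":
    inbox = Path("vault/50-Inbox/01-Raw")  # assumption: what --watch=inbox points at
    first = snapshot(inbox)
    print(changed_files({}, first))  # everything currently in the inbox counts as new
```

A real daemon would repeat the snapshot on an interval (or use OS file events) and keep the previous snapshot between rounds.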

Command Overview

Daily entry points

Command	Purpose
`ovp --check`	Validate runtime configuration
`ovp --full`	Run the full daily pipeline
`ovp --full --with-refine`	Run full pipeline plus cleanup/breakdown
`ovp --step absorb`	Run only the absorb layer
`ovp --step refine`	Run only the batch refine layer
`ovp --from-step absorb`	Resume from absorb onward

Content processors

Command	Purpose
`ovp-article --process-inbox --vault-dir <vault>`	Process raw documents
`ovp-github --process-single <file> --vault-dir <vault>`	Process GitHub inputs
`ovp-paper --process-single <file> --vault-dir <vault>`	Process paper inputs

Absorb / Refine / Canonical

Command	Purpose
`ovp-absorb --recent 7 --json`	Absorb recent deep dives
`ovp-absorb --file <source.md> --dry-run --json`	Preview source lifecycle routing before moving or processing source material
`ovp-evergreen --recent 7 --json`	Compatibility alias for `ovp-absorb`
`ovp-concept-dedup --vault-dir <vault> --threshold 0.82`	Find and propose concept deduplication clusters
`ovp-concept-dedup --vault-dir <vault> --apply`	Apply deduplication proposal (archive losers, rewrite wikilinks)
`ovp-cleanup --all --json`	Generate cleanup proposals
`ovp-cleanup --all --write --json`	Apply deterministic cleanup
`ovp-breakdown --all --json`	Generate breakdown proposals
`ovp-breakdown --all --write --json`	Apply incremental breakdown
`ovp-rebuild-registry --json`	Reconcile evergreen notes and registry
`ovp-promote-candidates review`	Review candidate lifecycle
`ovp-moc --scan --vault-dir <vault>`	Refresh MOC / Atlas

Derived layer

Command	Purpose
`ovp-knowledge-index --json`	Rebuild `knowledge.db`
`ovp-knowledge-index --search "query" --json`	Run FTS search
`ovp-knowledge-index --query "question" --json`	Run embedding chunk query
`ovp-knowledge-index --get slug --json`	Read a canonical page
`ovp-knowledge-index --stats --json`	Read index stats
`ovp-knowledge-index --audit-recent --json`	Read recent audit events
`ovp-knowledge-index --tools-json`	Emit tool discovery JSON
`ovp-knowledge-index --serve`	Start read-only stdio JSONL service
`ovp-graph daily today --vault-dir <vault>`	Build daily graph delta
`ovp-lint --check --vault-dir <vault>`	Run structure/link checks

Operations

Command	Purpose
`ovp-runtime-state --vault-dir <vault> --write --json`	Build the operational runtime state projection from repair markers, workflow actions, pipeline events, and reuse events; writes `60-Logs/runtime-state/current.{json,md}`
`GET /api/runtime-state`	Local read endpoint for the provider-facing runtime-state projection; prefers the materialized `60-Logs/runtime-state/current.json` and falls back to rebuild when missing
`POST /api/runtime-state`	Refresh and write the materialized runtime-state projection
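A sketch of the pattern the runtime-state endpoints describe: reads prefer the materialized file, refreshes rebuild and write it. The `rebuild` callable stands in for the real projection over repair markers, workflow actions, pipeline events, and reuse events, which is not shown; both function names are hypothetical.

```python
import json
from pathlib import Path
from typing import Callable

STATE_RELPATH = Path("60-Logs/runtime-state/current.json")


def read_runtime_state(vault: Path, rebuild: Callable[[Path], dict]) -> dict:
    """GET-style read: prefer the materialized projection, rebuild only if it is missing."""
    materialized = vault / STATE_RELPATH
    if materialized.is_file():
        return json.loads(materialized.read_text(encoding="utf-8"))
    return rebuild(vault)


def refresh_runtime_state(vault: Path, rebuild: Callable[[Path], dict]) -> dict:
    """POST-style refresh: rebuild, then replace the materialized projection."""
    state = rebuild(vault)
    target = vault / STATE_RELPATH
    target.parent.mkdir(parents=True, exist_ok=True)
    tmp = target.with_suffix(".json.tmp")
    tmp.write_text(json.dumps(state, indent=2, sort_keys=True), encoding="utf-8")
    tmp.replace(target)  # rename over the old file so readers never see a partial write
    return state
```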

Context packs

Command	Purpose
`ovp-working-memory --vault-dir <vault>`	Write the daily budgeted context pack to `60-Logs/working-memory/YYYY-MM-DD.md` and emit trusted reuse events for selected objects
`ovp-prime --vault-dir <vault> --session-id <id>`	Write an OVP Prime session snapshot to `60-Logs/session-snapshots/<id>.md`, refresh `latest.md`, and emit `ovp_prime` reuse events
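The working-memory output is described as a budgeted context pack, but the selection rule is not specified here. A greedy fill under a token budget is one plausible reading; the score field, the four-characters-per-token estimate, and all names in this sketch are assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    slug: str
    score: float  # hypothetical relevance score; OVP's ranking is not documented here
    text: str


def estimate_tokens(text: str) -> int:
    """Rough heuristic of about four characters per token (not a real tokenizer)."""
    return max(1, len(text) // 4)


def select_within_budget(candidates: list[Candidate], budget_tokens: int) -> list[Candidate]:
    """Greedy pick by descending score, skipping any item that would exceed the budget."""
    chosen: list[Candidate] = []
    used = 0
    for cand in sorted(candidates, key=lambda c: c.score, reverse=True):
        cost = estimate_tokens(cand.text)
        if used + cost <= budget_tokens:
            chosen.append(cand)
            used += cost
    return chosen
```

Skipping rather than stopping at the first oversized item lets smaller, lower-scored objects still fill the remaining budget.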

AutoPilot

Command	Purpose
`ovp-autopilot --watch=inbox --parallel=1 --yes`	Default real-time pipeline
`ovp-autopilot --watch=inbox,pinboard --yes`	Watch multiple sources
`ovp-autopilot --with-refine --yes`	Add refine to the real-time path
`ovp-autopilot --no-commit --yes`	Disable auto-commit

Directory Layout

vault/
├── 50-Inbox/
│   ├── 01-Raw/
│   ├── 02-Pinboard/
│   └── 03-Processed/
├── 10-Knowledge/
│   ├── Evergreen/
│   └── Atlas/
│       ├── Atlas-Index.md
│       ├── concept-registry.jsonl
│       └── alias-index.json
├── 20-Areas/
│   └── {AI-Research, Investing, Programming, Tools}/Topics/YYYY-MM/
├── 60-Logs/
│   ├── pipeline.jsonl
│   ├── refine-mutations.jsonl
│   ├── transactions/
│   ├── quality-reports/
│   ├── daily-deltas/
│   ├── working-memory/
│   ├── session-snapshots/
│   ├── runtime-state/
│   └── knowledge.db
└── 70-Archive/
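`pipeline.jsonl` and `refine-mutations.jsonl` are append-only JSONL logs, and the changelog mentions advisory file locks and JSONL streaming. A POSIX-only sketch of both ideas using `fcntl.flock`; the event fields and helper names are hypothetical, not OVP's log schema.

```python
import fcntl  # POSIX-only advisory locking
import json
from pathlib import Path
from typing import Iterable, Iterator


def append_event(log_path: Path, event: dict) -> None:
    """Append one JSON object per line while holding an exclusive advisory lock."""
    log_path.parent.mkdir(parents=True, exist_ok=True)
    line = json.dumps(event, ensure_ascii=False, sort_keys=True) + "\n"
    with log_path.open("a", encoding="utf-8") as fh:
        fcntl.flock(fh.fileno(), fcntl.LOCK_EX)
        try:
            fh.write(line)
            fh.flush()  # flush before unlocking so other writers see a complete line
        finally:
            fcntl.flock(fh.fileno(), fcntl.LOCK_UN)


def parse_events(lines: Iterable[str]) -> Iterator[dict]:
    """Yield events lazily, skipping blank lines and lines that fail to parse."""
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            yield json.loads(line)
        except json.JSONDecodeError:
            continue  # e.g. a torn line left by an interrupted writer


def stream_events(log_path: Path) -> Iterator[dict]:
    """Stream a JSONL log one line at a time instead of loading it whole."""
    with log_path.open(encoding="utf-8") as fh:
        yield from parse_events(fh)
```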

What `knowledge.db` Provides

knowledge.db is a rebuildable local derived index. It currently includes:

pages_index
page_fts
page_links
raw_data
timeline_events
audit_events
page_embeddings

It exists to power:

keyword retrieval
embedding retrieval
canonical page reads
audit browsing
tool discovery and read-only serving

Default discovery now routes through this layer:

ovp-query uses knowledge.db by default
keyword retrieval uses FTS5 BM25
semantic retrieval uses local deterministic embeddings
QMD is no longer the default runtime dependency; it is opt-in via --engine qmd
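A minimal in-memory illustration of FTS5 BM25 ranking with Python's `sqlite3`, rebuilt from scratch to mirror the point that the index is derived. The table is named after `page_fts`, but its columns are assumptions rather than knowledge.db's actual schema; it requires an SQLite build with FTS5, which most Python distributions ship.

```python
import sqlite3


def build_index(pages: dict[str, str]) -> sqlite3.Connection:
    """Rebuild an in-memory FTS5 index from scratch (derived state, never Authority)."""
    conn = sqlite3.connect(":memory:")
    # Illustrative columns: slug is stored but not tokenized, body is full-text indexed.
    conn.execute("CREATE VIRTUAL TABLE page_fts USING fts5(slug UNINDEXED, body)")
    conn.executemany("INSERT INTO page_fts (slug, body) VALUES (?, ?)", pages.items())
    conn.commit()
    return conn


def search(conn: sqlite3.Connection, query: str, limit: int = 5) -> list[str]:
    """Return slugs ranked by BM25; in FTS5 a lower bm25() value is a better match."""
    rows = conn.execute(
        "SELECT slug FROM page_fts WHERE page_fts MATCH ? ORDER BY bm25(page_fts) LIMIT ?",
        (query, limit),
    )
    return [slug for (slug,) in rows]


if __name__ == "__main__":
    conn = build_index({
        "rag": "retrieval augmented generation grounds answers in retrieved documents",
        "vector-db": "a vector database stores embeddings for similarity search",
    })
    print(search(conn, "embeddings"))
```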

Quick Start

curl -fsSLO https://raw.githubusercontent.com/fakechris/obsidian_vault_pipeline/main/scripts/install-user.sh
less install-user.sh
bash install-user.sh

mkdir -p my-vault
cd my-vault

ovp --check
ovp --full

If you prefer the explicit PyPI two-step flow:

python3 -m pip install --user obsidian-vault-pipeline
python3 -m ovp_pipeline.installer

If your Python installation enforces PEP 668, prefer:

pipx install obsidian-vault-pipeline

The installer prefers a writable, safe bin directory that is already on PATH; if none is available, it falls back to ~/.local/bin. It does not edit your shell configuration.

If you want to see the refine layer explicitly:

ovp --full --with-refine

If you want a daemon:

ovp-autopilot --watch=inbox --parallel=1 --yes

Configuration

Put .env in the vault root:

AUTO_VAULT_API_KEY=your_key_here
AUTO_VAULT_API_BASE=https://api.minimaxi.com/anthropic
AUTO_VAULT_MODEL=anthropic/MiniMax-M2.7-highspeed

# Optional
PINBOARD_TOKEN=username:token
HTTP_PROXY=http://127.0.0.1:7897
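A sketch of reading the vault-root `.env` and checking for the model settings. It assumes simple `KEY=VALUE` lines and treats the three `AUTO_VAULT_*` keys above `# Optional` as required; OVP's own loader may behave differently, and these helper names are hypothetical.

```python
from pathlib import Path

REQUIRED = ("AUTO_VAULT_API_KEY", "AUTO_VAULT_API_BASE", "AUTO_VAULT_MODEL")


def parse_env(text: str) -> dict[str, str]:
    """Parse KEY=VALUE lines; blank lines and '#' comments are ignored."""
    values: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")  # split on the first '=' only
        values[key.strip()] = value.strip().strip('"').strip("'")  # naive quote stripping
    return values


def load_vault_env(vault: Path) -> dict[str, str]:
    env_file = vault / ".env"
    return parse_env(env_file.read_text(encoding="utf-8")) if env_file.is_file() else {}


def missing_required(values: dict[str, str]) -> list[str]:
    return [key for key in REQUIRED if not values.get(key)]
```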

Design Principles

identity consistency before feature growth
vault files + registry define canonical state
knowledge.db is derived retrieval, never a second Authority
absorb is part of daily automation; refine is powerful and opt-in by default
Wiki, MOC, dashboard, briefing, graph, reader pages, and context packs are projections that carry explicit projection metadata and must trace back to source/evidence
reader-facing UI should explain knowledge first, then expose operator/debug detail
docs must describe what actually ships, not a future architecture sketch

Related Resources

This document targets: v0.9.3


Release history

0.18.0 (May 11, 2026)
0.16.0 (May 10, 2026)
0.15.0 (May 8, 2026)
0.14.0 (May 8, 2026)
0.13.0 (May 7, 2026)
0.12.0 (May 5, 2026)
0.10.0 (May 1, 2026, this version)
0.9.2 (Apr 25, 2026)
0.8.14 (Apr 18, 2026)
0.8.6 (Apr 15, 2026)
0.8.5 (Apr 15, 2026)
0.8.4 (Apr 14, 2026)
0.8.3 (Apr 14, 2026)
0.8.2 (Apr 13, 2026)
0.7.0 (Apr 7, 2026)
0.6.0 (Apr 7, 2026)
0.4.0 (Apr 7, 2026)
0.2.1 (Apr 4, 2026)
0.2.0 (Apr 4, 2026)
0.1.8 (Apr 3, 2026)
0.1.7 (Apr 3, 2026)
0.1.6 (Apr 3, 2026)
0.1.5 (Apr 3, 2026)
0.1.4 (Apr 3, 2026)
0.1.2 (Apr 3, 2026)

Download files


Source Distribution

obsidian_vault_pipeline-0.10.0.tar.gz (1.1 MB view details)

Uploaded May 1, 2026 Source

Built Distribution


obsidian_vault_pipeline-0.10.0-py3-none-any.whl (624.3 kB view details)

Uploaded May 1, 2026 Python 3

File details

Details for the file obsidian_vault_pipeline-0.10.0.tar.gz.

File metadata

Download URL: obsidian_vault_pipeline-0.10.0.tar.gz
Upload date: May 1, 2026
Size: 1.1 MB
Tags: Source
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for obsidian_vault_pipeline-0.10.0.tar.gz
Algorithm	Hash digest
SHA256	`fac8a7177b13d93f24366dc064c1c2320818fc7a9e8d27304cb4a02ec9a0ec28`
MD5	`d10577e11b83f177661de8e93ffcf0cb`
BLAKE2b-256	`80c516f7375b87313d253c2b4afb7ea4a5cd727342848a9907f3a9e710259a2e`


Provenance

The following attestation bundles were made for obsidian_vault_pipeline-0.10.0.tar.gz:

Publisher: publish-pypi.yml on fakechris/obsidian_vault_pipeline

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: obsidian_vault_pipeline-0.10.0.tar.gz
- Subject digest: fac8a7177b13d93f24366dc064c1c2320818fc7a9e8d27304cb4a02ec9a0ec28
- Sigstore transparency entry: 1418126890
- Sigstore integration time: May 1, 2026
Source repository:
- Permalink: fakechris/obsidian_vault_pipeline@a311c6a02c7fccab63e602d4ca0747bd7a0682af
- Branch / Tag: refs/tags/v0.10.0
- Owner: https://github.com/fakechris
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish-pypi.yml@a311c6a02c7fccab63e602d4ca0747bd7a0682af
- Trigger Event: push

File details

Details for the file obsidian_vault_pipeline-0.10.0-py3-none-any.whl.

File metadata

Download URL: obsidian_vault_pipeline-0.10.0-py3-none-any.whl
Upload date: May 1, 2026
Size: 624.3 kB
Tags: Python 3
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for obsidian_vault_pipeline-0.10.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`284e3bc65e63d16ed51b99788e36c3b4c28915b681ede961c940b9ba3f08b901`
MD5	`d46cb0bdd121a2c7f68ad615e9746423`
BLAKE2b-256	`a76f5bb8f2329ee325bf3ab0bbc170bd86a19d725e69af51e2dbefe6360fa91b`


Provenance

The following attestation bundles were made for obsidian_vault_pipeline-0.10.0-py3-none-any.whl:

Publisher: publish-pypi.yml on fakechris/obsidian_vault_pipeline

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: obsidian_vault_pipeline-0.10.0-py3-none-any.whl
- Subject digest: 284e3bc65e63d16ed51b99788e36c3b4c28915b681ede961c940b9ba3f08b901
- Sigstore transparency entry: 1418126967
- Sigstore integration time: May 1, 2026
Source repository:
- Permalink: fakechris/obsidian_vault_pipeline@a311c6a02c7fccab63e602d4ca0747bd7a0682af
- Branch / Tag: refs/tags/v0.10.0
- Owner: https://github.com/fakechris
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish-pypi.yml@a311c6a02c7fccab63e602d4ca0747bd7a0682af
- Trigger Event: push
