
Git-backed document store with REST API, CLI, and FastAPI embedding — every write is a commit, every document has full revision history.

DocVault

DocVault is a git-backed document store for structured data. Every write is a git commit; every document carries a full, auditable revision history; any document can be retrieved exactly as it was at any point in time.

It ships as a standalone REST API, an embeddable FastAPI shim, and a CLI, so it works equally well as an independent microservice or as a library embedded in your existing application.

Documents can be plain JSON or any text-based file. Templates define named folder structures with optional per-slot JSON Schema validation. Templates are content-addressable: their ID encodes both the source path and a hash of the content, so any change to the folder is detected automatically and increments the template version. Revert a folder to a previous snapshot and the version number reverts with it. A template can be bootstrapped from a local directory, exported as a zip archive, and deployed to any target path. The vault carries a semantic version that can be bumped to create a permanent git-tag snapshot of the entire collection. An optional LLM integration (Claude) auto-generates summaries and keywords from document content.


Features

  • Git-backed storage — every document write (create, update, delete) produces a git commit. Full history with author, timestamp, and message.
  • Point-in-time retrieval — fetch any document at any commit SHA, tag, or branch.
  • JSON Schema templates — define named folder structures; each slot can carry its own JSON Schema draft-7 constraint validated on every write.
  • Content-addressable template IDs — template IDs encode {name}:{path_hash}:{content_hash}. Change the source folder and the ID changes; revert the folder and the original ID comes back.
  • Integer template versioning — templates start at version 1. Each detected content change bumps the version. Reverting to a previously-seen content snapshot reverts the version number too.
  • Vault versioning — bump the vault's semantic version and create a git tag snapshot of the entire collection.
  • Batch deploy — create many documents from a single template in one atomic commit.
  • LLM summarization — auto-generate summary and keywords from document content using Claude.
  • Flexible authentication — none (dev), static API keys, or passthrough to your own auth system.
  • Embeddable — mount DocVault inside any existing FastAPI app via DocVaultShim without conflicts.
  • Interactive Swagger UI — browse and test all endpoints at /docs when the server is running.

Installation

# Core (REST API + CLI)
pip install py-docvault
# or
uv add py-docvault

# With LLM summarization
pip install "py-docvault[llm]"
uv add "py-docvault[llm]"

Requires Python 3.11+.


Quick start — standalone server

# 1. Create a vault
docvault init ./my-vault

# 2. Start the server
docvault serve --host 0.0.0.0 --port 8000

# 3. Open the interactive docs
open http://localhost:8000/docs

The Swagger UI at /docs lets you explore and execute every endpoint directly from the browser. A ReDoc view is also available at /redoc.


Quick start — embedded in your FastAPI app

from fastapi import FastAPI
from docvault import DocVaultShim, VaultConfig

config = VaultConfig(vault_path="./vault")
shim = DocVaultShim(config)

app = FastAPI(lifespan=shim.wrap_lifespan())
app.include_router(shim.router)

That's it. All DocVault routes are now available at /api/v1/… inside your app. See The Shim for advanced patterns.


The Shim — embedding DocVault

DocVaultShim is the primary integration point for host apps. It owns the DocVault store and its lifecycle, and provides three mounting patterns so it fits cleanly into whatever lifecycle management your app already uses.

Constructor

DocVaultShim(
    config: VaultConfig,
    *,
    auth_dep: Callable | None = None,        # see Authentication section below
    passthrough_dep: Callable | None = None, # alternative to auth_dep
    prefix: str = "/api/v1",                 # URL prefix for all routes
)

| Parameter | Description |
| --- | --- |
| config | VaultConfig instance (see Configuration) |
| auth_dep | A FastAPI dependency callable injected into every route for auth side-effects. Takes priority over passthrough_dep. |
| passthrough_dep | Used when config.auth_mode == AuthMode.PASSTHROUGH. Equivalent to auth_dep but routed through build_auth_dep. |
| prefix | URL prefix for all DocVault routes. Default: /api/v1. |

Pattern 1: lifespan context manager (recommended)

Use this when your app already has a lifespan function and you want full control over the order of startup/shutdown operations.

from contextlib import asynccontextmanager
from fastapi import FastAPI
from docvault import DocVaultShim, VaultConfig

config = VaultConfig(vault_path="./vault")
shim = DocVaultShim(config)

@asynccontextmanager
async def lifespan(app: FastAPI):
    # DocVault boots first
    async with shim.lifespan():
        # your startup code here — vault is ready
        await connect_database()
        yield
        # your shutdown code here

app = FastAPI(lifespan=lifespan)
app.include_router(shim.router)

shim.lifespan() is a plain async context manager — no FastAPI-specific coupling. Enter it anywhere an async with block is valid.

Pattern 2: wrap_lifespan helper

Use this when you want DocVault to compose with your existing lifespan function, or when you have no lifespan at all.

Without an existing lifespan:

app = FastAPI(lifespan=shim.wrap_lifespan())
app.include_router(shim.router)

With an existing lifespan:

@asynccontextmanager
async def my_lifespan(app: FastAPI):
    await connect_database()
    yield
    await disconnect_database()

# DocVault boots first, then your lifespan is entered
app = FastAPI(lifespan=shim.wrap_lifespan(my_lifespan))
app.include_router(shim.router)

wrap_lifespan always boots DocVault before delegating to the host lifespan, so DocVault routes are usable from the very first request.
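
The ordering described above can be illustrated with plain async context managers. This is a minimal sketch of the composition idea only, not DocVaultShim's implementation: `vault_lifespan`, `host_lifespan`, and `wrap` are hypothetical stand-ins, and the reverse-order shutdown shown here is simply what nesting produces in this sketch.

```python
import asyncio
from contextlib import asynccontextmanager

order: list[str] = []  # records the sequence of lifecycle events


@asynccontextmanager
async def vault_lifespan():
    """Hypothetical stand-in for the vault's own startup and shutdown."""
    order.append("vault up")
    try:
        yield
    finally:
        order.append("vault down")


@asynccontextmanager
async def host_lifespan(app):
    """Hypothetical host-app lifespan."""
    order.append("host up")
    try:
        yield
    finally:
        order.append("host down")


def wrap(host=None):
    """Return a lifespan that enters the vault first, then the host (if any)."""

    @asynccontextmanager
    async def combined(app):
        async with vault_lifespan():
            if host is None:
                yield
            else:
                async with host(app):
                    yield

    return combined


async def main():
    async with wrap(host_lifespan)(None):
        order.append("serving")


asyncio.run(main())
print(order)
```

Because the host lifespan is nested inside the vault's context, anything the host does at startup can already rely on the vault being ready.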

Pattern 3: direct startup call

Use this when your framework manages dependencies through a DI container, service locator, or explicit on_startup hook.

shim = DocVaultShim(config)
app.include_router(shim.router)

# Anywhere before the first request is served:
await shim.startup()

startup() is idempotent, so calling it multiple times is safe. It calls await store.init(), which is a no-op if the vault already exists.

Authentication in the shim

DocVault supports four auth options, selected through the auth_dep and passthrough_dep constructor parameters and config.auth_mode.

Option A — auth_dep (recommended for host apps)

Pass any FastAPI dependency callable. It runs before every DocVault route. DocVault ignores the return value; to block a request, raise HTTPException with status 401 or 403.

from fastapi import Depends, HTTPException, Request
from your_app.auth import verify_token

async def require_admin(request: Request):
    token = request.headers.get("Authorization", "").removeprefix("Bearer ")
    user = await verify_token(token)
    if user.role != "admin":
        raise HTTPException(status_code=403, detail="Admin required")
    return user

shim = DocVaultShim(config, auth_dep=require_admin)

The dependency can itself declare Depends(...) and will be resolved by FastAPI's DI engine:

async def require_logged_in(current_user = Depends(get_current_user)):
    if not current_user:
        raise HTTPException(401)
    return current_user

shim = DocVaultShim(config, auth_dep=require_logged_in)

Option B — passthrough_dep with AuthMode.PASSTHROUGH

Equivalent to auth_dep but uses the build_auth_dep code path. Requires auth_mode = "passthrough" in your config:

config = VaultConfig(vault_path="./vault", auth_mode="passthrough")
shim = DocVaultShim(config, passthrough_dep=your_auth_callable)

Option C — built-in API key auth

Set auth_mode = "api_key" and provide keys in your config. Every request must include an X-API-Key header.

config = VaultConfig(
    vault_path="./vault",
    auth_mode="api_key",
    api_keys=["sk-your-key-here"],
)
shim = DocVaultShim(config)
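
On the client side, a request under API key auth might be built as in the sketch below. It uses only the standard library and constructs the request without sending it. The payload fields are illustrative assumptions, not the documented request schema; the header name and the /api/v1/docs path come from this README.

```python
import json
import urllib.request


def build_create_request(base_url: str, api_key: str, payload: dict) -> urllib.request.Request:
    """Build (but do not send) a POST request carrying the X-API-Key header."""
    return urllib.request.Request(
        url=f"{base_url}/api/v1/docs",
        data=json.dumps(payload).encode("utf-8"),
        headers={"X-API-Key": api_key, "Content-Type": "application/json"},
        method="POST",
    )


# The payload shape here is a hypothetical example.
req = build_create_request(
    "http://localhost:8000",
    "sk-your-key-here",
    {"creator": "alice", "content": {"host": "localhost"}},
)
print(req.get_method(), req.full_url)
# To actually send it: urllib.request.urlopen(req)
```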

Option D — no auth (development)

The default. All requests are allowed. Do not use in production.

config = VaultConfig(vault_path="./vault")  # auth_mode defaults to "none"
shim = DocVaultShim(config)

Custom URL prefix

Override the default /api/v1 prefix to avoid collisions with your existing routes:

shim = DocVaultShim(config, prefix="/v2/documents")
# Routes are now: /v2/documents/health, /v2/documents/docs, etc.

Testing your shim integration

ASGITransport (used by httpx for in-process testing) does not trigger the ASGI lifespan protocol. If your test relies on store.init() running (e.g. document CRUD), use the asgi_lifespan_client helper from tests/conftest.py:

from tests.conftest import asgi_lifespan_client

async def test_my_integration():
    shim = DocVaultShim(config)
    app = FastAPI(lifespan=shim.wrap_lifespan())
    app.include_router(shim.router)

    async with asgi_lifespan_client(app) as client:
        resp = await client.post("/api/v1/docs", json={...})
        assert resp.status_code == 201

asgi_lifespan_client manually drives the ASGI lifespan.startup / lifespan.shutdown event cycle before and after yielding the client.
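
For readers unfamiliar with the ASGI lifespan protocol, the sketch below shows what driving that event cycle by hand can look like. It is not the helper from tests/conftest.py: `drive_lifespan` and `toy_app` are hypothetical names, and the toy app stands in for a real FastAPI application.

```python
import asyncio
from contextlib import asynccontextmanager

events: list[str] = []  # records what the toy app observed


@asynccontextmanager
async def drive_lifespan(app):
    """Send lifespan.startup on enter and lifespan.shutdown on exit."""
    to_app: asyncio.Queue = asyncio.Queue()    # messages we send to the app
    from_app: asyncio.Queue = asyncio.Queue()  # messages the app sends back

    async def receive():
        return await to_app.get()

    async def send(message):
        await from_app.put(message)

    # The app handles the whole lifespan scope in one long-running call.
    task = asyncio.create_task(app({"type": "lifespan"}, receive, send))

    await to_app.put({"type": "lifespan.startup"})
    reply = await from_app.get()
    if reply["type"] != "lifespan.startup.complete":
        raise RuntimeError(f"startup failed: {reply}")
    try:
        yield
    finally:
        await to_app.put({"type": "lifespan.shutdown"})
        await from_app.get()  # wait for lifespan.shutdown.complete
        await task


async def toy_app(scope, receive, send):
    """Minimal ASGI app that only handles the lifespan scope."""
    assert scope["type"] == "lifespan"
    while True:
        message = await receive()
        if message["type"] == "lifespan.startup":
            events.append("startup")
            await send({"type": "lifespan.startup.complete"})
        elif message["type"] == "lifespan.shutdown":
            events.append("shutdown")
            await send({"type": "lifespan.shutdown.complete"})
            return


async def main():
    async with drive_lifespan(toy_app):
        events.append("request")  # a test would issue HTTP calls here


asyncio.run(main())
print(events)
```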

Tests that only hit /health or test auth rejection (no store access needed) can use AsyncClient(transport=ASGITransport(app=app)) directly.


Templates & versioning

Creating a template

Templates define a named set of document slots — logical paths such as "config/app" or "docs/readme" — with optional JSON Schema validation per slot. There are two ways to create one:

From a structure dict (define slots explicitly):

from docvault.core.template import DocSlot, TemplateCreateInput
from docvault.core.store import DocVault  # type of `store` below, assumed to be already initialized

ref = await store.create_template(
    TemplateCreateInput(
        name="microservice",
        structure={
            "config/app": DocSlot(
                required=True,
                json_schema={
                    "type": "object",
                    "required": ["host", "port"],
                    "properties": {
                        "host": {"type": "string"},
                        "port": {"type": "integer"},
                    },
                },
            ),
            "config/database": DocSlot(required=True),
            "docs/readme": DocSlot(required=False),
        },
    )
)
print(ref.name, ref.id)

From a path (scan a folder or single file):

from pathlib import Path

ref = await store.create_template(
    TemplateCreateInput(name="project-docs", path=Path("./docs-folder"))
)

The folder is scanned recursively. Each file becomes a slot (using its relative path without extension). The file contents are ingested as vault documents. A single file creates one flat slot named after the file stem.
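
The slot-naming rule described above can be sketched with pathlib. This is an illustration of the stated rule only, under the assumption that slot names use forward slashes; `slot_names` is a hypothetical helper, not part of the DocVault API.

```python
import tempfile
from pathlib import Path


def slot_names(source: Path) -> list[str]:
    """Derive slot names: relative path without extension, or the stem for a single file."""
    if source.is_file():
        return [source.stem]
    return sorted(
        p.relative_to(source).with_suffix("").as_posix()
        for p in source.rglob("*")
        if p.is_file()
    )


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "config").mkdir()
    (root / "config" / "app.json").write_text("{}")
    (root / "docs").mkdir()
    (root / "docs" / "readme.md").write_text("# hi")

    folder_slots = slot_names(root)                        # whole folder
    file_slots = slot_names(root / "config" / "app.json")  # single file

print(folder_slots, file_slots)
```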

create_template returns a TemplateRef(name, id). Use id for all subsequent get/validate/export/delete operations.

Content-addressable IDs

Template IDs are not random UUIDs — they encode identity and content:

{name}:{md5(path)}:{md5(content)}
  • name — the template name (first segment, also the storage key)
  • md5(path) — hash of the source path (or name for structure-based templates)
  • md5(content) — hash of the file tree or structure; this is what changes when content changes

For a given name and source path, the same content_hash yields the same ID, and a different content_hash yields a different ID.
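
As a rough illustration of the ID format, the sketch below assembles an ID from the three segments using hashlib. How DocVault actually serializes the path and the file tree before hashing is not specified here, so `make_template_id` and its inputs are assumptions made for the example.

```python
import hashlib


def make_template_id(name: str, source_path: str, content: bytes) -> str:
    """Build an ID of the form {name}:{md5(path)}:{md5(content)}."""
    path_hash = hashlib.md5(source_path.encode("utf-8")).hexdigest()
    content_hash = hashlib.md5(content).hexdigest()
    return f"{name}:{path_hash}:{content_hash}"


id_a = make_template_id("svc", "./docs-folder", b"host: localhost")
id_b = make_template_id("svc", "./docs-folder", b"host: localhost")    # unchanged content
id_c = make_template_id("svc", "./docs-folder", b"host: example.com")  # changed content

print(id_a == id_b)  # identical inputs give an identical ID
print(id_a == id_c)  # only the last segment differs
```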

Integer versioning and upsert

Template version is an integer starting at 1. create_template is an upsert:

| Scenario | Result |
| --- | --- |
| Template name does not exist | Created with version=1 |
| Same name, same content hash | No-op; existing TemplateRef returned unchanged |
| Same name, new content hash (not seen before) | Version set to max(history) + 1 |
| Same name, content hash matches a historical snapshot | Version reverted to the snapshot's number |

The version_history field on Template records every {content_hash: version} mapping the template has ever had.

# Create
ref_v1 = await store.create_template(TemplateCreateInput(name="svc", path=folder))
tpl = await store.get_template(ref_v1.id)
print(tpl.version)        # 1
print(tpl.version_history)  # {"<hash_a>": 1}

# Change the folder contents
ref_v2 = await store.create_template(TemplateCreateInput(name="svc", path=folder))
tpl = await store.get_template(ref_v2.id)
print(tpl.version)        # 2
print(tpl.version_history)  # {"<hash_a>": 1, "<hash_b>": 2}

# Revert the folder to its original contents
ref_v3 = await store.create_template(TemplateCreateInput(name="svc", path=folder))
tpl = await store.get_template(ref_v3.id)
print(tpl.version)        # 1  ← reverted
print(ref_v3.id == ref_v1.id)  # True — same ID as the original

Version revert

When the source folder is restored to a previously-seen state, the version number goes back to the number recorded for that content hash. History is never erased — reverting to v1 and then making a new change will produce v3, not v2, because v2 remains in history.
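
The version rules can be condensed into a few lines. The sketch below models version_history as a plain dict and is only an illustration of the rules as stated; `resolve_version` is a hypothetical function, not the store's code.

```python
def resolve_version(history: dict[str, int], content_hash: str) -> int:
    """Return the version for content_hash, recording new hashes in history."""
    if content_hash in history:
        # Previously seen content: revert to the version recorded for it.
        return history[content_hash]
    # New content: one past the highest version ever assigned.
    version = max(history.values(), default=0) + 1
    history[content_hash] = version
    return version


history: dict[str, int] = {}
v_first = resolve_version(history, "hash_a")   # first creation
v_second = resolve_version(history, "hash_b")  # content changed
v_revert = resolve_version(history, "hash_a")  # folder restored
v_third = resolve_version(history, "hash_c")   # new change after the revert

print(v_first, v_second, v_revert, v_third)
```

Because the new version is derived from the maximum in the history rather than from the current version, a change made after a revert never reuses a number.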

Copies

Two templates are copies when they share the same name (first segment of the ID) and the same content_hash (last segment), regardless of whether the path hash (middle segment) differs. Creating a copy is always a no-op.
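
A copy check along these lines could compare the first and last ID segments. The helper names and the example IDs below are hypothetical (real IDs carry MD5 digests), and splitting from the right is a defensive choice in case a name contains a colon.

```python
def split_template_id(template_id: str) -> tuple[str, str, str]:
    """Split an ID into (name, path_hash, content_hash)."""
    name, path_hash, content_hash = template_id.rsplit(":", 2)
    return name, path_hash, content_hash


def is_copy(id_a: str, id_b: str) -> bool:
    """Two IDs are copies when name and content hash match; the path hash is ignored."""
    name_a, _, content_a = split_template_id(id_a)
    name_b, _, content_b = split_template_id(id_b)
    return name_a == name_b and content_a == content_b


same_content_other_path = is_copy("svc:aaa:ccc", "svc:bbb:ccc")
different_content = is_copy("svc:aaa:ccc", "svc:aaa:ddd")
different_name = is_copy("svc:aaa:ccc", "other:aaa:ccc")

print(same_content_other_path, different_content, different_name)
```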


Configuration

Configuration is resolved in this order (later sources win):

  1. docvault.json in the current directory (or --config path)
  2. Environment variables

docvault.json

{
  "vault_path": "./vault",
  "vault_name": "my-vault",
  "vault_description": "Production document store",
  "auth_mode": "api_key",
  "api_keys": ["sk-aaaa", "sk-bbbb"],
  "default_creator": "system",
  "git_author_name": "docvault-bot",
  "git_author_email": "bot@example.com",
  "llm_api_key": "sk-ant-...",
  "llm_model": "claude-haiku-4-5-20251001",
  "auto_summarize": false
}

Environment variables

| Variable | Config field | Notes |
| --- | --- | --- |
| DOCVAULT_PATH | vault_path | |
| DOCVAULT_VAULT_NAME | vault_name | |
| DOCVAULT_AUTH_MODE | auth_mode | none, api_key, or passthrough |
| DOCVAULT_API_KEYS | api_keys | Comma-separated list |
| DOCVAULT_DEFAULT_CREATOR | default_creator | |
| DOCVAULT_GIT_AUTHOR_NAME | git_author_name | |
| DOCVAULT_GIT_AUTHOR_EMAIL | git_author_email | |
| DOCVAULT_LLM_API_KEY | llm_api_key | Anthropic API key |
| DOCVAULT_LLM_MODEL | llm_model | Default: claude-haiku-4-5-20251001 |
| DOCVAULT_AUTO_SUMMARIZE | auto_summarize | 1, true, or yes |
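
To make the precedence concrete, here is a minimal sketch of layering environment variables over file values. It is not load_config: `apply_env` is a hypothetical helper covering a subset of the variables, and the whitespace trimming and case-insensitive boolean parsing are assumptions.

```python
ENV_TO_FIELD = {
    "DOCVAULT_PATH": "vault_path",
    "DOCVAULT_AUTH_MODE": "auth_mode",
    "DOCVAULT_API_KEYS": "api_keys",
    "DOCVAULT_AUTO_SUMMARIZE": "auto_summarize",
}


def apply_env(file_config: dict, env: dict) -> dict:
    """Overlay environment variables on file config; the environment wins."""
    merged = dict(file_config)
    for var, field in ENV_TO_FIELD.items():
        if var not in env:
            continue
        raw = env[var]
        if field == "api_keys":
            # Comma-separated list; trimming whitespace is an assumption.
            merged[field] = [k.strip() for k in raw.split(",") if k.strip()]
        elif field == "auto_summarize":
            # Truthy values per the table: 1, true, or yes.
            merged[field] = raw.strip().lower() in {"1", "true", "yes"}
        else:
            merged[field] = raw
    return merged


file_config = {"vault_path": "./vault", "auth_mode": "none"}
env = {
    "DOCVAULT_AUTH_MODE": "api_key",
    "DOCVAULT_API_KEYS": "sk-aaaa, sk-bbbb",
    "DOCVAULT_AUTO_SUMMARIZE": "yes",
}
resolved = apply_env(file_config, env)
print(resolved)
```

In real use, `env` would be `os.environ` and `file_config` the parsed contents of docvault.json.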

Full field reference

| Field | Type | Default | Description |
| --- | --- | --- | --- |
| vault_path | path | ./vault | Directory where git repo and documents are stored |
| vault_name | string | "default" | Logical name for this vault |
| vault_description | string | "" | Human-readable description |
| auth_mode | enum | "none" | Auth strategy: none, api_key, passthrough |
| api_keys | list[str] | [] | Valid keys when auth_mode = "api_key" |
| default_creator | string | "system" | Fallback creator used by background jobs |
| git_author_name | string | "docvault" | Git author name for system commits |
| git_author_email | string | "docvault@localhost" | Git author email for system commits |
| llm_api_key | string | null | Anthropic API key; required for summarization |
| llm_model | string | "claude-haiku-4-5-20251001" | Claude model for summarization |
| auto_summarize | bool | false | Auto-run LLM on every create/update |

CLI reference

docvault [OPTIONS] COMMAND [ARGS]...

Global option

-c, --config PATH — Path to a docvault.json file. Defaults to ./docvault.json.


docvault init [PATH]

Create a new vault (or open an existing one — idempotent).

docvault init ./my-vault
docvault init                # uses vault_path from config

docvault serve

Start the REST API server.

docvault serve
docvault serve --host 0.0.0.0 --port 9000

| Flag | Default | Description |
| --- | --- | --- |
| --host | 127.0.0.1 | Bind address |
| -p, --port | 8000 | Port number |

docvault docs

docvault docs list [--template NAME] [--creator NAME] [--keywords KW1,KW2]
docvault docs get <DOC_ID>
docvault docs create --creator alice --file content.json [--template NAME] [--summary TEXT] [--keywords KW1,KW2]
docvault docs create --creator alice --file -           # read JSON from stdin
docvault docs update <DOC_ID> --file updated.json [--summary TEXT] [--keywords KW]
docvault docs delete <DOC_ID> [--force]
docvault docs history <DOC_ID> [--max 20]
docvault docs at <DOC_ID> <REF>                         # git SHA, tag, or branch
docvault docs summarize <DOC_ID> [--overwrite]
docvault docs summarize-all [--overwrite]

docvault templates

docvault templates list
docvault templates get <TEMPLATE_ID>
docvault templates create <NAME> --file schema.json [--description TEXT]
docvault templates create <NAME> --path ./folder    [--description TEXT]
docvault templates create <NAME> --path ./file.json [--description TEXT]
docvault templates delete <TEMPLATE_ID> [--force]

--file schema.json — JSON file mapping slot paths to DocSlot objects (explicit structure).

--path — folder or single file to ingest as slots. Every file in the folder becomes a document slot; a single file creates one flat slot named after the file stem. create is an upsert: if a template with the same name already exists, it is updated if the content changed, or left unchanged if it has not.

<TEMPLATE_ID> for get and delete is the full content-addressable ID returned by create (format: name:md5:md5). Use templates list to see all current IDs.


docvault vault

docvault vault info
docvault vault versions
docvault vault bump [major|minor|patch]     # default: patch
docvault vault deploy --template NAME --file specs.json

specs.json is a JSON array of DeployDocSpec objects:

[
  { "path": "config/app",      "content": { "host": "api.example.com", "port": 8080 }, "creator": "ci-bot" },
  { "path": "config/database", "content": { "host": "db.internal" },                   "creator": "ci-bot" }
]
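
A specs file like the one above can also be generated from a script, for example in CI. The sketch below only builds the JSON text; `make_spec` is a hypothetical helper, and the three keys mirror the example rather than a complete DeployDocSpec definition.

```python
import json


def make_spec(path: str, content: dict, creator: str) -> dict:
    """Build one deploy entry with the keys shown in the example above."""
    return {"path": path, "content": content, "creator": creator}


specs = [
    make_spec("config/app", {"host": "api.example.com", "port": 8080}, "ci-bot"),
    make_spec("config/database", {"host": "db.internal"}, "ci-bot"),
]

specs_json = json.dumps(specs, indent=2)
print(specs_json)
# Write specs_json to specs.json, then pass it via --file to docvault vault deploy.
```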

docvault config

docvault config show               # print resolved config (keys masked)
docvault config generate-key       # print a random API key
docvault config generate-key -n 3  # print 3 keys

LLM summarization

DocVault uses Claude to infer summary (a one-sentence description) and keywords (a list of tags) from document content.

Setup:

export DOCVAULT_LLM_API_KEY="sk-ant-..."
# or set llm_api_key in docvault.json

Auto-summarize on every write:

{ "auto_summarize": true }

On-demand via API:

POST /api/v1/docs/{id}/summarize
POST /api/v1/docs/summarize/all

On-demand via CLI:

docvault docs summarize <DOC_ID>
docvault docs summarize-all

Summarization is skipped if the document already has a summary unless --overwrite / ?overwrite=true is passed.
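
For scripted use of the on-demand endpoint, the URL with the optional overwrite flag might be assembled as follows. `summarize_url` is a hypothetical helper, the document ID "abc123" is made up, and the base URL assumes the default /api/v1 prefix on a local server.

```python
from urllib.parse import quote, urlencode


def summarize_url(base_url: str, doc_id: str, overwrite: bool = False) -> str:
    """Build the summarize endpoint URL, adding ?overwrite=true on request."""
    url = f"{base_url}/api/v1/docs/{quote(doc_id, safe='')}/summarize"
    if overwrite:
        url += "?" + urlencode({"overwrite": "true"})
    return url


default_url = summarize_url("http://localhost:8000", "abc123")
overwrite_url = summarize_url("http://localhost:8000", "abc123", overwrite=True)

print(default_url)
print(overwrite_url)
```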


API reference

Full endpoint reference: docs/api.md

When the server is running, the interactive Swagger UI is at:

http://localhost:8000/docs

ReDoc is at /redoc. The raw OpenAPI spec is at /openapi.json.

To export the spec without a running server:

task openapi           # writes docs/openapi.json

Development

Setup

git clone https://github.com/your-org/docvault
cd docvault
uv sync --all-extras

Taskfile tasks

| Task | Description |
| --- | --- |
| task test | Run the test suite |
| task test:v | Verbose test output |
| task lint | Ruff lint check |
| task lint:fix | Auto-fix safe violations |
| task fmt | Format with ruff |
| task fmt:check | Check formatting (CI) |
| task check | Full CI gate: fmt:check + lint + test |
| task fix | lint:fix + fmt |
| task dev | Start dev server with auto-reload |
| task openapi | Export OpenAPI spec to docs/openapi.json |
| task build | Build wheel |
| task example:shim:clean | Wipe shim-integration demo state |
| task example:shim:server | Start shim-integration server on :54321 |
| task example:shim:demo | Run shim-integration demo script |

Running tests

task test
# or directly:
uv run pytest tests/ -v

The test suite uses pytest-asyncio in auto mode. All async test functions run in their own event loop.

Project layout

src/docvault/
├── __init__.py          # public API: DocVault, VaultConfig, load_config
├── config.py            # VaultConfig, load_config, AuthMode
├── exceptions.py        # DocVaultError hierarchy
├── api/
│   ├── __init__.py      # exports DocVaultShim
│   ├── app.py           # create_app (standalone FastAPI factory)
│   ├── auth.py          # build_auth_dep
│   ├── router.py        # create_router (all HTTP endpoints)
│   └── shim.py          # DocVaultShim (host-app integration)
└── core/
    ├── document.py      # Document, DocumentMeta, CreateDocInput, UpdateDocInput
    ├── vault_meta.py    # VaultMeta, VaultVersion
    ├── git_backend.py   # GitBackend (asyncio.to_thread wrapper)
    ├── store.py         # DocVault (main async store)
    ├── summarizer.py    # DocumentSummarizer (Anthropic API)
    ├── tools/
    │   └── deploy.py    # deploy_template (zip export → local filesystem)
    └── template.py      # Template, DocSlot, TemplateCreateInput,
                         # DeployVaultInput, create_id, create_id_from_structure
