Vincio: the context engineering platform for AI applications. Compiles prompts, memory, retrieval, tools, schemas, and policies into optimized, validated, observable model-ready context packets.

The scarce resource is not the model. It is the context you feed it.

Vincio 4.0.0 · CI · Python 3.11+ · Apache 2.0 · 5858 tests passing · Providers: OpenAI, Anthropic, Google, Mistral, local, and OpenAI-compatible gateways


Vincio is a Python platform for building AI applications that you can trust in production. It takes everything that goes into a model (prompts, memory, retrieved evidence, tools, schemas, and policies) and compiles it into an optimized, validated, observable context packet; then it checks, measures, and traces everything that comes out. Named for Leonardo da Vinci, it pairs engineering and craft in equal measure.

The run pipeline, governed end to end: raw input, normalize, redact and gate, retrieve and rank, compile context, call model, parse and validate, evaluate and guard, trace and cost, learn; with a governance layer across the whole run (policy and rails, PII redaction, injection defense, audit chain, EU AI Act, residency, cross-org)

Most libraries help you call a model. Vincio governs the boundary between your application and the model: what evidence is selected, how it is scored and budgeted, how the result is validated, and what it cost. It runs on your model of choice across every major provider, with batching, caching, failover, and cost tracking built in.

Why Vincio: offline dev and CI (deterministic mock, no key, no cost); deterministic (security and validation in code, not model output); measured (every run traced and costed, eval-gated); one system (input to output, not a bag of utilities)

Why you'd reach for it, in one line each
  • Runs on any model, offline when you want. Call OpenAI, Anthropic, Google, Mistral, a local model, or any OpenAI-compatible gateway through one interface. No key yet? A deterministic mock runs the whole pipeline (retrieval, validation, evals, traces) for dev, tests, and CI, with no network and no cost.
  • Deterministic where it counts. Security, permissions, and validation are enforced in code, never gated on model output. The same input compiles to the same packet.
  • Measured, not asserted. Every run is traced and costed; every change can be gated by an eval suite before it ships.
  • One coherent system from input to output, not a bag of utilities you wire together yourself.

Contents

Install · Quickstart · What you can build · Providers · Features · Benchmarks · How Vincio compares · Examples · CLI · Architecture · Docs

Install

pip install vincio                  # core (the offline mock provider is built in)
pip install "vincio[openai]"        # + OpenAI    (also: anthropic, google, mistral)
pip install "vincio[chroma]"        # + a vector store (also: pinecone, lancedb, pgvector, …)
pip install "vincio[all]"           # every optional integration

Python 3.11+. The core depends only on pydantic, httpx, pyyaml, and typing-extensions; every heavy integration (vector stores, OCR, server, OpenTelemetry, …) is an opt-in extra.

Quickstart

from vincio import ContextApp

app = ContextApp(name="docs_qa")
app.add_source("docs", path="./docs", retrieval="hybrid")
app.set_policy("answer_only_from_sources", True)

result = app.run("How do I configure SSO?")
print(result.output)      # the grounded answer
print(result.citations)   # the evidence it actually cited
print(result.trace_id)    # every run produces a full trace
print(result.cost_usd)    # …and a cost

To use a real model, set a provider and key, for example export VINCIO_PROVIDER=openai OPENAI_API_KEY=sk-..., or pass provider= and model= to ContextApp. The same code runs against OpenAI, Anthropic, Google, Mistral, a local model, or any OpenAI-compatible gateway. No key yet? Out of the box it runs on a deterministic mock that emits schema-valid output, so you can build and test the whole pipeline offline in CI.
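To make the idea of a deterministic, schema-valid mock concrete, here is a minimal sketch using only the standard library. It is not Vincio's MockProvider: the `mock_output` helper and the dict-based schema are hypothetical, and the only point is that hashing the prompt yields the same typed values on every run.

```python
import hashlib

# Hypothetical helper, not Vincio's MockProvider: derive stable field values
# from a hash of the prompt so repeated runs give identical output.
def mock_output(prompt: str, schema: dict[str, type]) -> dict:
    out = {}
    for name, typ in schema.items():
        digest = hashlib.sha256(f"{prompt}|{name}".encode()).digest()
        seed = int.from_bytes(digest[:4], "big")
        if typ is bool:
            out[name] = seed % 2 == 0
        elif typ is int:
            out[name] = seed % 100
        elif typ is float:
            out[name] = (seed % 1000) / 1000  # value in [0, 1)
        elif typ is str:
            out[name] = f"mock-{digest.hex()[:8]}"
        else:
            raise TypeError(f"unsupported field type: {typ!r}")
    return out

schema = {"label": str, "confidence": float}
first = mock_output("The dashboard crashes after login", schema)
second = mock_output("The dashboard crashes after login", schema)
```

Because nothing here depends on a clock, a random seed, or the network, `first` and `second` are equal, which is the property that makes such a mock usable in CI.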

What you can build

Typed output you can rely on: declare a Pydantic schema, get a validated instance back:

from pydantic import BaseModel
from vincio import ContextApp

class Triage(BaseModel):
    label: str
    confidence: float

app = ContextApp(name="triage", output_schema=Triage)
app.run("The dashboard crashes after login").output.label   # .output is a validated Triage instance

Agents with tools, memory, and hard budgets: permissioned tools, approval-gated writes, and a loop that cannot run away:

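from vincio import ContextApp

# RefundDecision (a Pydantic model) and the lookup_order / issue_refund
# callables are assumed to be defined elsewhere in your application.
app = ContextApp(name="support", output_schema=RefundDecision)
app.add_memory(scope="user", strategy="semantic")
app.add_tool(lookup_order, permissions=["orders:read"])
app.add_tool(issue_refund, permissions=["refunds:write"], approval_required=True)
app.run("Refund my duplicate charge")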
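The permission and approval checks described above can be pictured with a small standalone sketch. Everything in it is hypothetical (the `ToolRegistry` class, its `call` method, and the stub tools are illustrations, not Vincio's registry); it only shows the pattern of enforcing both checks in code before a tool runs.

```python
from typing import Callable

class ToolRegistry:
    """Hypothetical registry: permissions and approvals are checked in code."""

    def __init__(self, approve: Callable[[str, dict], bool]):
        self._tools = {}
        self._approve = approve

    def add(self, fn, permissions, approval_required=False):
        self._tools[fn.__name__] = (fn, set(permissions), approval_required)

    def call(self, name, granted, **kwargs):
        fn, required, needs_approval = self._tools[name]
        missing = required - set(granted)
        if missing:
            raise PermissionError(f"{name} needs {sorted(missing)}")
        if needs_approval and not self._approve(name, kwargs):
            raise PermissionError(f"{name} was not approved")
        return fn(**kwargs)

# Stub tools, for illustration only.
def lookup_order(order_id: str) -> dict:
    return {"order_id": order_id, "charges": 2}

def issue_refund(order_id: str) -> str:
    return f"refunded {order_id}"

registry = ToolRegistry(approve=lambda name, args: False)  # deny every write
registry.add(lookup_order, ["orders:read"])
registry.add(issue_refund, ["refunds:write"], approval_required=True)
```

With this setup a read succeeds for a caller holding `orders:read`, while `issue_refund` raises `PermissionError` even for a caller with `refunds:write`, because the approval callback says no.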

Evaluation as a CI gate: measure quality and block a regression before it ships:

from vincio import Dataset
from vincio.evals import EvalCase, EvalRunner

dataset = Dataset(name="golden", cases=[EvalCase(id="c1", input="…", expected="…")])
runner = EvalRunner(app, metrics=["groundedness", "citation_accuracy"],
                    gates={"groundedness": ">= 0.8"})
report = runner.run(dataset)
assert all(g["passed"] for g in report.gates.values())   # fail the build on a regression
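As a rough illustration of what a gate such as `">= 0.8"` amounts to, the sketch below parses the expression and compares it with a metric score. The helper names and the scores are hypothetical; only the gate string format is taken from the example above, and how Vincio evaluates gates internally is not shown in this document.

```python
import operator

_OPS = {
    ">=": operator.ge,
    "<=": operator.le,
    ">": operator.gt,
    "<": operator.lt,
    "==": operator.eq,
}

def check_gate(expr: str, score: float) -> bool:
    """Evaluate an expression like '>= 0.8' against a score."""
    op_text, threshold = expr.split()
    return _OPS[op_text](score, float(threshold))

def evaluate_gates(scores: dict, gates: dict) -> dict:
    """Return a per-metric verdict for every gated metric."""
    return {
        metric: {"score": scores[metric], "passed": check_gate(expr, scores[metric])}
        for metric, expr in gates.items()
    }

# Hypothetical scores, for illustration only.
results = evaluate_gates(
    {"groundedness": 0.84, "citation_accuracy": 0.71},
    {"groundedness": ">= 0.8"},
)
```

Only metrics named in `gates` can fail the build; ungated metrics are reported but never block.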

See Examples for thirteen complete, runnable programs that cover the whole platform.

Providers & models

Vincio calls real models in production. One interface routes to every major provider, with the model-operations layer (reasoning control, half-cost batch, caching, failover, cost tracking) built in. The deterministic mock is a development convenience, not the product: it lets you build and test the whole pipeline with no key and no cost before you point it at a real model.

Providers and models: one interface over OpenAI, Anthropic, Google, Mistral, local models, and any OpenAI-compatible gateway, plus enterprise auth for Amazon Bedrock, Google Vertex, and Azure OpenAI. Model operations: unified reasoning control, batch at about half cost, prompt caching, circuit breaker and failover, key pool, and per-run cost tracking. With no key, a deterministic mock runs the whole pipeline for dev, tests, and CI.

Providers, model operations, and the mock
  • Providers: OpenAI, Anthropic, Google (Gemini), Mistral, local models, and any OpenAI-compatible gateway (Groq, Together, Fireworks, OpenRouter, and the like) through one ModelProvider interface.
  • Enterprise auth: Amazon Bedrock, Google Vertex, and Azure OpenAI via pluggable auth strategies (SigV4, service-account, Azure AD / key).
  • Model operations: unified reasoning/thinking control across providers, batch backends (~50% cost), prompt-cache strategy, a circuit breaker with health-aware failover, a key pool, and a data-driven ModelRegistry (capabilities, pricing, lifecycle) that drives capability guards and shadow / canary dispatch.
  • The mock: MockProvider is deterministic and emits schema-valid output, so the full pipeline (retrieval, validation, evals, traces, cost) runs offline in CI with no key and no cost. Use it for development and tests; use a real provider in production.
# point an app at a real model (or set VINCIO_PROVIDER / the API key in the environment)
app = ContextApp(name="docs_qa", provider="openai", model="gpt-4o-mini")
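The circuit breaker with failover mentioned in the list above follows a well-known pattern, sketched here with the standard library. The `CircuitBreaker` class, `call_with_failover`, and the two fake providers are hypothetical names for illustration, not Vincio's implementation.

```python
import time

class CircuitBreaker:
    """Opens after max_failures consecutive errors; retries after a cooldown."""

    def __init__(self, max_failures=3, cooldown_s=30.0, clock=time.monotonic):
        self.max_failures = max_failures
        self.cooldown_s = cooldown_s
        self.clock = clock
        self.failures = 0
        self.opened_at = None

    def available(self):
        if self.opened_at is None:
            return True
        # After the cooldown, allow one trial call (half-open state).
        return self.clock() - self.opened_at >= self.cooldown_s

    def record(self, ok):
        if ok:
            self.failures, self.opened_at = 0, None
        else:
            self.failures += 1
            if self.failures >= self.max_failures:
                self.opened_at = self.clock()

def call_with_failover(providers, prompt):
    """providers: list of (name, callable, CircuitBreaker) in priority order."""
    for name, fn, breaker in providers:
        if not breaker.available():
            continue
        try:
            result = fn(prompt)
        except Exception:
            breaker.record(False)
            continue
        breaker.record(True)
        return name, result
    raise RuntimeError("no healthy provider available")

def flaky(prompt):
    raise TimeoutError("primary down")

def backup(prompt):
    return f"echo: {prompt}"

providers = [
    ("primary", flaky, CircuitBreaker(max_failures=1)),
    ("backup", backup, CircuitBreaker()),
]
winner, answer = call_with_failover(providers, "ping")
```

After the first failure the primary's breaker is open, so later calls skip it until the cooldown elapses instead of paying the timeout again.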

Features

Everything below is implemented, tested offline, and demonstrated by a runnable example. Use the high-level ContextApp, or reach for any engine directly.

One platform, every layer: context and prompts; retrieval and memory; agents and orchestration; output and evaluation; the closed loop; security and governance; protocols and interop; cross-org economy, edge and federated reach

Every engine, in detail

Context & prompts

  • Prompt compiler: typed prompt ASTs with ${variables}, lint rules, cache-aware stable-prefix layout, versioning, hashing, and diffing.
  • Context compiler: scores every candidate (relevance, novelty, authority, freshness, provenance, token cost, leakage risk), deduplicates, resolves conflicts, compresses, and packs to a token budget, with an excluded-context report explaining every omission.
  • Tabular evidence: a typed, columnar Dataset and a deterministic DataEncoder that renders it header-once — schema, types, and units declared once, cells as delimited rows — lossless, columnar-accurate in token cost, and far cheaper than json.dumps or a Markdown table; TableEvidence scores and cites it like any other evidence.
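To illustrate why a header-once layout is smaller than `json.dumps`, the sketch below declares names, types, and units once and then emits delimited rows. It is a hypothetical encoder rather than Vincio's DataEncoder: it compares character counts instead of tokens, and it does not escape delimiters inside values, so it is not lossless in general.

```python
import json

def encode_header_once(columns, rows, delimiter="|"):
    """columns: list of (name, type, unit) tuples; rows: list of value tuples."""
    header = delimiter.join(
        f"{name}:{typ}" + (f"[{unit}]" if unit else "")
        for name, typ, unit in columns
    )
    lines = [header]
    for row in rows:
        lines.append(delimiter.join(str(value) for value in row))
    return "\n".join(lines)

# Hypothetical data, for illustration only.
columns = [("region", "str", ""), ("revenue", "float", "USD"), ("orders", "int", "")]
rows = [("emea", 1200.5, 31), ("apac", 980.0, 27), ("amer", 1510.25, 44)]

compact = encode_header_once(columns, rows)
as_json = json.dumps([dict(zip((c[0] for c in columns), r)) for r in rows])
```

The JSON form repeats every key in every record, so the gap between `len(as_json)` and `len(compact)` widens as rows are added.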

Retrieval & memory

  • Hybrid RAG: BM25 + dense + learned-sparse + late-interaction fused in one weighted RRF; query understanding (HyDE, multi-query, decomposition); sentence-window / auto-merging chunking; GraphRAG; structured metadata filters with tenant scope; text + image + table + video evidence as first-class scored candidates.
  • Layered memory: session → episodic → semantic → tenant → graph, with a guarded write pipeline, confidence decay, contradiction resolution, bi-temporal recall, per-memory ACLs, and audited GDPR-style edit/forget/export.
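The weighted RRF fusion named in the Hybrid RAG bullet can be sketched in a few lines. This is the generic reciprocal rank fusion formula with per-retriever weights, not Vincio's code; the function name, the weights, and the conventional constant `k=60` are assumptions.

```python
from collections import defaultdict

def weighted_rrf(rankings, weights, k=60):
    """rankings: {retriever: [doc_id, ...] best first}; weights: {retriever: float}."""
    scores = defaultdict(float)
    for name, ranked in rankings.items():
        for rank, doc_id in enumerate(ranked, start=1):
            scores[doc_id] += weights.get(name, 1.0) / (k + rank)
    # Sort by fused score, breaking ties by id so the order is deterministic.
    return sorted(scores, key=lambda d: (-scores[d], d))

# Hypothetical rankings from three retrievers.
rankings = {
    "bm25": ["d1", "d2", "d3"],
    "dense": ["d2", "d1", "d4"],
    "sparse": ["d2", "d3", "d1"],
}
fused = weighted_rrf(rankings, {"bm25": 1.0, "dense": 1.0, "sparse": 0.5})
```

RRF uses only ranks, never raw scores, which is why lexical and dense retrievers with incomparable score scales can be fused without calibration.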

Agents & orchestration

  • Tools: permissioned registry (RBAC + ABAC), schema-from-typehints, a resource-limited sandbox, idempotent write guardrails with approval callbacks, and a grounded computer-use action plane.
  • Agents: bounded DAG execution with planners (ReAct / plan-and-execute / hierarchical HTN), in-place plan repair, cost-aware action selection, and a budgeted deep-research agent.
  • Orchestration: multi-agent crews with a shared blackboard, durable stateful graphs (checkpoint / resume / time-travel / human-in-the-loop), deterministic workflows, and a distributed durable-execution backend.
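What "bounded" execution means in practice can be shown with a minimal loop that stops on a step or cost limit regardless of what the agent wants. The `Budget` dataclass, `run_agent`, and the step function are hypothetical; Vincio's agents and planners are richer than this.

```python
from dataclasses import dataclass

@dataclass
class Budget:
    max_steps: int
    max_cost_usd: float
    steps: int = 0
    cost_usd: float = 0.0

def run_agent(step_fn, budget):
    """step_fn(state) -> (new_state, cost_usd, done). Stops when a limit is hit."""
    state, done = None, False
    while not done:
        if budget.steps >= budget.max_steps:
            return state, "stopped: step budget exhausted"
        state, cost, done = step_fn(state)
        budget.steps += 1
        budget.cost_usd += cost
        if budget.cost_usd > budget.max_cost_usd:
            return state, "stopped: cost budget exhausted"
    return state, "finished"

# A step function that never declares itself done.
def never_done(state):
    return (state or 0) + 1, 0.01, False

budget = Budget(max_steps=5, max_cost_usd=1.0)
final_state, status = run_agent(never_done, budget)
```

The limits are checked by the loop, not by the step function, so a misbehaving planner cannot extend its own run.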

Output, evaluation & observability

  • Structured output: Pydantic contracts, constrained decoding, streaming validation with early abort, bounded self-correction that repairs structure only (never invents facts), and DSPy-style typed signatures.
  • Evaluation: golden datasets, 30+ metrics, deterministic / model / G-Eval judges, synthetic data, red-teaming, trajectory & tool-use scoring, drift detection, regression gates, and a pytest plugin.
  • Observability: full trace span trees, OpenTelemetry export, a local trace viewer, a versioned prompt registry, and per-run cost tracking, with no account or hosted backend required.
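The idea of repairing structure only, without inventing facts, can be sketched as a lenient JSON parser that tries a fixed list of mechanical fixes and gives up otherwise. The `parse_lenient` function is a hypothetical illustration, not Vincio's parser, and its trailing-comma regex is naive: it could also alter a string value that happens to contain `,}`.

```python
import json
import re

def parse_lenient(text):
    """Try strict parsing first, then structure-only fixes; never invent values."""
    candidates = [text]
    # Keep only the outermost {...} span, dropping fences or surrounding chatter.
    start, end = text.find("{"), text.rfind("}")
    if start != -1 and end > start:
        body = text[start:end + 1]
        candidates.append(body)
        # Remove trailing commas before a closing brace or bracket.
        candidates.append(re.sub(r",\s*([}\]])", r"\1", body))
    for candidate in candidates:
        try:
            return json.loads(candidate)
        except json.JSONDecodeError:
            continue
    raise ValueError("could not recover a JSON object without guessing")

raw = 'Sure, here it is: {"label": "bug", "confidence": 0.9,} Hope that helps.'
recovered = parse_lenient(raw)
```

A truncated object such as `{"label": "bug"` is rejected rather than completed, because closing it would mean guessing what the model intended.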

The closed loop

  • Optimization: one reproducible cycle (trace → dataset → eval → optimize → promote): a reflective GEPA/MIPRO optimizer, a distillation flywheel, on-policy reinforcement from verifiable rewards, and gated deploy with canary + rollback. No promotion ships without clearing the gates.

Security & governance

  • Security: deterministic PII / secret redaction (multilingual), prompt-injection defense and provable containment (taint tracking + capability tokens), RBAC / ABAC, tenant isolation, and a hash-chained, signed audit log with offline tamper verification.
  • Governance: model / system cards, an OWASP / NIST / MITRE / ISO compliance matrix, an AI-BOM, provable erasure, a consent ledger, data-residency enforcement, formal invariant verification, agent identity & delegation, verified-reasoning certificates, and continuous assurance cases.
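A hash-chained log with offline tamper verification, as named in the Security bullet, can be sketched with `hashlib` alone. This is a generic illustration with hypothetical function names and events; it leaves out the signing step the bullet also mentions.

```python
import hashlib
import json

GENESIS = "0" * 64

def _entry_hash(prev_hash, event):
    payload = json.dumps({"prev": prev_hash, "event": event}, sort_keys=True)
    return hashlib.sha256(payload.encode()).hexdigest()

def append(log, event):
    """Append an event whose hash covers the previous entry's hash."""
    prev_hash = log[-1]["hash"] if log else GENESIS
    log.append({"event": event, "prev": prev_hash, "hash": _entry_hash(prev_hash, event)})

def verify(log):
    """Recompute the chain from the start; any edit breaks every later link."""
    prev_hash = GENESIS
    for entry in log:
        expected = _entry_hash(prev_hash, entry["event"])
        if entry["prev"] != prev_hash or entry["hash"] != expected:
            return False
        prev_hash = entry["hash"]
    return True

log = []
append(log, {"actor": "agent-1", "action": "tool_call", "tool": "lookup_order"})
append(log, {"actor": "agent-1", "action": "refund_denied"})
chain_ok = verify(log)
```

Verification needs only the log itself, which is what makes it possible to run offline.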

Interop

  • Protocols: MCP (client and server), A2A agent-to-agent, and Agent Skills, all in-process.
  • Ecosystem: import/export LangChain, LlamaIndex, Haystack, and DSPy assets; first-party data connectors; and any OpenAI-compatible model or vector store you already run.

Reach further: a cross-organization agent economy (negotiation, contracts, durable sagas, metering, settlement, arbitration, reputation, collateral & solvency proofs), an edge / WASM in-process runtime, on-device LoRA adaptation, federated learning with a differential-privacy accountant, and per-run energy / carbon accounting. See ROADMAP.md.

Benchmarks

Three suites ship in benchmarks/, all reproducible on your own machine. Every number is measured live from both sides; a missing competitor is reported as skipped, never assumed.

Head-to-head vs. real libraries

competitive.py runs Vincio against the actual library a team would otherwise use (Apple Silicon, Python 3.13; ratios are the portable signal, not wall-clock).

Head-to-head vs. real libraries: 30 to 40 times faster BM25 at 20k docs vs rank_bm25; 60 percent fewer tokens for the same answer vs LangChain and LlamaIndex; 1.4 to 1.8 times faster token counting vs tiktoken; 4 of 8 vs 1 of 8 malformed JSON recovered vs stdlib json

Operation | Vincio | Competitor | Result
BM25 query @ 20k docs | BM25Index | rank_bm25 | ~30–40× faster: identical top-1 ranking
Context assembly: tokens sent for the same retrieved set | context compiler | LangChain stuff / LlamaIndex compact | ~60% fewer tokens: answer retained
Tabular encoding: tokens for a 50×5 table | DataEncoder | json.dumps / pandas.to_markdown / TOON | ~66% fewer tokens than json.dumps, lossless, typed schema
Text chunking a 24k-word doc | chunk_document | LangChain / LlamaIndex splitters | fastest, chunks carry provenance
Token counting (~60k words) | HeuristicTokenCounter | tiktoken | ~1.4–1.8× faster, zero-dependency, conservative
Malformed-JSON recovery | lenient parser | stdlib json.loads | 4/8 vs 1/8 recovered
Render with a missing variable | PromptSpec.substitute | jinja2 | typed error vs. silently-empty render

rank_bm25 rescans every document per query; Vincio's inverted index only scans documents containing a query term, so its lead grows with corpus size. The point isn't that every component beats every specialist: a dedicated JSON-repair library recovers more than Vincio (by guessing, which is unsafe for typed extraction). Vincio's edge is an integrated, correct, governed pipeline, not a pile of single-purpose libraries.
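The inverted-index argument above can be made concrete with a small sketch. It implements one common BM25 variant over a term-to-postings map, so scoring visits only documents that contain a query term. The class name, tokenizer, and parameters are assumptions for illustration, not Vincio's BM25Index.

```python
import math
from collections import Counter, defaultdict

class InvertedBM25:
    def __init__(self, docs, k1=1.5, b=0.75):
        self.k1, self.b = k1, b
        self.n_docs = len(docs)
        self.doc_len = [0] * len(docs)
        self.postings = defaultdict(dict)  # term -> {doc_id: term frequency}
        for doc_id, text in enumerate(docs):
            terms = text.lower().split()
            self.doc_len[doc_id] = len(terms)
            for term, tf in Counter(terms).items():
                self.postings[term][doc_id] = tf
        self.avg_len = sum(self.doc_len) / max(len(docs), 1)

    def search(self, query, top_k=3):
        scores = defaultdict(float)
        for term in set(query.lower().split()):
            posting = self.postings.get(term)
            if not posting:
                continue
            df = len(posting)
            idf = math.log(1 + (self.n_docs - df + 0.5) / (df + 0.5))
            # Only documents that contain the term are visited.
            for doc_id, tf in posting.items():
                length_ratio = self.doc_len[doc_id] / self.avg_len
                norm = tf + self.k1 * (1 - self.b + self.b * length_ratio)
                scores[doc_id] += idf * tf * (self.k1 + 1) / norm
        return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))[:top_k]

docs = [
    "configure sso with saml",
    "reset your password",
    "sso login troubleshooting guide",
    "billing and invoices",
]
index = InvertedBM25(docs)
hits = index.search("configure sso")
```

Query cost here scales with the length of the postings lists for the query terms rather than with the corpus size, which is the effect the paragraph above describes.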

Orchestrator uplift: the same model, through Vincio

quality_uplift.py measures what routing a model through Vincio adds versus calling it directly, against real models on 15 company-specific policy questions a model cannot know from pretraining (15 questions × 4 models × 3 runs × 2 modes, direct and via Vincio, = 360 live calls; OpenRouter, June 2026).

Grounded-answer accuracy, direct vs. through Vincio: gpt-4o-mini 2 to 100 percent; claude-3-haiku 0 to 91 percent; gemini-2.5-flash-lite 4 to 98 percent; llama-3.1-8b 2 to 89 percent; aggregate 2 to 95 percent

Deterministic mechanism metrics (mechanical, so they hold for any model and run offline):

Same model: direct vs. via Vincio | Direct | Via Vincio
Schema-valid object from realistic model outputs | 1/6 | 5/6
Prompt-injection exfiltration via a tool call | compromised | contained
Context tokens to keep an early fact at 160 turns | 1,267 (lost) | 33 (retained)

Grounded-answer quality on real models (mean over runs, stochastic by a point or two):

Model | Direct correct | Via Vincio correct | Direct hallucinated | Cost per correct answer
openai/gpt-4o-mini | 2% | 100% | 64% | ~62× cheaper via Vincio
anthropic/claude-3-haiku | 0% | 91% | 2%¹ | direct never correct (∞)
google/gemini-2.5-flash-lite | 4% | 98% | 29% | ~67× cheaper via Vincio
meta-llama/llama-3.1-8b-instruct | 2% | 89% | 40% | ~29× cheaper via Vincio
Aggregate | 2% | 95% | n/a | n/a

¹ claude-3-haiku abstains (98% of the time) rather than guessing; better-aligned models say "I don't know," weaker ones confidently fabricate. Either way the model alone answers ~2%; the same model through Vincio's retrieval + grounding answers 89–100%, every answer cited.

The cost line is the honest punchline: a direct call is cheaper per call, but it answers almost nothing correctly, so its cost per correct answer is 29–67× higher, or undefined when the model gets nothing right on its own. Vincio is also faster per answer here (~1.3–1.6 s vs. ~1.7–2.5 s), and token usage is roughly a wash. Full per-metric breakdown is in benchmarks/README.md. Reproduce with VINCIO_PROVIDER=openrouter … python benchmarks/quality_uplift.py.
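The cost-per-correct-answer arithmetic can be written out as a short worked example. The accuracies below are the aggregate figures from the table (2% direct, 95% via Vincio); the per-call prices are hypothetical and chosen only to show how a more expensive call can still be far cheaper per correct answer.

```python
import math

def cost_per_correct(cost_per_call_usd, accuracy):
    """Expected spend to obtain one correct answer; infinite if none are correct."""
    if accuracy <= 0:
        return math.inf
    return cost_per_call_usd / accuracy

# Hypothetical prices: the pipeline call costs twice as much as the direct call.
direct = cost_per_correct(cost_per_call_usd=0.0001, accuracy=0.02)
via_pipeline = cost_per_correct(cost_per_call_usd=0.0002, accuracy=0.95)
ratio = direct / via_pipeline

# A model that is never correct on its own has no finite cost per correct answer.
never_correct = cost_per_correct(cost_per_call_usd=0.0001, accuracy=0.0)
```

With these illustrative prices the direct path costs about 24 times more per correct answer, and the zero-accuracy case evaluates to infinity, matching the ∞ entry in the table.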

VincioBench: the internal regression suite

vinciobench.py is not a competitive claim: it is the deterministic mechanism suite that gates CI. Its families assert that each engine still works on a bundled synthetic corpus, so a regression fails the build. The scores saturate by design (a small corpus built to exercise each mechanism), which shows that each mechanism is intact but says nothing about real-world performance. The credible performance evidence is the two sections above.

How Vincio compares

Each ecosystem below is strong in its focus area. This reflects built-in, in-library capability, not what's reachable by adding a separate product or SaaS.

Capability matrix comparing Vincio, LangChain, LlamaIndex, DSPy, and Ragas across twelve capabilities including the scored context compiler, sparse and late-interaction and GraphRAG fusion, layered memory, permissioned tools, durable graphs, structure-only repair, built-in evals and CI gates, eval-driven optimization, native tracing and cost, deterministic security, MCP and A2A and Skills, and governance evidence. Vincio is first-class across all twelve.

Capability Vincio LangChain LlamaIndex DSPy Ragas
Scored, budgeted context compiler
Sparse + late-interaction + GraphRAG in one fusion
Layered memory (decay, conflicts, bi-temporal)
Permissioned tool registry (RBAC/ABAC)
Durable graphs + bounded crews
Structured output + structure-only repair
Built-in evals + CI gates
Eval-driven optimization (gated promotion)
Native tracing + cost, no account
Deterministic security (PII / injection / audit)
MCP client and server + A2A + Skills
Governance evidence (cards · AI-BOM · erasure · residency)

✅ first-class in-library · ➖ partial or via an add-on/SaaS · ❌ not a focus. Ecosystems evolve, and Vincio is built to interoperate: vincio.interop imports LangChain, LlamaIndex, Haystack, and DSPy assets and exports Vincio's own in return. See the in-depth write-ups in docs/comparisons/.

Examples

Thirteen complete, heavily-commented programs in examples/; each runs fully offline and teaches a whole theme end to end.

# | Example | What it covers
01 | quickstart | typed output · grounded QA with citations · trace & cost · a short conversation
02 | retrieval_rag | hybrid + sparse + late-interaction fusion · query understanding · GraphRAG · multimodal evidence
03 | memory | scoped remember/recall · bi-temporal · decay & contradictions · GDPR forget/export
04 | agents_and_tools | permissioned tools · sandbox · planners · plan repair · deep research · computer-use
05 | orchestration | crews + blackboard · durable graphs · workflows · distributed execution
06 | structured_output | contracts · constrained decoding · streaming validation · self-correction · signatures
07 | evaluation_observability | datasets · metrics · judges · red-team · drift · tracing · prompt registry
08 | optimization_self_improvement | the closed loop · reflective optimizer · RLVR · canary deploy · local & federated adaptation
09 | security_governance | PII/injection/containment · audit · governance evidence · identity · verified reasoning · assurance
10 | interop_and_protocols | MCP client+server · A2A · Agent Skills · framework interop · connectors · packs
11 | advanced_context | reasoning control · test-time compute · long-horizon · world-model · semantic cache · record-replay
12 | cross_org_economy | negotiation · contracts · durable sagas · settlement · arbitration · solvency proofs
13 | tabular_evidence | typed columnar Dataset · the compact, lossless DataEncoder · columnar token cost · TableEvidence in the compiler

cd examples && python 01_quickstart.py            # offline, no keys
export VINCIO_PROVIDER=openai OPENAI_API_KEY=sk-... && python 01_quickstart.py   # against a real model

Command line

vincio init my-project --template rag   # scaffold config + app + golden set
vincio run app.py --input "..."         # run an app
vincio eval run golden.jsonl            # run an eval suite with CI gates + baseline compare
vincio trace view trace_123             # TUI trace tree with scores + feedback
vincio optimize run --target groundedness
vincio loop run --app app.py --gate groundedness=">= 0.8"   # one closed-loop cycle
vincio audit verify                     # verify the audit-log hash chain offline
vincio mcp serve app.py                 # expose an app as an MCP server
vincio serve --app app.py               # launch the HTTP API (health/readiness/metrics)

The full CLI is in the CLI reference. vincio serve launches a FastAPI server (API-key + JWT auth, SSE streaming, Prometheus metrics); from vincio.server import create_app embeds it.

Architecture

One coherent pipeline from raw input to traced, validated result: the input engine normalizes and scopes the request; memory, retrieval, tools, and the prompt compiler all feed the context compiler, which scores, deduplicates, resolves conflicts, compresses, and budgets; the model runs provider-neutral; and every output is validated, evaluated, secured, traced, costed, and written back to memory.
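The context compiler's score, deduplicate, and budget steps can be pictured with a greedy packing sketch. It is a simplification under stated assumptions: a single hypothetical `relevance` score stands in for the several factors Vincio scores, duplicates are detected by normalized text only, and the `Candidate` and `compile_context` names are illustrative.

```python
from dataclasses import dataclass

@dataclass
class Candidate:
    id: str
    text: str
    relevance: float  # 0..1, higher is better
    tokens: int       # must be positive

def compile_context(candidates, token_budget):
    """Greedy packing by relevance per token, with a reason for every omission."""
    included, excluded, seen_texts, used = [], {}, set(), 0
    # Ties are broken by id so the same input always yields the same packet.
    ranked = sorted(candidates, key=lambda c: (-c.relevance / c.tokens, c.id))
    for cand in ranked:
        key = " ".join(cand.text.lower().split())
        if key in seen_texts:
            excluded[cand.id] = "duplicate"
        elif used + cand.tokens > token_budget:
            excluded[cand.id] = "over budget"
        else:
            included.append(cand.id)
            seen_texts.add(key)
            used += cand.tokens
    return included, excluded, used

candidates = [
    Candidate("a", "SSO is configured under Settings.", 0.9, 30),
    Candidate("b", "sso is configured under  settings.", 0.8, 30),
    Candidate("c", "Billing history export steps.", 0.2, 50),
    Candidate("d", "SAML metadata URL format.", 0.7, 40),
]
included, excluded, used = compile_context(candidates, token_budget=80)
```

The `excluded` mapping plays the role of an excluded-context report: every candidate that did not make the packet carries the reason it was left out.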

Vincio architecture: the input engine feeds the context compiler, which is also fed by memory, retrieval, tools, and the prompt compiler; the context compiler feeds provider-neutral model execution; the output is validated, evaluated, secured, traced, costed, and written back to memory

See AGENTS.md for the package layout and docs/concepts/ for a tour of each engine.

Status

Vincio 4.0 is feature-complete and in long-term support. The public API is frozen under Semantic Versioning with a mechanical deprecation policy; performance and quality targets are published as SLOs and gated by VincioBench; releases ship a CycloneDX SBOM with SLSA provenance. New capabilities are added behind opt-in extras, never by breaking working code. See ROADMAP.md and MIGRATION.md.

Vincio is, and stays, a library. The building blocks for production (audit chain, retention, tenant isolation, RBAC/ABAC, a server) ship in the package for you to deploy on your own infrastructure. There is no hosted service.

Documentation

The documentation index maps every guide, concept, and reference page in a reading order.

Contributing

Contributions are welcome. The test suite runs fully offline and must stay green:

pip install -e ".[dev]"
python -m pytest -q          # 5858 tests, no network or API keys required
ruff check vincio/ tests/
mypy vincio

See AGENTS.md for the codebase layout and engineering conventions.

License

Apache License 2.0 © Vincio Contributors.
