Vincio: the context engineering platform for AI applications. Compiles prompts, memory, retrieval, tools, schemas, and policies into optimized, validated, observable model-ready context packets.
The scarce resource is not the model. It is the context you feed it.
Vincio is a Python platform for building AI applications that you can trust in production. It takes everything that goes into a model (prompts, memory, retrieved evidence, tools, schemas, and policies) and compiles it into an optimized, validated, observable context packet; then it checks, measures, and traces everything that comes out. Named for Leonardo da Vinci, it pairs engineering and craft in equal measure.
Most libraries help you call a model. Vincio governs the boundary between your application and the model: what evidence is selected, how it is scored and budgeted, how the result is validated, and what it cost. It runs on your model of choice across every major provider, with batching, caching, failover, and cost tracking built in.
Why you'd reach for it, in one line each
- Runs on any model, offline when you want. Call OpenAI, Anthropic, Google, Mistral, a local model, or any OpenAI-compatible gateway through one interface. No key yet? A deterministic mock runs the whole pipeline (retrieval, validation, evals, traces) for dev, tests, and CI, with no network and no cost.
- Deterministic where it counts. Security, permissions, and validation are enforced in code, never gated on model output. The same input compiles to the same packet.
- Measured, not asserted. Every run is traced and costed; every change can be gated by an eval suite before it ships.
- One coherent system from input to output, not a bag of utilities you wire together yourself.
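One way to check the determinism claim above (the same input compiles to the same packet) is to hash a canonical serialization of the packet. The sketch below uses only the standard library; `packet_fingerprint` and the packet layout are hypothetical illustrations, not Vincio's API.

```python
import hashlib
import json


def packet_fingerprint(packet: dict) -> str:
    """Hash a packet via canonical JSON so key order cannot change the digest."""
    canonical = json.dumps(
        packet, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# Hypothetical packets: same content, different key insertion order.
a = {"prompt": "How do I configure SSO?", "evidence": ["doc-1", "doc-7"], "budget": 512}
b = {"budget": 512, "evidence": ["doc-1", "doc-7"], "prompt": "How do I configure SSO?"}

# Equal content gives an equal fingerprint; any change gives a different one.
same = packet_fingerprint(a) == packet_fingerprint(b)
changed = packet_fingerprint({**a, "budget": 256}) != packet_fingerprint(a)
print(same, changed)
```

Storing such a fingerprint next to each trace would make it cheap to assert in CI that a refactor did not change what is sent to the model.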
Contents
Install · Quickstart · What you can build · Providers · Features · Benchmarks · How Vincio compares · Examples · CLI · Architecture · Docs
Install
pip install vincio # core (the offline mock provider is built in)
pip install "vincio[openai]" # + OpenAI (also: anthropic, google, mistral)
pip install "vincio[chroma]" # + a vector store (also: pinecone, lancedb, pgvector, …)
pip install "vincio[all]" # every optional integration
Python 3.11+. The core depends only on pydantic, httpx, pyyaml, and typing-extensions;
every heavy integration (vector stores, OCR, server, OpenTelemetry, …) is an opt-in extra.
Quickstart
from vincio import ContextApp
app = ContextApp(name="docs_qa")
app.add_source("docs", path="./docs", retrieval="hybrid")
app.set_policy("answer_only_from_sources", True)
result = app.run("How do I configure SSO?")
print(result.output) # the grounded answer
print(result.citations) # the evidence it actually cited
print(result.trace_id) # every run produces a full trace
print(result.cost_usd) # …and a cost
To use a real model, set a provider and key, for example export VINCIO_PROVIDER=openai OPENAI_API_KEY=sk-..., or pass provider= and model= to ContextApp. The same code runs against
OpenAI, Anthropic, Google, Mistral, a local model, or any OpenAI-compatible gateway. No key yet? Out
of the box it runs on a deterministic mock that emits schema-valid output, so you can build and test
the whole pipeline offline in CI.
What you can build
Typed output you can rely on: declare a Pydantic schema, get a validated instance back:
from pydantic import BaseModel
from vincio import ContextApp
class Triage(BaseModel):
    label: str
    confidence: float

app = ContextApp(name="triage", output_schema=Triage)
app.run("The dashboard crashes after login").output.label # .output is a validated Triage; .label is its str field
Agents with tools, memory, and hard budgets: permissioned tools, approval-gated writes, and a loop that cannot run away:
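# RefundDecision (a Pydantic model) and the lookup_order / issue_refund functions are assumed to be defined elsewhere
app = ContextApp(name="support", output_schema=RefundDecision)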
app.add_memory(scope="user", strategy="semantic")
app.add_tool(lookup_order, permissions=["orders:read"])
app.add_tool(issue_refund, permissions=["refunds:write"], approval_required=True)
app.run("Refund my duplicate charge")
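The idea behind permissioned tools and approval-gated writes is that the check happens in code before the tool body runs, regardless of what the model asked for. A minimal standalone sketch follows; `Tool`, `call_tool`, and both exception classes are hypothetical names for illustration and are not Vincio's implementation.

```python
from dataclasses import dataclass
from typing import Callable


class PermissionDenied(Exception):
    """The caller lacks a permission the tool requires."""


class ApprovalRequired(Exception):
    """The tool performs a write that needs explicit approval."""


@dataclass(frozen=True)
class Tool:
    fn: Callable[..., object]
    permissions: frozenset
    approval_required: bool = False


def call_tool(tool, granted, *args, approved=False, **kwargs):
    # Both checks are plain code paths; nothing here depends on model output.
    missing = tool.permissions - set(granted)
    if missing:
        raise PermissionDenied(f"missing permissions: {sorted(missing)}")
    if tool.approval_required and not approved:
        raise ApprovalRequired(tool.fn.__name__)
    return tool.fn(*args, **kwargs)


def lookup_order(order_id):
    return {"order_id": order_id, "status": "charged twice"}


def issue_refund(order_id):
    return f"refunded {order_id}"


read_tool = Tool(lookup_order, frozenset({"orders:read"}))
write_tool = Tool(issue_refund, frozenset({"refunds:write"}), approval_required=True)

order = call_tool(read_tool, {"orders:read"}, "A1")
refund = call_tool(write_tool, {"refunds:write"}, "A1", approved=True)
print(order, refund)
```

Calling `write_tool` without `approved=True`, or with a grant set that lacks `refunds:write`, raises before `issue_refund` is ever entered.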
Evaluation as a CI gate: measure quality and block a regression before it ships:
from vincio import Dataset
from vincio.evals import EvalCase, EvalRunner
dataset = Dataset(name="golden", cases=[EvalCase(id="c1", input="…", expected="…")])
runner = EvalRunner(app, metrics=["groundedness", "citation_accuracy"],
                    gates={"groundedness": ">= 0.8"})
report = runner.run(dataset)
assert all(g["passed"] for g in report.gates.values()) # fail the build on a regression
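A gate string such as ">= 0.8" could be evaluated along the lines below. This is a standalone sketch under stated assumptions: `check_gates` and the result layout are hypothetical, chosen only to mirror the `passed` key used in the assertion above, and say nothing about how `EvalRunner` is implemented.

```python
import operator
import re

_OPS = {
    ">=": operator.ge,
    "<=": operator.le,
    "==": operator.eq,
    ">": operator.gt,
    "<": operator.lt,
}
# Two-character operators come first so ">=" is not read as ">".
_GATE = re.compile(r"^\s*(>=|<=|==|>|<)\s*([0-9]*\.?[0-9]+)\s*$")


def check_gates(scores: dict, gates: dict) -> dict:
    """Compare each gated metric against its threshold expression."""
    results = {}
    for metric, expr in gates.items():
        match = _GATE.match(expr)
        if match is None:
            raise ValueError(f"unparseable gate for {metric!r}: {expr!r}")
        op, threshold = _OPS[match.group(1)], float(match.group(2))
        value = scores.get(metric)
        # A metric that was never measured fails its gate instead of passing silently.
        passed = value is not None and op(value, threshold)
        results[metric] = {"value": value, "threshold": threshold, "passed": passed}
    return results


# Hypothetical scores, for illustration only.
good = check_gates({"groundedness": 0.84}, {"groundedness": ">= 0.8"})
bad = check_gates({"groundedness": 0.71}, {"groundedness": ">= 0.8"})
print(good["groundedness"]["passed"], bad["groundedness"]["passed"])
```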
See Examples for fourteen complete, runnable programs that cover the whole platform.
Providers & models
Vincio calls real models in production. One interface routes to every major provider, with the model-operations layer (reasoning control, half-cost batch, caching, failover, cost tracking) built in. The deterministic mock is a development convenience, not the product: it lets you build and test the whole pipeline with no key and no cost before you point it at a real model.
Providers, model operations, and the mock
- Providers: OpenAI, Anthropic, Google (Gemini), Mistral, local models, and any OpenAI-compatible gateway (Groq, Together, Fireworks, OpenRouter, and the like) through one ModelProvider interface.
- Enterprise auth: Amazon Bedrock, Google Vertex, and Azure OpenAI via pluggable auth strategies (SigV4, service-account, Azure AD / key).
- Model operations: unified reasoning/thinking control across providers, batch backends (~50% cost), prompt-cache strategy, a circuit breaker with health-aware failover, a key pool, and a data-driven ModelRegistry (capabilities, pricing, lifecycle) that drives capability guards and shadow / canary dispatch.
- The mock: MockProvider is deterministic and emits schema-valid output, so the full pipeline (retrieval, validation, evals, traces, cost) runs offline in CI with no key and no cost. Use it for development and tests; use a real provider in production.
# point an app at a real model (or set VINCIO_PROVIDER / the API key in the environment)
app = ContextApp(name="docs_qa", provider="openai", model="gpt-4o-mini")
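For readers unfamiliar with the circuit-breaker-plus-failover pattern listed under model operations, here is a generic sketch. `CircuitBreaker` and `call_with_failover` are hypothetical names written for this illustration; the thresholds are arbitrary, and none of this is taken from Vincio's source.

```python
import time


class CircuitBreaker:
    """Open after repeated failures; allow a trial call once the cooldown passes."""

    def __init__(self, failure_threshold=3, cooldown_s=30.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.cooldown_s = cooldown_s
        self.clock = clock
        self.failures = 0
        self.opened_at = None

    def available(self):
        if self.opened_at is None:
            return True
        return self.clock() - self.opened_at >= self.cooldown_s

    def record(self, ok):
        if ok:
            self.failures, self.opened_at = 0, None
        else:
            self.failures += 1
            if self.failures >= self.failure_threshold:
                self.opened_at = self.clock()


def call_with_failover(providers, breakers, prompt):
    """Try providers in order, skipping any whose breaker is open."""
    for name, fn in providers:
        breaker = breakers[name]
        if not breaker.available():
            continue
        try:
            result = fn(prompt)
        except Exception:
            breaker.record(False)
            continue
        breaker.record(True)
        return name, result
    raise RuntimeError("no healthy provider available")


calls = {"primary": 0}


def primary(prompt):
    calls["primary"] += 1
    raise ConnectionError("provider down")


def backup(prompt):
    return f"echo: {prompt}"


providers = [("primary", primary), ("backup", backup)]
breakers = {name: CircuitBreaker(failure_threshold=2) for name, _ in providers}
served_by = [call_with_failover(providers, breakers, "hi")[0] for _ in range(3)]
print(served_by, calls)
```

After two failures the primary's breaker opens, so the third request goes straight to the backup without paying for another failed call.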
Features
Everything below is implemented, tested offline, and demonstrated by a runnable example. Use the
high-level ContextApp, or reach for any engine directly.
Every engine, in detail
Context & prompts
- Prompt compiler: typed prompt ASTs with ${variables}, lint rules, cache-aware stable-prefix layout, versioning, hashing, and diffing.
- Context compiler: scores every candidate (relevance, novelty, authority, freshness, provenance, token cost, leakage risk), deduplicates, resolves conflicts, compresses, and packs to a token budget, with an excluded-context report explaining every omission.
- Tabular evidence: a typed, columnar Dataset and a deterministic DataEncoder that renders it header-once (schema, types, and units declared once, cells as delimited rows): lossless, columnar-accurate in token cost, and far cheaper than json.dumps or a Markdown table; TableEvidence scores and cites it like any other evidence.
- Dataset profiling & quality: profile_dataset computes a deterministic, bounded-memory column profile (cardinality, percentiles, histograms, null rate, exemplars); reservoir/stratified sampling stands a representative sample in for the whole; fit_to_window fits a table far larger than the window (profile plus sample) under a fixed token budget; and DataQualityRails screen for schema violations, constraint breaks, anomalies, and PII on the deterministic rail path.
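The header-once idea from the tabular-evidence bullet can be illustrated without the library. In the sketch below, `encode_header_once` is a hypothetical helper, not the real DataEncoder; character count stands in for token count, and delimiter escaping is omitted for brevity, so unlike the real encoder this toy version is only lossless when cells contain no `|` or newline.

```python
import json


def encode_header_once(columns, rows, delimiter="|"):
    """Declare names and types once, then emit one delimited line per row."""
    header = delimiter.join(f"{name}:{type_}" for name, type_ in columns)
    lines = [header]
    for row in rows:
        lines.append(delimiter.join(str(row[name]) for name, _ in columns))
    return "\n".join(lines)


# Hypothetical 50-row table, for illustration only.
columns = [("sku", "str"), ("qty", "int"), ("price_usd", "float")]
rows = [{"sku": f"A{i}", "qty": i, "price_usd": 9.5 + i} for i in range(50)]

encoded = encode_header_once(columns, rows)
as_json = json.dumps(rows)

# json.dumps repeats every key on every row; the header-once form does not.
print(len(encoded), len(as_json))
```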
Retrieval & memory
- Hybrid RAG: BM25 + dense + learned-sparse + late-interaction fused in one weighted RRF; query understanding (HyDE, multi-query, decomposition); sentence-window / auto-merging chunking; GraphRAG; structured metadata filters with tenant scope; text + image + table + video evidence as first-class scored candidates.
- Layered memory: session → episodic → semantic → tenant → graph, with a guarded write pipeline, confidence decay, contradiction resolution, bi-temporal recall, per-memory ACLs, and audited GDPR-style edit/forget/export.
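Weighted reciprocal rank fusion, named in the Hybrid RAG bullet, combines ranked lists from several retrievers using only each document's rank. A minimal sketch: `weighted_rrf` is a hypothetical function, `k=60` is the constant conventionally used for RRF, and the retriever names and weights are made up for the example.

```python
from collections import defaultdict


def weighted_rrf(rankings, weights, k=60):
    """Fuse ranked id lists: score(d) = sum over retrievers of w / (k + rank)."""
    scores = defaultdict(float)
    for name, ranked_ids in rankings.items():
        w = weights.get(name, 1.0)
        for rank, doc_id in enumerate(ranked_ids, start=1):
            scores[doc_id] += w / (k + rank)
    # Ties are broken by id so the output order is deterministic.
    return sorted(scores, key=lambda d: (-scores[d], d))


rankings = {
    "bm25": ["d1", "d2", "d3"],
    "dense": ["d2", "d3", "d1"],
}

equal = weighted_rrf(rankings, {"bm25": 1.0, "dense": 1.0})
bm25_heavy = weighted_rrf(rankings, {"bm25": 2.0, "dense": 1.0})
print(equal, bm25_heavy)
```

With equal weights `d2` wins because both retrievers rank it highly; doubling the BM25 weight moves `d1` to the top, which shows how the weights shift the fused order without ever comparing raw scores across retrievers.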
Agents & orchestration
- Tools: permissioned registry (RBAC + ABAC), schema-from-typehints, a resource-limited sandbox, idempotent write guardrails with approval callbacks, and a grounded computer-use action plane.
- Agents: bounded DAG execution with planners (ReAct / plan-and-execute / hierarchical HTN), in-place plan repair, cost-aware action selection, and a budgeted deep-research agent.
- Orchestration: multi-agent crews with a shared blackboard, durable stateful graphs (checkpoint / resume / time-travel / human-in-the-loop), deterministic workflows, and a distributed durable-execution backend.
Output, evaluation & observability
- Structured output: Pydantic contracts, constrained decoding, streaming validation with early abort, bounded self-correction that repairs structure only (never invents facts), and DSPy-style typed signatures.
- Evaluation: golden datasets, 30+ metrics, deterministic / model / G-Eval judges, synthetic data, red-teaming, trajectory & tool-use scoring, drift detection, regression gates, and a pytest plugin.
- Observability: full trace span trees, OpenTelemetry export, a local trace viewer, a versioned prompt registry, and per-run cost tracking, no account or hosted backend required.
The closed loop
- Optimization: one reproducible cycle (trace → dataset → eval → optimize → promote): a reflective GEPA/MIPRO optimizer, a distillation flywheel, on-policy reinforcement from verifiable rewards, and gated deploy with canary + rollback. No promotion ships without clearing the gates.
Security & governance
- Security: deterministic PII / secret redaction (multilingual), prompt-injection defense and provable containment (taint tracking + capability tokens), RBAC / ABAC, tenant isolation, and a hash-chained, signed audit log with offline tamper verification.
- Governance: model / system cards, an OWASP / NIST / MITRE / ISO compliance matrix, an AI-BOM, provable erasure, a consent ledger, data-residency enforcement, formal invariant verification, agent identity & delegation, verified-reasoning certificates, and continuous assurance cases.
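A hash-chained audit log with offline tamper verification, as mentioned in the Security bullet, can be sketched with the standard library. The functions and entry layout below are hypothetical and the signing step is omitted; this shows the chaining technique in general, not Vincio's log format.

```python
import hashlib
import json

GENESIS = "0" * 64


def _digest(prev_hash, event):
    payload = json.dumps({"prev": prev_hash, "event": event}, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def append(log, event):
    """Append an event whose hash commits to the previous entry's hash."""
    prev_hash = log[-1]["hash"] if log else GENESIS
    log.append({"event": event, "prev": prev_hash, "hash": _digest(prev_hash, event)})


def verify(log):
    """Recompute the chain offline; any edited or reordered entry breaks it."""
    prev_hash = GENESIS
    for entry in log:
        if entry["prev"] != prev_hash:
            return False
        if entry["hash"] != _digest(prev_hash, entry["event"]):
            return False
        prev_hash = entry["hash"]
    return True


log = []
append(log, {"actor": "alice", "action": "tool.call", "tool": "lookup_order"})
append(log, {"actor": "alice", "action": "refund.approve", "amount_usd": 20})

# Copy the log and alter one recorded amount without recomputing hashes.
tampered = [dict(entry) for entry in log]
tampered[1] = {**tampered[1], "event": {**tampered[1]["event"], "amount_usd": 2000}}

print(verify(log), verify(tampered))
```

Because each hash covers the previous one, an attacker who edits an entry must rewrite every later hash as well, which is what a signature over the chain head is meant to prevent.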
Interop
- Protocols: MCP (client and server), A2A agent-to-agent, and Agent Skills, all in-process.
- Ecosystem: import/export LangChain, LlamaIndex, Haystack, and DSPy assets; first-party data connectors; and any OpenAI-compatible model or vector store you already run.
Reach further: a cross-organization agent economy (negotiation, contracts, durable sagas, metering, settlement, arbitration, reputation, collateral & solvency proofs), an edge / WASM in-process runtime, on-device LoRA adaptation, federated learning with a differential-privacy accountant, and per-run energy / carbon accounting. See ROADMAP.md.
Benchmarks
Three suites ship in benchmarks/, all reproducible on your own machine. Every number
is measured live from both sides; a missing competitor is reported as skipped, never assumed.
Head-to-head vs. real libraries
competitive.py runs Vincio against the actual library a team would
otherwise use (Apple Silicon, Python 3.13; ratios are the portable signal, not wall-clock).
| Operation | Vincio | Competitor | Result |
|---|---|---|---|
| BM25 query @ 20k docs | BM25Index | rank_bm25 | ~30–40× faster: identical top-1 ranking |
| Context assembly: tokens sent for the same retrieved set | context compiler | LangChain stuff / LlamaIndex compact | ~60% fewer tokens: answer retained |
| Tabular encoding: tokens for a 50×5 table | DataEncoder | json.dumps / pandas.to_markdown / TOON | ~66% fewer tokens than json.dumps, lossless, typed schema |
| Fit a 5k-row table into the window | fit_to_window | json.dumps all rows / pandas.describe | ~99% fewer tokens: profile + representative sample, size invariant to row count |
| Text chunking a 24k-word doc | chunk_document | LangChain / LlamaIndex splitters | fastest, chunks carry provenance |
| Token counting (~60k words) | HeuristicTokenCounter | tiktoken | ~1.4–1.8× faster, zero-dependency, conservative |
| Malformed-JSON recovery | lenient parser | stdlib json.loads | 4/8 vs 1/8 recovered |
| Render with a missing variable | PromptSpec.substitute | jinja2 | typed error vs. silently-empty render |
rank_bm25 rescans every document per query; Vincio's inverted index only scans documents
containing a query term, so its lead grows with corpus size. The point isn't that every component
beats every specialist: a dedicated JSON-repair library recovers more than Vincio (by guessing,
which is unsafe for typed extraction). Vincio's edge is an integrated, correct, governed
pipeline, not a pile of single-purpose libraries.
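The inverted-index point above is easy to see in a toy implementation: scoring walks only the posting lists of the query terms, so documents that share no term with the query are never touched. `TinyBM25` below is a hypothetical teaching sketch with common default parameters (`k1=1.5`, `b=0.75`); it is not Vincio's BM25Index and assumes a non-empty, pre-tokenized corpus.

```python
import math
from collections import Counter, defaultdict


class TinyBM25:
    def __init__(self, docs, k1=1.5, b=0.75):
        self.k1, self.b = k1, b
        self.n_docs = len(docs)
        self.doc_len = [len(tokens) for tokens in docs]
        self.avg_len = sum(self.doc_len) / self.n_docs
        self.postings = defaultdict(dict)  # term -> {doc_id: term frequency}
        for doc_id, tokens in enumerate(docs):
            for term, tf in Counter(tokens).items():
                self.postings[term][doc_id] = tf

    def search(self, query, top_k=3):
        scores = defaultdict(float)
        for term in set(query):
            posting = self.postings.get(term)
            if not posting:
                continue
            df = len(posting)
            idf = math.log(1 + (self.n_docs - df + 0.5) / (df + 0.5))
            # Only documents that contain the term are visited here.
            for doc_id, tf in posting.items():
                norm = 1 - self.b + self.b * self.doc_len[doc_id] / self.avg_len
                scores[doc_id] += idf * tf * (self.k1 + 1) / (tf + self.k1 * norm)
        return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))[:top_k]


docs = [
    ["configure", "sso", "with", "saml"],
    ["reset", "your", "password"],
    ["sso", "login", "fails", "after", "sso", "timeout"],
]
index = TinyBM25(docs)
hits = index.search(["sso", "saml"])
print([doc_id for doc_id, _ in hits])
```

Document 1 never appears in `scores` because neither query term is in its posting lists, which is the property that lets the gap over a full rescan widen as the corpus grows.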
Orchestrator uplift: the same model, through Vincio
quality_uplift.py measures what routing a model through Vincio
adds versus calling it directly, against real models on 15 company-specific policy questions a model
cannot know from pretraining (15 questions × 4 models × 3 runs × 2 modes, direct and via Vincio, = 360 live calls, OpenRouter, June 2026).
Deterministic mechanism metrics (mechanical, so they hold for any model and run offline):
| Same model: direct vs. via Vincio | Direct | Via Vincio |
|---|---|---|
| Schema-valid object from realistic model outputs | 1/6 | 5/6 |
| Prompt-injection exfiltration via a tool call | compromised | contained |
| Context tokens to keep an early fact at 160 turns | 1,267 (lost) | 33 (retained) |
Grounded-answer quality on real models (mean over runs, stochastic by a point or two):
| Model: direct vs. through Vincio | Direct correct | Via Vincio correct | Direct hallucinated | Cost per correct answer |
|---|---|---|---|---|
| openai/gpt-4o-mini | 2% | 100% | 64% | ~62× cheaper via Vincio |
| anthropic/claude-3-haiku | 0% | 91% | 2%¹ | direct never correct (∞) |
| google/gemini-2.5-flash-lite | 4% | 98% | 29% | ~67× cheaper via Vincio |
| meta-llama/llama-3.1-8b-instruct | 2% | 89% | 40% | ~29× cheaper via Vincio |
| Aggregate | 2% | 95% | n/a | n/a |
¹ claude-3-haiku abstains (98% of the time) rather than guessing; better-aligned models say "I don't know," weaker ones confidently fabricate. Either way the model alone answers ~2%; the same model through Vincio's retrieval + grounding answers 89–100%, every answer cited.
The cost line is the honest punchline: a direct call is cheaper per call, but it answers almost
nothing correctly, so its cost per correct answer is 29–67× higher, or undefined when the model
gets nothing right on its own. Vincio is also faster per answer here (~1.3–1.6 s vs. ~1.7–2.5 s),
and token usage is roughly a wash. Full per-metric breakdown is in
benchmarks/README.md. Reproduce with VINCIO_PROVIDER=openrouter … python benchmarks/quality_uplift.py.
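The cost-per-correct-answer metric is simply cost per call divided by the fraction of calls answered correctly. The sketch below works one example; the per-call prices are hypothetical round numbers chosen for the arithmetic, not the benchmark's measured costs, and `cost_per_correct` is an illustrative helper.

```python
def cost_per_correct(cost_per_call_usd: float, accuracy: float) -> float:
    """Expected spend to obtain one correct answer."""
    if accuracy <= 0:
        # A path that is never correct has no finite cost per correct answer.
        return float("inf")
    return cost_per_call_usd / accuracy


# Hypothetical: both paths cost $0.001 per call, but differ in accuracy.
direct = cost_per_correct(0.001, 0.02)    # 2% correct
grounded = cost_per_correct(0.001, 1.00)  # 100% correct
never = cost_per_correct(0.001, 0.0)      # never correct

print(direct, grounded, direct / grounded, never)
```

At equal per-call cost, a 2% versus 100% accuracy gap alone yields a 50× difference, so the measured 29–67× ratios are dominated by accuracy rather than by per-call price.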
VincioBench: the internal regression suite
vinciobench.py is not a competitive claim: it is the deterministic
mechanism suite that gates CI. Its families assert that each engine still works on a bundled
synthetic corpus, so a regression fails the build. The scores saturate by design (a small corpus
built to exercise each mechanism), which proves the mechanism is intact, not real-world
performance. The credible performance evidence is the two sections above.
How Vincio compares
Each ecosystem below is strong in its focus area. This reflects built-in, in-library capability, not what's reachable by adding a separate product or SaaS.
| Capability | Vincio | LangChain | LlamaIndex | DSPy | Ragas |
|---|---|---|---|---|---|
| Scored, budgeted context compiler | ✅ | ➖ | ➖ | ❌ | ❌ |
| Sparse + late-interaction + GraphRAG in one fusion | ✅ | ➖ | ➖ | ❌ | ❌ |
| Layered memory (decay, conflicts, bi-temporal) | ✅ | ➖ | ➖ | ❌ | ❌ |
| Permissioned tool registry (RBAC/ABAC) | ✅ | ❌ | ❌ | ❌ | ❌ |
| Durable graphs + bounded crews | ✅ | ➖ | ❌ | ❌ | ❌ |
| Structured output + structure-only repair | ✅ | ➖ | ➖ | ✅ | ❌ |
| Built-in evals + CI gates | ✅ | ➖ | ➖ | ➖ | ✅ |
| Eval-driven optimization (gated promotion) | ✅ | ❌ | ❌ | ✅ | ❌ |
| Native tracing + cost, no account | ✅ | ➖ | ➖ | ❌ | ❌ |
| Deterministic security (PII / injection / audit) | ✅ | ❌ | ❌ | ❌ | ❌ |
| MCP client and server + A2A + Skills | ✅ | ➖ | ➖ | ➖ | ❌ |
| Governance evidence (cards · AI-BOM · erasure · residency) | ✅ | ❌ | ❌ | ❌ | ❌ |
✅ first-class in-library · ➖ partial or via an add-on/SaaS · ❌ not a focus. Ecosystems evolve, and
Vincio is built to interoperate: vincio.interop brings LangChain, LlamaIndex, Haystack, and DSPy
assets in (and hands Vincio's back). See the in-depth write-ups in
docs/comparisons/.
Examples
Fourteen complete, heavily-commented programs in examples/; each runs fully offline
and teaches a whole theme end to end.
| # | Example | What it covers |
|---|---|---|
| 01 | quickstart | typed output · grounded QA with citations · trace & cost · a short conversation |
| 02 | retrieval_rag | hybrid + sparse + late-interaction fusion · query understanding · GraphRAG · multimodal evidence |
| 03 | memory | scoped remember/recall · bi-temporal · decay & contradictions · GDPR forget/export |
| 04 | agents_and_tools | permissioned tools · sandbox · planners · plan repair · deep research · computer-use |
| 05 | orchestration | crews + blackboard · durable graphs · workflows · distributed execution |
| 06 | structured_output | contracts · constrained decoding · streaming validation · self-correction · signatures |
| 07 | evaluation_observability | datasets · metrics · judges · red-team · drift · tracing · prompt registry |
| 08 | optimization_self_improvement | the closed loop · reflective optimizer · RLVR · canary deploy · local & federated adaptation |
| 09 | security_governance | PII/injection/containment · audit · governance evidence · identity · verified reasoning · assurance |
| 10 | interop_and_protocols | MCP client+server · A2A · Agent Skills · framework interop · connectors · packs |
| 11 | advanced_context | reasoning control · test-time compute · long-horizon · world-model · semantic cache · record-replay |
| 12 | cross_org_economy | negotiation · contracts · durable sagas · settlement · arbitration · solvency proofs |
| 13 | tabular_evidence | typed columnar Dataset · the compact, lossless DataEncoder · columnar token cost · TableEvidence in the compiler |
| 14 | dataset_profiling | profile_dataset · reservoir/stratified sampling · fit_to_window under a token budget · DataQualityRails screening |
cd examples && python 01_quickstart.py # offline, no keys
export VINCIO_PROVIDER=openai OPENAI_API_KEY=sk-... && python 01_quickstart.py # against a real model
Command line
vincio init my-project --template rag # scaffold config + app + golden set
vincio run app.py --input "..." # run an app
vincio eval run golden.jsonl # run an eval suite with CI gates + baseline compare
vincio trace view trace_123 # TUI trace tree with scores + feedback
vincio optimize run --target groundedness
vincio loop run --app app.py --gate groundedness=">= 0.8" # one closed-loop cycle
vincio audit verify # verify the audit-log hash chain offline
vincio mcp serve app.py # expose an app as an MCP server
vincio serve --app app.py # launch the HTTP API (health/readiness/metrics)
The full CLI is in the CLI reference. vincio serve launches a FastAPI
server (API-key + JWT auth, SSE streaming, Prometheus metrics); from vincio.server import create_app embeds it.
Architecture
One coherent pipeline from raw input to traced, validated result: the input engine normalizes and scopes the request; memory, retrieval, tools, and the prompt compiler all feed the context compiler, which scores, deduplicates, resolves conflicts, compresses, and budgets; the model runs provider-neutral; and every output is validated, evaluated, secured, traced, costed, and written back to memory.
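The score, deduplicate, and budget steps of the context compiler can be sketched as a greedy packer that also records why each candidate was left out. Everything below (`Candidate`, `pack`, the exclusion reasons, the sample scores and token counts) is a hypothetical simplification for illustration; the real compiler is described as using more signals, plus conflict resolution and compression.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    id: str
    text: str
    score: float
    tokens: int


def pack(candidates, budget):
    """Greedily include the highest-scoring candidates that fit the token budget."""
    included, excluded, seen, used = [], {}, set(), 0
    # Sort by score, then id, so equal inputs always pack the same way.
    for c in sorted(candidates, key=lambda c: (-c.score, c.id)):
        key = " ".join(c.text.lower().split())  # case- and whitespace-insensitive
        if key in seen:
            excluded[c.id] = "duplicate"
        elif used + c.tokens > budget:
            excluded[c.id] = "over budget"
        else:
            included.append(c.id)
            seen.add(key)
            used += c.tokens
    return included, excluded, used


candidates = [
    Candidate("a", "SSO uses SAML.", 0.9, 40),
    Candidate("b", "sso uses  saml.", 0.7, 40),
    Candidate("c", "Billing is monthly.", 0.6, 80),
    Candidate("d", "Admins enable SSO in Settings.", 0.5, 30),
]
included, excluded, used = pack(candidates, budget=100)
print(included, excluded, used)
```

The `excluded` mapping plays the role of an excluded-context report: every omission carries a reason a reviewer can inspect.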
See AGENTS.md for the package layout and docs/concepts/ for a tour
of each engine.
Status
Vincio 4.0 is feature-complete and in long-term support. The public API is frozen under
Semantic Versioning with a mechanical
deprecation policy; performance and quality targets are
published as SLOs and gated by VincioBench; releases ship a CycloneDX SBOM
with SLSA provenance. New capabilities are added behind opt-in extras, never by breaking working
code. See ROADMAP.md and MIGRATION.md.
Vincio is, and stays, a library. The building blocks for production (audit chain, retention, tenant isolation, RBAC/ABAC, a server) ship in the package for you to deploy on your own infrastructure. There is no hosted service.
Documentation
The documentation index maps every guide, concept, and reference page in a reading order. Highlights:
- Getting started: install, your first app, offline development
- Concepts: context packets · prompt compiler · memory · retrieval · agents & workflows · evaluation · observability
- Guides: build a RAG app · structured output · add tools · orchestrate multi-agent systems · run evals · close the loop · performance & streaming · integrations
- Protocols: MCP client + server · A2A · Agent Skills · reasoning control
- Migrating: from LangChain · LlamaIndex · Ragas
- Security & governance: threat model · security policy · governance & compliance
- Reference: API · CLI · config · SLOs · stability & deprecation
- Comparisons: LangChain · LlamaIndex · DSPy · CrewAI · Ragas · and more
Contributing
Contributions are welcome. The test suite runs fully offline and must stay green:
pip install -e ".[dev]"
python -m pytest -q # 5858 tests, no network or API keys required
ruff check vincio/ tests/
mypy vincio
See AGENTS.md for the codebase layout and engineering conventions.
License
Apache License 2.0 © Vincio Contributors.