VectorPin

Verifiable integrity for AI embedding stores.

Vector databases are the new soft underbelly of the AI stack. Models trust them. Agents query them. Compliance audits don't yet ask about them. VectorPin pins every embedding to its source content and the model that produced it, then continuously verifies the store has not been tampered with — including covert steganographic modifications invisible to traditional DLP.

Part of the ThirdKey Trust Stack, alongside Symbiont (policy-governed agent runtime) and SchemaPin (cryptographic tool verification).

Why this matters

Modern RAG systems convert sensitive content into high-dimensional vectors and store them in databases that:

Don't inspect what gets written
Don't verify integrity on read
Treat embeddings as opaque numerical artifacts

That's a giant attack surface. The companion VectorSmuggle research project demonstrates that an attacker with write access to a vector pipeline can hide arbitrary data inside embeddings using techniques that pass standard observability:

Noise injection, rotation, scaling, and offset perturbations
Cross-model fragmentation
Steganographic encoding that survives database quantization

Cryptographic pinning shuts down this class of attack. Every technique listed above requires modifying the vector after the model produces it. If each vector ships with a signed attestation binding it to its source text and the producing model, any later modification no longer matches the signed vector hash, and verification fails.
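The idea can be sketched with the Python standard library alone. This is an illustration, not VectorPin's implementation: HMAC-SHA256 stands in for the Ed25519 signature (so one shared key both signs and verifies here), the float32 little-endian serialization is an assumption, and `vector_digest`, `make_attestation`, and `check` are hypothetical names.

```python
import hashlib
import hmac
import json
import struct
import unicodedata


def vector_digest(vector):
    # Assumption: components are serialized as little-endian float32.
    return hashlib.sha256(struct.pack(f"<{len(vector)}f", *vector)).hexdigest()


def _payload(body):
    # Deterministic JSON so signer and verifier hash identical bytes.
    return json.dumps(body, sort_keys=True, separators=(",", ":")).encode("utf-8")


def make_attestation(key, source, model, vector):
    normalized = unicodedata.normalize("NFC", source)
    body = {
        "source_sha256": hashlib.sha256(normalized.encode("utf-8")).hexdigest(),
        "model": model,
        "vector_sha256": vector_digest(vector),
    }
    # Stand-in for the Ed25519 signature: a keyed MAC over the body.
    tag = hmac.new(key, _payload(body), hashlib.sha256).hexdigest()
    return {"body": body, "tag": tag}


def check(key, attestation, vector):
    expected = hmac.new(key, _payload(attestation["body"]), hashlib.sha256).hexdigest()
    if not hmac.compare_digest(expected, attestation["tag"]):
        return "SIGNATURE_INVALID"
    if attestation["body"]["vector_sha256"] != vector_digest(vector):
        return "VECTOR_TAMPERED"
    return "OK"


key = b"demo-key-not-for-production"
original = [0.125, -0.5, 0.75, 1.0]
att = make_attestation(key, "The quick brown fox.", "text-embedding-3-large", original)

# A small perturbation of one component still changes the digest.
tampered = list(original)
tampered[0] += 1e-4

print(check(key, att, original))
print(check(key, att, tampered))
```

An attacker who also rewrites the stored vector hash to match the tampered vector cannot produce a valid tag without the key, so the check fails at the signature step instead.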

Quick start

Python

pip install vectorpin

import numpy as np
from vectorpin import Signer, Verifier

# At ingestion time
signer = Signer.generate(key_id="prod-2026-05")
# my_model is a placeholder for your own embedding client.
embedding = my_model.embed("The quick brown fox.")
pin = signer.pin(
    source="The quick brown fox.",
    model="text-embedding-3-large",
    vector=embedding,
)
# Store pin.to_json() alongside the embedding in your vector DB metadata.

# At read/audit time
verifier = Verifier({"prod-2026-05": signer.public_key_bytes()})
result = verifier.verify(pin, source="The quick brown fox.", vector=embedding)
if not result.ok:
    print(f"INTEGRITY FAILURE: {result.error.value} — {result.detail}")

Rust

[dependencies]
vectorpin = "0.1"

use vectorpin::{Signer, Verifier};

let signer = Signer::generate("prod-2026-05".to_string());
// my_model_embed is a placeholder for your own embedding call.
let embedding: Vec<f32> = my_model_embed("The quick brown fox.");
let pin = signer.pin(
    "The quick brown fox.",
    "text-embedding-3-large",
    embedding.as_slice(),
)?;

let mut verifier = Verifier::new();
verifier.add_key(signer.key_id(), signer.public_key_bytes());

let result = verifier.verify_full::<&[f32]>(
    &pin,
    Some("The quick brown fox."),
    Some(embedding.as_slice()),
    None,
);
assert!(result.is_ok());

TypeScript / JavaScript

npm install vectorpin

import { Signer, Verifier } from 'vectorpin';

const signer = Signer.generate('prod-2026-05');
const embedding = new Float32Array(/* ... 3072 floats from your model ... */);
const pin = signer.pin({
  source: 'The quick brown fox.',
  model: 'text-embedding-3-large',
  vector: embedding,
});

const verifier = new Verifier({ [signer.keyId]: signer.publicKeyBytes() });
const result = verifier.verify(pin, {
  source: 'The quick brown fox.',
  vector: embedding,
});
if (!result.ok) throw new Error(`integrity failure: ${result.error}`);

The Python, Rust, and TypeScript implementations are byte-for-byte compatible. A pin produced by any of them verifies on the other two, enforced by shared test vectors at testvectors/v1.json consumed in all three test suites. The TS port is pure JavaScript via @noble/ed25519 and @noble/hashes, so it also runs in Deno, Bun, and edge runtimes.

What VectorPin guarantees

Each Pin commits to:

The source text, by SHA-256 of UTF-8 NFC-normalized bytes.
The model, by identifier (and optionally by content hash).
The vector itself, by SHA-256 of canonical little-endian bytes.
The producer, by Ed25519 signing key.
The time, by RFC 3339 timestamp.
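The first and third commitments depend on canonical byte encodings. The sketch below shows why, using only the standard library. It assumes float32 components, which this README does not state, and the helper names `source_digest` and `vector_bytes_le` are hypothetical; docs/spec.md remains the authority on the actual wire format.

```python
import hashlib
import struct
import unicodedata


def source_digest(text: str) -> str:
    # NFC normalization makes visually identical strings hash identically.
    normalized = unicodedata.normalize("NFC", text)
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def vector_bytes_le(vector) -> bytes:
    # "<" forces little-endian regardless of the host CPU's byte order.
    # Assumption: each component is a 4-byte IEEE 754 float32.
    return struct.pack(f"<{len(vector)}f", *vector)


composed = "caf\u00e9"      # "é" as a single code point
decomposed = "cafe\u0301"   # "e" followed by a combining acute accent

# The raw UTF-8 bytes differ, but the NFC-normalized digests agree.
print(composed.encode("utf-8") == decomposed.encode("utf-8"))
print(source_digest(composed) == source_digest(decomposed))

# 1.0 as float32 is 0x3F800000; little-endian stores the low byte first.
print(vector_bytes_le([1.0]).hex())
```

Without a fixed normalization form and byte order, the same logical text or vector could hash differently across platforms, and a pin produced by one implementation would not verify on another.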

Verification distinguishes failure modes so callers can route them differently:

Outcome	Meaning
`OK`	Signature valid, vector intact, source matches.
`SIGNATURE_INVALID`	Pin was forged or re-signed by an attacker.
`VECTOR_TAMPERED`	Embedding modified after pinning. This is the steganography kill shot.
`SOURCE_MISMATCH`	Source text differs from what was pinned.
`MODEL_MISMATCH`	Pin was produced by a different embedding model than expected.
`UNKNOWN_KEY`	Pin signed by a key not in the verifier's registry.
`SHAPE_MISMATCH` / `UNSUPPORTED_VERSION`	Structural problems with the data.
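One way a caller might route these outcomes is sketched below. The `Outcome` enum is a hypothetical mirror of the table, not the library's own type, and the policy (which outcomes trigger an alert, which trigger re-embedding) is only an example.

```python
from enum import Enum


class Outcome(Enum):
    # Hypothetical mirror of the outcome table above.
    OK = "OK"
    SIGNATURE_INVALID = "SIGNATURE_INVALID"
    VECTOR_TAMPERED = "VECTOR_TAMPERED"
    SOURCE_MISMATCH = "SOURCE_MISMATCH"
    MODEL_MISMATCH = "MODEL_MISMATCH"
    UNKNOWN_KEY = "UNKNOWN_KEY"
    SHAPE_MISMATCH = "SHAPE_MISMATCH"
    UNSUPPORTED_VERSION = "UNSUPPORTED_VERSION"


def route(outcome: Outcome) -> str:
    """Map a verification outcome to an illustrative handling action."""
    if outcome is Outcome.OK:
        return "serve"
    if outcome in (Outcome.SIGNATURE_INVALID, Outcome.VECTOR_TAMPERED):
        # Likely active tampering: isolate the record and page security.
        return "quarantine_and_alert"
    if outcome in (Outcome.SOURCE_MISMATCH, Outcome.MODEL_MISMATCH):
        # Possibly stale data after a document edit or model upgrade.
        return "re_embed"
    if outcome is Outcome.UNKNOWN_KEY:
        # Possibly a key rotation the verifier has not learned about yet.
        return "check_key_registry"
    return "reject_malformed"


for outcome in Outcome:
    print(f"{outcome.value:>20} -> {route(outcome)}")
```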

CLI

# Generate a signing key pair
vectorpin keygen --key-id prod-2026-05 --output ./keys

# Pin a single (text, vector) pair (debug/demo)
vectorpin pin \
    --private-key ./keys/prod-2026-05.priv \
    --key-id prod-2026-05 \
    --model text-embedding-3-large \
    --source ./doc.txt \
    --vector ./embedding.npy

# Verify a pin
vectorpin verify-pin \
    --public-key ./keys/prod-2026-05.pub \
    --key-id prod-2026-05 \
    --pin ./pin.json \
    --source ./doc.txt \
    --vector ./embedding.npy

# Audit an entire Qdrant collection
vectorpin audit-qdrant \
    --url http://localhost:6333 \
    --collection my-rag \
    --public-key ./keys/prod-2026-05.pub \
    --key-id prod-2026-05

Vector store integrations

Backend	Status	Install
LanceDB (default)	Alpha	`pip install 'vectorpin[default]'`
Chroma	Alpha	`pip install 'vectorpin[chroma]'`
Pinecone	Alpha	`pip install 'vectorpin[pinecone]'`
Qdrant	Alpha	`pip install 'vectorpin[qdrant]'`
pgvector	Planned	—
FAISS	Planned	Not yet available; use `LanceDBAdapter` in the meantime (embedded, with a native metadata column).

LanceDB is the recommended default: embedded, file-based, no daemon, with a typed schema column that holds the Pin natively — matching the Symbiont runtime's default vector backend. Choose Chroma or Pinecone if you already run those; Qdrant if you need server-side payload filtering.

For Symbiont deployments, the text an embedding was produced from lives in Symbiont's content column. Symbiont also has a column literally named source, but it holds upstream provenance such as a URL and is not the value VectorPin's source argument expects. Pass source=record.metadata["content"] when calling signer.pin. See tests/test_adapter_lancedb_symbiont.py for an end-to-end example against the Symbiont schema.

from vectorpin import Signer, Verifier
from vectorpin.adapters import LanceDBAdapter

adapter = LanceDBAdapter.connect("./data/vector_db", "rag-corpus")
signer = Signer.generate(key_id="prod-2026-05")
verifier = Verifier(public_keys={signer.key_id: signer.public_key_bytes()})

# Replace "text" below with whichever column on your table holds
# the source text the embedding was produced from. On Symbiont's
# default schema, that column is named "content".
for record in adapter.iter_records():
    pin = signer.pin(
        source=record.metadata["text"],
        model="text-embedding-3-large",
        vector=record.vector,
    )
    adapter.attach_pin(record.id, pin)

The adapter protocol is intentionally thin; community contributions for new backends are welcome.
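To give a sense of how small such an adapter could be, here is an in-memory sketch. It is inferred only from the calls used in the LanceDB example (`iter_records`, `attach_pin`, and records with `id`, `vector`, and `metadata`); the real protocol may require more. `Record`, `PinAdapter`, `InMemoryAdapter`, and the `"vectorpin"` metadata key are hypothetical names.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Iterator, List, Protocol


@dataclass
class Record:
    id: str
    vector: List[float]
    metadata: Dict[str, Any] = field(default_factory=dict)


class PinAdapter(Protocol):
    """Assumed shape of an adapter: enumerate records, attach pins."""

    def iter_records(self) -> Iterator[Record]: ...

    def attach_pin(self, record_id: str, pin: Any) -> None: ...


class InMemoryAdapter:
    """Toy backend that keeps records in a dict keyed by record id."""

    def __init__(self, records: List[Record]) -> None:
        self._records = {r.id: r for r in records}

    def iter_records(self) -> Iterator[Record]:
        # Snapshot the values so callers can attach pins while iterating.
        return iter(list(self._records.values()))

    def attach_pin(self, record_id: str, pin: Any) -> None:
        # Store the pin next to the record's other metadata.
        self._records[record_id].metadata["vectorpin"] = pin


adapter: PinAdapter = InMemoryAdapter(
    [Record("doc-1", [0.1, 0.2], {"text": "The quick brown fox."})]
)
for record in adapter.iter_records():
    # A real caller would pass the result of signer.pin(...) here.
    adapter.attach_pin(record.id, {"placeholder": "pin"})
```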

Performance

Pinning and verification take tens of microseconds per vector on commodity hardware, well below the latency of the embedding model they sit alongside. Microbenchmarks for the Rust and Python implementations live at rust/vectorpin/benches/perf.rs (criterion) and scripts/bench_python.py (time.perf_counter_ns).

# Rust (criterion writes a report to target/criterion/)
cd rust && cargo bench --bench perf

# Python (standalone, no extra deps)
python scripts/bench_python.py --iters 5000

Indicative numbers on a modern x86_64 laptop, 3072-dim vectors (matching text-embedding-3-large):

Operation	Rust (µs)	Python (µs)
`hash_vector`	6.4	5.8
`sign` (pin)	35	35
`verify_full`	42	79
`verify_signature_only`	22	75

Re-run on your own hardware before quoting numbers.
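If you only want a rough feel for hashing cost without checking out the repository, a standard-library timing loop like the one below is enough. It is not scripts/bench_python.py, and `time_hash_vector` is a hypothetical helper. Because it packs floats in pure Python and assumes float32 serialization, it will likely report slower figures than the table above.

```python
import hashlib
import random
import struct
import time


def time_hash_vector(dim: int = 3072, iters: int = 2000) -> float:
    """Return the mean time in microseconds to pack and SHA-256 one vector."""
    rng = random.Random(0)  # fixed seed so runs are comparable
    vector = [rng.uniform(-1.0, 1.0) for _ in range(dim)]
    fmt = f"<{dim}f"  # little-endian float32 (an assumption)

    start = time.perf_counter_ns()
    for _ in range(iters):
        hashlib.sha256(struct.pack(fmt, *vector)).digest()
    elapsed_ns = time.perf_counter_ns() - start

    return elapsed_ns / iters / 1000.0


print(f"pack + sha256: {time_hash_vector():.1f} µs per vector")
```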

Statistical detectors

Pinning catches modifications made after a vector has been pinned. Detectors catch what pinning cannot: ingestion-time tampering and poisoning campaigns that inject new, already-tampered vectors. The two are complementary defenses:

from vectorpin.detectors.isolation_forest import IsolationForestDetector

detector = IsolationForestDetector().fit(clean_embeddings)
flagged = detector.decide(suspect_embeddings)

In the VectorSmuggle empirical study, this single line of defense flagged every operating point of every distribution-shifting steganographic technique that hides a non-trivial amount of data — but it does not catch orthogonal rotation (which preserves every density feature the detector fits on) and is brittle against attackers who know the detector. Cryptographic pinning is the durable layer; statistical detection is defense-in-depth.
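The rotation blind spot can be seen in a small geometric example. The sketch below applies a plane (Givens) rotation, which is orthogonal, to two toy vectors: norms and pairwise distances are unchanged, yet the serialized bytes differ, so a hash-based pin would still flag the change. It illustrates the geometry only and does not reproduce the Isolation Forest features; the helper names and the float32 serialization are assumptions.

```python
import hashlib
import math
import struct


def rotate_plane(vector, i, j, theta):
    """Rotate coordinates i and j by angle theta (an orthogonal transform)."""
    out = list(vector)
    c, s = math.cos(theta), math.sin(theta)
    out[i] = c * vector[i] - s * vector[j]
    out[j] = s * vector[i] + c * vector[j]
    return out


def distance(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def digest(vector):
    # Assumption: little-endian float32 serialization.
    return hashlib.sha256(struct.pack(f"<{len(vector)}f", *vector)).hexdigest()


a = [0.6, 0.8, 0.0, 0.1]
b = [0.0, 0.5, 0.5, 0.7]
origin = [0.0] * 4

ra = rotate_plane(a, 0, 1, 0.3)
rb = rotate_plane(b, 0, 1, 0.3)

# Geometry is preserved (up to floating-point rounding)...
print(math.isclose(distance(a, origin), distance(ra, origin)))
print(math.isclose(distance(a, b), distance(ra, rb)))
# ...but the bytes, and therefore the digest, are not.
print(digest(a) == digest(ra))
```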

Threat model

VectorPin is designed against an attacker who can:

Modify vectors after they are produced (via a poisoned ingestion pipeline, a compromised vector DB, or backup-level access)
See the public verification key, but not the private signing key
Replay or selectively delete pins

VectorPin does not defend against:

An attacker with the private signing key (out of scope; key custody is the user's responsibility)
An attacker who modifies the source documents before embedding (use upstream content integrity controls)
An attacker who uses a legitimate signing key to attest a malicious vector at ingestion time (use upstream input validation)

Status

Alpha (v0.1). Core protocol (Pin, Signer, Verifier) is stable and tested. The Python, Rust, and TypeScript ports are byte-for-byte compatible and locked together by shared test vectors in CI. Adapter coverage is partial. Hosted attestation service is not yet available.

The protocol version field (v: 1) lets future revisions break compatibility cleanly. We will not break existing pins without bumping the major version. See docs/spec.md for the wire-format specification.
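A verifier can use the version field as a gate before it looks at anything else in a pin. The sketch below shows that idea only: `v` is the sole field taken from this README, `check_version` and `SUPPORTED_VERSIONS` are hypothetical names, and the rest of the pin layout is deliberately left out (see docs/spec.md for the real format).

```python
import json

SUPPORTED_VERSIONS = {1}


def check_version(pin_json: str) -> str:
    """Reject pins whose protocol version this verifier does not understand."""
    pin = json.loads(pin_json)
    if pin.get("v") not in SUPPORTED_VERSIONS:
        # Fail closed: never try to interpret an unknown layout.
        return "UNSUPPORTED_VERSION"
    return "OK"


print(check_version('{"v": 1}'))
print(check_version('{"v": 2}'))
print(check_version('{}'))
```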

Citation

If you reference VectorPin or the threat model it defends against, please cite the companion preprint:

Wanger, J. (2026). VectorSmuggle: Steganographic Exfiltration in Embedding Stores and a Cryptographic Provenance Defense. Zenodo. https://doi.org/10.5281/zenodo.20058256

@misc{wanger2026vectorsmuggle,
  title  = {{VectorSmuggle}: Steganographic Exfiltration in Embedding Stores and a Cryptographic Provenance Defense},
  author = {Wanger, Jascha},
  year   = {2026},
  publisher = {Zenodo},
  doi    = {10.5281/zenodo.20058256},
  url    = {https://doi.org/10.5281/zenodo.20058256}
}

Related work

VectorSmuggle — companion threat-research project demonstrating the attacks VectorPin defends against. Empirical results in the linked Zenodo preprint.
Symbiont — policy-governed agent runtime; consumes VectorPin attestations to enforce "agents may only retrieve from verified vector stores."
SchemaPin — sister project doing the same kind of cryptographic provenance for tool schemas in MCP.
sigstore — inspired our approach to OSS-friendly cryptographic provenance.

Contributing

Issues and PRs welcome. For security-sensitive findings, please email security@thirdkey.ai rather than filing public issues.

License

Apache 2.0. See LICENSE.

