Python implementation of the VXDF (Vector Exchange Data Format) for storing text, metadata and vector embeddings in a single portable file.

VXDF Python Library

VXDF (Vector eXchange Data Format) is an AI-native container for text, metadata and vector embeddings that is portable, indexable and compressed. For RAG, semantic search or compliance audits, VXDF keeps everything in one file that a single command can create.

Quick-start

pip install "vxdf[zstd]"      # installs optional Zstandard support; quotes keep zsh from globbing the brackets

python - << 'PY'
from vxdf import VXDFWriter, VXDFReader

# create a small file
data = [
    {"id": "1", "text": "hello", "vector": [0.1, 0.2]},
    {"id": "2", "text": "world", "vector": [0.3, 0.4]},
]
with VXDFWriter("demo.vxdf", embedding_dim=2, compression="zstd") as w:
    for chunk in data:
        w.add_chunk(chunk)

# read it back and look up a chunk by id
reader = VXDFReader("demo.vxdf")
print(reader.get_chunk("2"))
PY

Command-line

vxdf pack data.jsonl data.vxdf --compression zstd   # create
vxdf info data.vxdf                                  # header & stats
vxdf list data.vxdf | head                           # ids
vxdf get  data.vxdf some-id > doc.json               # extract

# Pipe stdin to stdout (auto-detects model, disables banner/progress)
cat report.txt | vxdf convert - - > report.vxdf
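
The pack command above takes a JSONL file as input. Here is a minimal sketch of producing one from Python. It assumes pack expects one JSON object per line with the same id, text and vector fields as the quick-start chunks; the README does not spell out the schema, and write_jsonl is a hypothetical helper, not part of vxdf.

```python
import json
from pathlib import Path


def write_jsonl(records, path):
    """Write each record as one JSON object per line (JSONL)."""
    path = Path(path)
    with path.open("w", encoding="utf-8") as fh:
        for record in records:
            fh.write(json.dumps(record, ensure_ascii=False) + "\n")
    return path


# Field names mirror the quick-start chunks; treat them as an assumption
# about what `vxdf pack` accepts.
records = [
    {"id": "1", "text": "hello", "vector": [0.1, 0.2]},
    {"id": "2", "text": "world", "vector": [0.3, 0.4]},
]
```

Calling write_jsonl(records, "data.jsonl") then yields a file that can be handed to vxdf pack as shown above.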

LangChain integration (preview)

from langchain_community.vectorstores import VXDF
vs = VXDF.from_vxdf("demo.vxdf")

See examples/langchain_integration.py for a minimal adapter.

Authentication

VXDF commands that interact with cloud services need credentials.

OpenAI embeddings

The client looks for an API key in this order (first match wins):

  1. --openai-key CLI flag (e.g. vxdf convert my.pdf out.vxdf --model openai --openai-key sk-...).
  2. OPENAI_API_KEY environment variable.
  3. ~/.vxdf/config.toml under the [openai] table:

     [openai]
     api_key = "sk-..."
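
The lookup order above can be sketched in a few lines of Python. This illustrates the precedence only and is not VXDF's actual implementation: resolve_openai_key and its parameters are hypothetical names, and the config-file step needs tomllib, which ships with Python 3.11 and later.

```python
import os
from pathlib import Path

try:
    import tomllib  # standard library from Python 3.11
except ModuleNotFoundError:
    tomllib = None  # older Pythons: the config-file step is skipped


def resolve_openai_key(cli_key=None, env=None, config_path=None):
    """Return the first key found: CLI flag, then env var, then config file."""
    env = os.environ if env is None else env

    # 1. Value passed via --openai-key
    if cli_key:
        return cli_key

    # 2. OPENAI_API_KEY environment variable
    env_key = env.get("OPENAI_API_KEY")
    if env_key:
        return env_key

    # 3. api_key under the [openai] table of ~/.vxdf/config.toml
    if config_path is None:
        config_path = Path.home() / ".vxdf" / "config.toml"
    config_path = Path(config_path)
    if tomllib is not None and config_path.is_file():
        with config_path.open("rb") as fh:
            config = tomllib.load(fh)
        return config.get("openai", {}).get("api_key") or None

    return None
```

Passing env and config_path explicitly keeps the function easy to test without touching the real environment or home directory.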

AWS (S3 URLs)

Uses the standard AWS credential chain provided by boto3 – environment variables (AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY), the AWS CLI config, or an attached IAM role. Run aws configure if unsure.

GCP (gs:// URLs)

Relies on Application Default Credentials. Run gcloud auth application-default login or set the GOOGLE_APPLICATION_CREDENTIALS environment variable pointing at a JSON key file.
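
To check ahead of time whether file-based Application Default Credentials are in place, a small preflight sketch follows. It is not part of VXDF, and find_gcp_adc_file is a hypothetical helper. It only looks at the GOOGLE_APPLICATION_CREDENTIALS variable and the default file that gcloud writes on Linux and macOS, so it does not detect credentials supplied by an attached service account.

```python
import os
from pathlib import Path


def find_gcp_adc_file(env=None, home=None):
    """Return the path of a file-based ADC credential, or None if none is found."""
    env = os.environ if env is None else env
    home = Path.home() if home is None else Path(home)

    # An explicit key file takes priority; a dangling path counts as missing.
    explicit = env.get("GOOGLE_APPLICATION_CREDENTIALS")
    if explicit:
        path = Path(explicit)
        return path if path.is_file() else None

    # Default location used by `gcloud auth application-default login`
    # on Linux/macOS (Windows stores it under %APPDATA%\gcloud instead).
    default = home / ".config" / "gcloud" / "application_default_credentials.json"
    return default if default.is_file() else None
```

A None result suggests running gcloud auth application-default login or setting the environment variable before using gs:// URLs.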

If credentials are missing, VXDF exits early with a clear message and a hint on how to configure them.

Shell completion

Install extra dependencies and activate once:

pip install "vxdf[completion]"
activate-global-python-argcomplete --user  # bash/zsh/fish supported

Re-open your terminal and enjoy TAB-completion for vxdf sub-commands and options.


VXDF is BSD-3-licensed. Contributions welcome!

