
Python implementation of the VXDF (Vector Exchange Data Format) for storing text, metadata and vector embeddings in a single portable file.


VXDF Python Library


VXDF (Vector eXchange Data Format) is an AI-native container for text, metadata and vector embeddings that is portable, indexable and compressed. If you do RAG, semantic search or compliance audits, VXDF gives you one file, one command.

Quick-start

pip install "vxdf[zstd]"      # optional Zstandard support (quotes stop zsh from globbing the brackets)

python - << 'PY'
from vxdf import VXDFWriter, VXDFReader

# create a small file
data = [
    {"id": "1", "text": "hello", "vector": [0.1, 0.2]},
    {"id": "2", "text": "world", "vector": [0.3, 0.4]},
]
with VXDFWriter("demo.vxdf", embedding_dim=2, compression="zstd") as w:
    for chunk in data:
        w.add_chunk(chunk)

# read it back and look up a chunk by its id
reader = VXDFReader("demo.vxdf")
print(reader.get_chunk("2"))
PY

Command-line

vxdf pack data.jsonl data.vxdf --compression zstd   # create
vxdf info data.vxdf                                  # header & stats
vxdf list data.vxdf | head                           # ids
vxdf get  data.vxdf some-id > doc.json               # extract
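vxdf pack reads a JSONL file. A minimal sketch for producing one, assuming pack accepts one chunk object per line with the same id, text and vector keys as the Python quick-start (the exact input schema is an assumption, and write_jsonl is a hypothetical helper). It also assumes every vector must match a single embedding_dim, mirroring the writer's embedding_dim argument.

```python
import json
from pathlib import Path

# Same chunk shape as the quick-start; assumed to be what `vxdf pack` expects.
chunks = [
    {"id": "1", "text": "hello", "vector": [0.1, 0.2]},
    {"id": "2", "text": "world", "vector": [0.3, 0.4]},
]


def write_jsonl(records, path, embedding_dim):
    """Write one JSON object per line, rejecting vectors of the wrong size first."""
    records = list(records)
    # Assumption: all vectors in one file share a single embedding dimension.
    for record in records:
        if len(record["vector"]) != embedding_dim:
            raise ValueError(
                f"chunk {record['id']!r} has dimension {len(record['vector'])}, "
                f"expected {embedding_dim}"
            )
    with Path(path).open("w", encoding="utf-8") as fh:
        for record in records:
            fh.write(json.dumps(record) + "\n")


write_jsonl(chunks, "data.jsonl", embedding_dim=2)
```

Validating before opening the output means a bad record never leaves a half-written file behind.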


LangChain integration (preview)

from langchain_community.vectorstores import VXDF
vs = VXDF.from_vxdf("demo.vxdf")

See examples/langchain_integration.py for a minimal adapter.

Authentication

VXDF commands that interact with cloud services need credentials.

OpenAI embeddings

The client looks for an API key in this order (first match wins):

  1. --openai-key CLI flag (e.g. vxdf convert my.pdf out.vxdf --model openai --openai-key sk-...)
  2. OPENAI_API_KEY environment variable.
  3. ~/.vxdf/config.toml, under the [openai] table:

       [openai]
       api_key = "sk-..."

AWS (S3 URLs)

Uses the standard AWS credential chain provided by boto3 – environment variables (AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY), the AWS CLI config, or an attached IAM role. Run aws configure if unsure.

GCP (gs:// URLs)

Relies on Application Default Credentials. Run gcloud auth application-default login, or set the GOOGLE_APPLICATION_CREDENTIALS environment variable to the path of a service-account JSON key file.

If credentials are missing, VXDF exits early with a clear message and a hint on how to configure them.
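A preflight check of this kind might look like the sketch below, assuming boto3 and google-auth are the libraries behind s3:// and gs:// support. The function name missing_credentials_hint and its messages are hypothetical; boto3.Session().get_credentials() and google.auth.default() are the standard entry points into each provider's credential chain.

```python
from typing import Optional
from urllib.parse import urlparse


def missing_credentials_hint(url: str) -> Optional[str]:
    """Return a configuration hint if credentials for `url` are missing, else None."""
    scheme = urlparse(url).scheme
    if scheme == "s3":
        try:
            import boto3
        except ImportError:
            return "Install boto3 to read s3:// URLs."
        # Walks env vars, AWS CLI config and IAM roles; None means nothing was found.
        if boto3.Session().get_credentials() is None:
            return "No AWS credentials found. Run 'aws configure'."
    elif scheme == "gs":
        try:
            import google.auth
            from google.auth.exceptions import DefaultCredentialsError
        except ImportError:
            return "Install google-auth to read gs:// URLs."
        try:
            google.auth.default()  # raises if no Application Default Credentials exist
        except DefaultCredentialsError:
            return (
                "No Application Default Credentials found. Run "
                "'gcloud auth application-default login' or set "
                "GOOGLE_APPLICATION_CREDENTIALS."
            )
    return None  # local paths and other schemes need no cloud credentials
```

A caller would typically pass a non-empty hint to sys.exit(), which prints it to stderr and exits with status 1.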

Shell completion

Install the extra dependencies, then activate completion once:

pip install "vxdf[completion]"
activate-global-python-argcomplete --user  # bash/zsh/fish supported

Re-open your terminal and enjoy TAB-completion for vxdf sub-commands and options.
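The activate-global-python-argcomplete command suggests the CLI is built on the argcomplete package. Global activation only completes scripts that carry a PYTHON_ARGCOMPLETE_OK marker near the top of the file and call argcomplete.autocomplete() before parsing arguments. The sketch below shows that wiring on a hypothetical, cut-down argparse parser; it is not the real vxdf entry point.

```python
# PYTHON_ARGCOMPLETE_OK
# ^ argcomplete's global activation looks for this marker near the top of the file.
import argparse


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical subset of a vxdf-style command line.
    parser = argparse.ArgumentParser(prog="vxdf")
    sub = parser.add_subparsers(dest="command", required=True)

    pack = sub.add_parser("pack", help="create a .vxdf file from JSONL")
    pack.add_argument("input")
    pack.add_argument("output")
    pack.add_argument("--compression")

    info = sub.add_parser("info", help="show header and stats")
    info.add_argument("file")
    return parser


def main(argv=None) -> argparse.Namespace:
    parser = build_parser()
    try:
        import argcomplete
    except ImportError:
        pass  # completion is optional; the CLI still works without it
    else:
        # Returns immediately unless the shell is requesting completions.
        argcomplete.autocomplete(parser)
    return parser.parse_args(argv)
```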


VXDF is BSD-3-licensed. Contributions welcome!



Download files


Source Distribution

vxdf-0.1.1.tar.gz (27.4 kB)

Built Distribution


vxdf-0.1.1-py3-none-any.whl (26.1 kB)

File details

Details for the file vxdf-0.1.1.tar.gz.

File metadata

  • Download URL: vxdf-0.1.1.tar.gz
  • Upload date:
  • Size: 27.4 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.9.6

File hashes

Hashes for vxdf-0.1.1.tar.gz
Algorithm     Hash digest
SHA256        251bc60692ff916601fe9cdccdbabe4d816d3d323f6511fe98c157c08a96a694
MD5           375f5a65fe80a1b2b3008350a41bb855
BLAKE2b-256   54608f00aca979dec8369f6d973ca20dd5f9fb266f7ecc49291f59caf4b6fd3f
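To check a downloaded file against its published SHA256 digest, a minimal sketch using only hashlib is shown below (sha256_of and verify_download are illustrative names). pip can enforce the same check itself when a requirements file pins vxdf==0.1.1 with --hash=sha256:... entries.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1 << 16):
    """Hash the file in blocks so large downloads are never fully loaded into memory."""
    digest = hashlib.sha256()
    with Path(path).open("rb") as fh:
        for block in iter(lambda: fh.read(chunk_size), b""):
            digest.update(block)
    return digest.hexdigest()


def verify_download(path, expected_sha256):
    """Return True when the file's SHA256 matches the published hex digest."""
    return sha256_of(path) == expected_sha256.strip().lower()


# Published SHA256 of the source distribution, from the table above.
EXPECTED_SDIST_SHA256 = "251bc60692ff916601fe9cdccdbabe4d816d3d323f6511fe98c157c08a96a694"
# verify_download("vxdf-0.1.1.tar.gz", EXPECTED_SDIST_SHA256)
```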


File details

Details for the file vxdf-0.1.1-py3-none-any.whl.

File metadata

  • Download URL: vxdf-0.1.1-py3-none-any.whl
  • Upload date:
  • Size: 26.1 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.9.6

File hashes

Hashes for vxdf-0.1.1-py3-none-any.whl
Algorithm     Hash digest
SHA256        3107f38627006caeba167bcebbdc5698d187db64bf014d027ebd6e8736c73a02
MD5           95edf715b900b0767a9e56d540201c2c
BLAKE2b-256   068d1ba308de579aab32b85cfbe7b297b8a57f323c21135db96df8a6e806cb00

