Python implementation of the VXDF (Vector Exchange Data Format) for storing text, metadata and vector embeddings in a single portable file.

VXDF Python Library

VXDF (Vector eXchange Data Format) is an AI-native container for text, metadata and vector embeddings that is portable, indexable and compressed. For RAG, semantic search or compliance audits, it keeps everything in one file that a single command can create.

Quick-start

pip install "vxdf[zstd]"      # optional Zstandard support (quoted so zsh does not glob the brackets)

python - << 'PY'
from vxdf import VXDFWriter, VXDFReader

# create a small file
data = [
    {"id": "1", "text": "hello", "vector": [0.1, 0.2]},
    {"id": "2", "text": "world", "vector": [0.3, 0.4]},
]
with VXDFWriter("demo.vxdf", embedding_dim=2, compression="zstd") as w:
    for chunk in data:
        w.add_chunk(chunk)

# read it back
reader = VXDFReader("demo.vxdf")
print(reader.get_chunk("2"))  # look up a chunk by its id
PY
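The writer above is created with embedding_dim=2, so each chunk's vector is expected to have exactly two components. The following is a minimal sketch of a check you could run before writing, assuming chunks are plain dicts with the id, text and vector keys used in the quick-start. validate_chunk is a hypothetical helper for illustration, not part of the vxdf API, and whether the library performs a similar check itself is not stated here.

```python
from typing import Any, Dict, List

REQUIRED_KEYS = ("id", "text", "vector")  # keys used in the quick-start example


def validate_chunk(chunk: Dict[str, Any], embedding_dim: int) -> List[str]:
    """Return a list of problems found in one chunk (empty list means OK).

    Hypothetical helper for illustration; not part of the vxdf package.
    """
    problems = []
    for key in REQUIRED_KEYS:
        if key not in chunk:
            problems.append(f"missing key: {key}")
    vector = chunk.get("vector")
    if vector is not None and len(vector) != embedding_dim:
        problems.append(
            f"vector has {len(vector)} components, expected {embedding_dim}"
        )
    return problems


data = [
    {"id": "1", "text": "hello", "vector": [0.1, 0.2]},
    {"id": "2", "text": "world", "vector": [0.3, 0.4, 0.5]},  # wrong length
]
for chunk in data:
    for problem in validate_chunk(chunk, embedding_dim=2):
        print(f"chunk {chunk.get('id')}: {problem}")
```

Running the check first lets you report every malformed chunk in one pass instead of stopping at the first one.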

Command-line

vxdf pack data.jsonl data.vxdf --compression zstd   # create
vxdf info data.vxdf                                  # header & stats
vxdf list data.vxdf | head                           # ids
vxdf get  data.vxdf some-id > doc.json               # extract
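The pack command above takes a .jsonl file as input. The sketch below produces such a file with the standard library only. It assumes that each line is one JSON object with the same id, text and vector keys as the quick-start chunks; the exact schema vxdf pack expects is not spelled out here, so treat the field names as an assumption. write_jsonl is a hypothetical helper, not part of vxdf.

```python
import json
from pathlib import Path


def write_jsonl(records, path):
    """Write one JSON object per line (the JSON Lines convention).

    Hypothetical helper for illustration; not part of the vxdf package.
    """
    with Path(path).open("w", encoding="utf-8") as fh:
        for record in records:
            fh.write(json.dumps(record, ensure_ascii=False) + "\n")


# Field names mirror the quick-start chunks (assumed input schema).
records = [
    {"id": "1", "text": "hello", "vector": [0.1, 0.2]},
    {"id": "2", "text": "world", "vector": [0.3, 0.4]},
]
write_jsonl(records, "data.jsonl")
```

The resulting data.jsonl can then be passed to vxdf pack as shown above.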

LangChain integration (preview)

from langchain_community.vectorstores import VXDF
vs = VXDF.from_vxdf("demo.vxdf")

See examples/langchain_integration.py for a minimal adapter.

Authentication

VXDF commands that interact with cloud services need credentials.

OpenAI embeddings

The client looks for an API key in this order (first match wins):

  1. --openai-key CLI flag (e.g. vxdf convert my.pdf out.vxdf --model openai --openai-key sk-...)
  2. OPENAI_API_KEY environment variable.
  3. ~/.vxdf/config.toml under the [openai] table:
[openai]
api_key = "sk-..."
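The lookup order above can be sketched as follows. This is an illustration of the "first match wins" rule, not the library's actual code: resolve_openai_key and its parameters are hypothetical names, and the config step relies on tomllib, which is in the standard library from Python 3.11 (on older interpreters the sketch simply skips that step).

```python
import os
from pathlib import Path
from typing import Mapping, Optional

try:
    import tomllib  # standard library since Python 3.11
except ModuleNotFoundError:  # older interpreters: config lookup is skipped
    tomllib = None


def resolve_openai_key(
    cli_key: Optional[str] = None,
    env: Optional[Mapping[str, str]] = None,
    config_path: Optional[str] = None,
) -> Optional[str]:
    """Return the first key found: CLI flag, then env var, then config file.

    Hypothetical helper for illustration; not part of the vxdf package.
    """
    # 1. Value passed via --openai-key
    if cli_key:
        return cli_key
    # 2. OPENAI_API_KEY environment variable
    env = os.environ if env is None else env
    if env.get("OPENAI_API_KEY"):
        return env["OPENAI_API_KEY"]
    # 3. api_key under the [openai] table of ~/.vxdf/config.toml
    path = Path(config_path) if config_path else Path.home() / ".vxdf" / "config.toml"
    if tomllib is not None and path.is_file():
        with path.open("rb") as fh:
            config = tomllib.load(fh)
        return config.get("openai", {}).get("api_key")
    return None


# The flag takes precedence over the environment variable.
print(resolve_openai_key("sk-from-flag", {"OPENAI_API_KEY": "sk-from-env"}))
```

Passing env and config_path explicitly keeps the function easy to test without touching the real environment or home directory.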

AWS (S3 URLs)

Uses the standard AWS credential chain provided by boto3 – environment variables (AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY), the AWS CLI config, or an attached IAM role. Run aws configure if unsure.

GCP (gs:// URLs)

Relies on Application Default Credentials. Run gcloud auth application-default login or set the GOOGLE_APPLICATION_CREDENTIALS environment variable pointing at a JSON key file.

If credentials are missing, VXDF exits early with a clear message and a hint on how to configure them.
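For a quick look at which environment-based credentials are set before running a cloud command, a small sketch like the one below may help. It only inspects the environment variables named in this section; it does not see a config.toml entry, an AWS CLI profile, an IAM role or gcloud Application Default Credentials, so "not set" here does not necessarily mean the command will fail. ENV_CREDENTIALS and missing_env_vars are hypothetical names, not part of vxdf.

```python
import os
from typing import Dict, List, Mapping, Optional

# Environment variables mentioned in the Authentication section.
ENV_CREDENTIALS: Dict[str, List[str]] = {
    "openai": ["OPENAI_API_KEY"],
    "aws": ["AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY"],
    "gcp": ["GOOGLE_APPLICATION_CREDENTIALS"],
}


def missing_env_vars(
    service: str, env: Optional[Mapping[str, str]] = None
) -> List[str]:
    """Return the variables for `service` that are unset or empty.

    Hypothetical helper for illustration; not part of the vxdf package.
    """
    env = os.environ if env is None else env
    return [name for name in ENV_CREDENTIALS[service] if not env.get(name)]


for service in ENV_CREDENTIALS:
    missing = missing_env_vars(service)
    status = "ok" if not missing else "not set: " + ", ".join(missing)
    print(f"{service}: {status}")
```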

Shell completion

Install the extra dependencies and activate completion once:

pip install "vxdf[completion]"
activate-global-python-argcomplete --user  # bash/zsh/fish supported

Re-open your terminal and enjoy TAB-completion for vxdf sub-commands and options.


VXDF is BSD-3-licensed. Contributions welcome!
