venomx

These details have not been verified by PyPI

Project description

venomx - Vector Embedding Named Object Model indeX

VENOMX is an exchange standard for vector embeddings, layered on existing vector storage standards, with the goal of sharing vector embeddings for named entities (books, genes, publications, anything in schema.org, ...).

One of the goals is to support an Embedding Hub, potentially layered on existing dataset repositories (e.g. HuggingFace, Zenodo, Figshare, ...).

Why?

Why have a standard for embeddings? We have CSV, Parquet, and Arrow; what else do we need?

Let's say Barb has made vector embeddings of all of Wikidata using OpenAI text-embedding-ada-002 and distributes them (e.g. on HuggingFace), and Alice downloads them from there.

Two months later, Alice wants to use this for RAG querying. But she has forgotten what model was used, and also what version of Wikidata was used. She also doesn't remember what exactly was indexed. Was it the page titles? The descriptions? Or the full Wikipedia text?

Multiply this by multiple datasets, versions, indexing strategies, subsets, and chaos ensues.

The goal of venomx is to provide:

a simple YAML format for metadata about an embeddings set
a super-simple parquet schema for the embeddings themselves

It is intended to compose with existing standards for model cards and dataset descriptions, rather than replace them.

It is also intended to be flexible, and to support the following scenarios:

Distribute embedding metadata alongside embeddings, with the latter in an efficient format like Parquet
Distribute both together in YAML (convenience at the expense of some access-time/space efficiency)
Use serializations other than YAML for the metadata (JSON, TSV, RDF, Avro, GraphQL, ...)
Use alternate storage formats for the embeddings (Arrow, HDF5, ...)
Compose easily with your favorite array library (numpy, xarray, pyarrow, ...)
Compose easily with your favorite vector database (ChromaDB, etc.)
Use in combination with objects stored in databases like Solr, PostgreSQL, ...
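
Because the metadata is plain nested data, moving between serializations needs very little machinery. The following is a minimal sketch using only the Python standard library, not the venomx API: it holds the metadata fields from the HPO example in this README as a dict and round-trips them through JSON.

```python
import json

# Metadata mirroring the hp.yaml example in this README, held as a plain dict.
# Illustration only: this does not use the venomx library.
metadata = {
    "description": "HPO label index",
    "prefixes": {"HP": "http://purl.obolibrary.org/obo/HP_"},
    "model": {"name": "text-embedding-ada-002"},
    "model_input_method": {
        "description": "Simple pass through of labels only",
        "fields": ["rdfs:label"],
    },
    "dataset": {
        "name": "HPO-Jan-2024",
        "url": "http://purl.obolibrary.org/obo/hp/releases/2024-01-01/hp.owl",
    },
}

# Serialize to JSON text, then parse it back into Python objects.
as_json = json.dumps(metadata, indent=2)
restored = json.loads(as_json)

# The model name survives the round trip unchanged.
print(restored["model"]["name"])
```

Nothing in the metadata depends on YAML-specific features, which is why the same structure can be carried by other serializations.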

Note that current functionality is minimal, but in future there may be plugins, e.g. for import/export from vector databases.

Out of scope: actually creating the embeddings and computing over them. There are already many great frameworks for this. Venomx is focused purely on making indexes of embeddings FAIR and easy to share.

Example

This example is based around the Human Phenotype Ontology (HPO). Vector embeddings of HPO are useful for searching for phenotypes, for RAG-type LLM applications, and for applications such as variant prioritization (cosine similarity of vector embeddings could replace traditional ontological semantic similarity measures).
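
For reference, cosine similarity between two embedding vectors is their dot product divided by the product of their lengths. Computing over embeddings is out of scope for venomx itself, so the following is only a standalone sketch in standard-library Python; the three-dimensional vectors are made-up stand-ins for real embeddings.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Return the cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


# Toy vectors standing in for embeddings of three terms (values are invented).
term_a = [1.0, 0.0, 0.0]
term_b = [1.0, 1.0, 0.0]
term_c = [0.0, 0.0, 2.0]

sim_ab = cosine_similarity(term_a, term_b)  # vectors 45 degrees apart
sim_ac = cosine_similarity(term_a, term_c)  # orthogonal vectors
```

In practice an array library or a vector database would do this over whole columns of embeddings at once.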

The default way to distribute using venomx is a YAML file with metadata, and a Parquet file:

$ ls
hp.yaml
hp.parquet

The contents of hp.yaml:

description: HPO label index
prefixes:
  HP: http://purl.obolibrary.org/obo/HP_
model:
  name: "text-embedding-ada-002"
model_input_method:
  description: Simple pass through of labels only
  fields: [ "rdfs:label" ]
dataset:
  name: HPO-Jan-2024
  url: http://purl.obolibrary.org/obo/hp/releases/2024-01-01/hp.owl
objects:
  - id: "HP:0000001"
    label: "All"
  - id: "HP:0000002"
    label: "Abnormality of body height"
  # <snip>
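
The prefixes block lets the compact identifiers under objects be expanded to full URIs by replacing the prefix with its declared namespace. A minimal sketch of that expansion, assuming standard CURIE semantics; the function name expand_curie is hypothetical and not part of venomx:

```python
from typing import Dict


def expand_curie(curie: str, prefixes: Dict[str, str]) -> str:
    """Expand a compact identifier such as 'HP:0000002' using a prefix map."""
    prefix, sep, local_id = curie.partition(":")
    if not sep or prefix not in prefixes:
        raise KeyError(f"no prefix declared for {curie!r}")
    return prefixes[prefix] + local_id


# Prefix map copied from the hp.yaml example.
prefixes = {"HP": "http://purl.obolibrary.org/obo/HP_"}

uri = expand_curie("HP:0000002", prefixes)
print(uri)
```

A prefix that is not declared in the map, such as rdfs in this example file, raises KeyError in this sketch rather than being guessed.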

Running parquet schema hp.parquet gives the schema for hp.parquet:

{
  "type" : "record",
  "name" : "schema",
  "fields" : [ {
    "name" : "id",
    "type" : [ "null", "int" ],
    "default" : null
  }, {
    "name" : "name",
    "type" : [ "null", "string" ],
    "default" : null
  }, {
    "name" : "embedding",
    "type" : [ "null", {
      "type" : "array",
      "items" : {
        "type" : "record",
        "name" : "list",
        "fields" : [ {
          "name" : "element",
          "type" : [ "null", "float" ],
          "default" : null
        } ]
      }
    } ],
    "default" : null
  }, {
    "name" : "metadata",
    "type" : [ "null", {
      "type" : "map",
      "values" : [ "null", "string" ]
    } ],
    "default" : null
  } ]
}
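
Read as a record type, the schema above has four columns, each nullable with a null default. The sketch below restates it with Python type hints; the class name EmbeddingRow and the sample values are hypothetical and are not part of venomx.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class EmbeddingRow:
    """One row of hp.parquet as described by the schema above."""

    id: Optional[int] = None                             # "int" or null
    name: Optional[str] = None                           # "string" or null
    embedding: Optional[List[Optional[float]]] = None    # array of nullable floats
    metadata: Optional[Dict[str, Optional[str]]] = None  # map of nullable strings


# A populated row (invented values) and a row with every column left null.
row = EmbeddingRow(id=1, name="All", embedding=[0.1, -0.2, 0.3], metadata={"note": "example"})
empty = EmbeddingRow()
```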

Command line tools

Although the main purpose of this repo is as a proposed standard, we include some simple tools for basic conversion and validation.

Conversion

Currently only two formats are supported:

parquet: two files, a metadata yaml file and a parquet file
yaml: a combined all-in-one yaml file (may be less efficient)

(THIS IS PROBABLY A BIT CONFUSING AND MAY CHANGE)

The tests folder includes an all-in-one example, which can be converted to the dual YAML/Parquet format:

venomx convert -f yaml tests/input/example.combined.yaml -t parquet -o tests/output/example.yaml

Validation

venomx validate tests/output/example.yaml
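
What venomx validate actually checks is not described here. As a rough illustration of the kind of check that would address the "Why?" scenario, the following standard-library sketch reports which provenance fields are absent from a metadata dict. The function name missing_paths and the choice of fields treated as required are assumptions of this sketch, taken from the hp.yaml example rather than from the venomx schema.

```python
from typing import Any, Dict, List, Sequence

# Dotted paths taken from the hp.yaml example. Treating them as required
# is an assumption of this sketch, not a rule stated by venomx.
EXPECTED_PATHS = ("model.name", "model_input_method.fields", "dataset.name")


def missing_paths(metadata: Dict[str, Any], paths: Sequence[str] = EXPECTED_PATHS) -> List[str]:
    """Return the dotted paths that cannot be resolved in the metadata."""
    missing = []
    for path in paths:
        node: Any = metadata
        for key in path.split("."):
            if not isinstance(node, dict) or key not in node:
                missing.append(path)
                break
            node = node[key]
    return missing


complete = {
    "model": {"name": "text-embedding-ada-002"},
    "model_input_method": {"fields": ["rdfs:label"]},
    "dataset": {"name": "HPO-Jan-2024"},
}
incomplete = {"description": "embeddings of unknown origin"}

print(missing_paths(complete))
print(missing_paths(incomplete))
```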

Roadmap

Use the linkml-arrays standard
Support other embedding formats (xarray, Arrow, ...)
...

Project details

Release history

0.2.3 (this version), Mar 30, 2026
0.1.1, Jan 27, 2024
0.1.0, Jan 27, 2024

Download files

Source Distribution

venomx-0.2.3.tar.gz (129.3 kB), uploaded Mar 30, 2026, Source

Built Distribution

venomx-0.2.3-py3-none-any.whl (11.4 kB), uploaded Mar 30, 2026, Python 3

File details

Details for the file venomx-0.2.3.tar.gz.

File metadata

Download URL: venomx-0.2.3.tar.gz
Upload date: Mar 30, 2026
Size: 129.3 kB
Tags: Source
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for venomx-0.2.3.tar.gz
Algorithm	Hash digest
SHA256	`a57297c45297f354fa3f6dd69327aca4ecfa005dedd1c7b9a12c4b1ecb469477`
MD5	`25318c5ab4510d7e7817f223f005e525`
BLAKE2b-256	`31a651606b45db4e187f77f8e3b99da12eeb33a4bdd5906d2543877dc17d57db`

Provenance

The following attestation bundles were made for venomx-0.2.3.tar.gz:

Publisher: pypi-publish.yml on cmungall/venomx

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: venomx-0.2.3.tar.gz
- Subject digest: a57297c45297f354fa3f6dd69327aca4ecfa005dedd1c7b9a12c4b1ecb469477
- Sigstore transparency entry: 1200258801
- Sigstore integration time: Mar 30, 2026
Source repository:
- Permalink: cmungall/venomx@9c355221f4b2bfecfeb36d2021a7491f9c6982ae
- Branch / Tag: refs/tags/v0.2.3
- Owner: https://github.com/cmungall
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: pypi-publish.yml@9c355221f4b2bfecfeb36d2021a7491f9c6982ae
- Trigger Event: release

File details

Details for the file venomx-0.2.3-py3-none-any.whl.

File metadata

Download URL: venomx-0.2.3-py3-none-any.whl
Upload date: Mar 30, 2026
Size: 11.4 kB
Tags: Python 3
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for venomx-0.2.3-py3-none-any.whl
Algorithm	Hash digest
SHA256	`537e37af65f486d468e7662a543f5e28393589d122e4109b7437073d72cc74ca`
MD5	`746551783b77ab7b2ba382a39006f566`
BLAKE2b-256	`77ab082b8bf10b988fe4ba58641c4610e4abaf813575ae4d8965202a5f70720a`

Provenance

The following attestation bundles were made for venomx-0.2.3-py3-none-any.whl:

Publisher: pypi-publish.yml on cmungall/venomx

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: venomx-0.2.3-py3-none-any.whl
- Subject digest: 537e37af65f486d468e7662a543f5e28393589d122e4109b7437073d72cc74ca
- Sigstore transparency entry: 1200258810
- Sigstore integration time: Mar 30, 2026
Source repository:
- Permalink: cmungall/venomx@9c355221f4b2bfecfeb36d2021a7491f9c6982ae
- Branch / Tag: refs/tags/v0.2.3
- Owner: https://github.com/cmungall
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: pypi-publish.yml@9c355221f4b2bfecfeb36d2021a7491f9c6982ae
- Trigger Event: release
