Skip to main content

A collection of experimental Chroma extensions.

Project description

ChromaX: An experimental utilities package for Chroma vector database

Installation

pip install chromadbx

Features

  • Query Builder - build queries using a builder pattern
  • ID generation - generate IDs for documents
  • Embeddings - generate embeddings for your documents:
    • OnnxRuntime embeddings
    • Llama.cpp embeddings
    • Google Vertex AI embeddings
    • Mistral AI embeddings
    • Cloudflare Workers AI embeddings

Usage

Queries

Supported filters:

  • $eq - equal to (string, int, float)
  • $ne - not equal to (string, int, float)
  • $gt - greater than (int, float)
  • $gte - greater than or equal to (int, float)
  • $lt - less than (int, float)
  • $lte - less than or equal to (int, float)
  • $in - in (list of strings, ints, floats,bools)
  • $nin - not in (list of strings, ints, floats,bools)

Where:

import chromadb

from chromadbx.core.queries import eq, where, ne, and_

client = chromadb.PersistentClient(path="path/to/db")
collection = client.get_collection("collection_name")
collection.query(where=where(and_(eq("a", 1), ne("b", "2"))))
# {'$and': [{'a': ['$eq', 1]}, {'b': ['$ne', '2']}]}

Where Document:

import chromadb

from chromadbx.core.queries import where_document, contains, not_contains, LogicalOperator

client = chromadb.PersistentClient(path="path/to/db")
collection = client.get_collection("collection_name")
collection.query(where_document=where_document(contains("this is a document", "this is another document")))
# {'$and': [{'$contains': 'this is a document'}, {'$contains': 'this is another document'}]}
collection.query(
    where_document=where_document(contains("this is a document", "this is another document", op=LogicalOperator.OR)))
# {'$or': [{'$contains': 'this is a document'}, {'$contains': 'this is another document'}]}

ID Generation

import chromadb
from chromadbx import IDGenerator
from functools import partial
from typing import Generator

def sequential_generator(start: int = 0) -> Generator[str, None, None]:
        _next = start
        while True:
            yield f"{_next}"
            _next += 1
client = chromadb.Client()
col = client.get_or_create_collection("test")
my_docs = [f"Document {_}" for _ in range(10)]
idgen = IDGenerator(len(my_docs), generator=partial(sequential_generator, start=10))
col.add(ids=idgen, documents=my_docs)

UUIDs (default)

import chromadb
from chromadbx import UUIDGenerator

client = chromadb.Client()
col = client.get_or_create_collection("test")
my_docs = [f"Document {_}" for _ in range(10)]
col.add(ids=UUIDGenerator(len(my_docs)), documents=my_docs)

ULIDs

import chromadb
from chromadbx import ULIDGenerator
client = chromadb.Client()
col = client.get_or_create_collection("test")
my_docs = [f"Document {_}" for _ in range(10)]
col.add(ids=ULIDGenerator(len(my_docs)), documents=my_docs)

Hashes

Random SHA256:

import chromadb
from chromadbx import RandomSHA256Generator
client = chromadb.Client()
col = client.get_or_create_collection("test")
my_docs = [f"Document {_}" for _ in range(10)]
col.add(ids=RandomSHA256Generator(len(my_docs)), documents=my_docs)

Document-based SHA256:

import chromadb
from chromadbx import DocumentSHA256Generator
client = chromadb.Client()
col = client.get_or_create_collection("test")
my_docs = [f"Document {_}" for _ in range(10)]
col.add(ids=DocumentSHA256Generator(documents=my_docs), documents=my_docs)

NanoID

import chromadb
from chromadbx import NanoIDGenerator
client = chromadb.Client()
col = client.get_or_create_collection("test")
my_docs = [f"Document {_}" for _ in range(10)]
col.add(ids=NanoIDGenerator(len(my_docs)), documents=my_docs)

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

chromadbx-0.0.6.tar.gz (10.4 kB view details)

Uploaded Source

Built Distribution

chromadbx-0.0.6-py3-none-any.whl (13.2 kB view details)

Uploaded Python 3

File details

Details for the file chromadbx-0.0.6.tar.gz.

File metadata

  • Download URL: chromadbx-0.0.6.tar.gz
  • Upload date:
  • Size: 10.4 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/1.8.4 CPython/3.9.20 Linux/6.5.0-1025-azure

File hashes

Hashes for chromadbx-0.0.6.tar.gz
Algorithm Hash digest
SHA256 253d2f2c0b51775895052b2d776776b81b1e015b436ce6acf03d07a1d183cc9a
MD5 b9e7d82e70a321e531a1db79467b5568
BLAKE2b-256 9618b76fc2a74b56a119cff244bb0444361a48602d2ec8fcc832c82b25d6d782

See more details on using hashes here.

File details

Details for the file chromadbx-0.0.6-py3-none-any.whl.

File metadata

  • Download URL: chromadbx-0.0.6-py3-none-any.whl
  • Upload date:
  • Size: 13.2 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/1.8.4 CPython/3.9.20 Linux/6.5.0-1025-azure

File hashes

Hashes for chromadbx-0.0.6-py3-none-any.whl
Algorithm Hash digest
SHA256 1f8dc85dde4ec2bd4cbfae438061abb6b542f3583cb9d2c4c2e49e036d4ff743
MD5 e8a10bebab138123c961479727caf126
BLAKE2b-256 14e9dafe106fce336883b7c961a6670a744f4cabc70a31ab85b2fc05593955df

See more details on using hashes here.

Supported by

AWS AWS Cloud computing and Security Sponsor Datadog Datadog Monitoring Fastly Fastly CDN Google Google Download Analytics Microsoft Microsoft PSF Sponsor Pingdom Pingdom Monitoring Sentry Sentry Error logging StatusPage StatusPage Status page