CapyDB Python Client

CapyDB Python SDK

The official Python library for CapyDB - the chillest AI-native database.
Store documents, vectors, and more — all in one place, with no need for extra vector DBs.

Features
Installation
Quick Start
EmbJSON Data Types
- EmbText
- EmbImage
License
Contact

Features

- NoSQL + Vector + Object Storage in one platform.
- No External Embedding Steps: just insert text with EmbText and CapyDB does the rest.
- Built-in Semantic Search: perform similarity-based queries without external services.
- Production-Ready: securely store your API key using environment variables.

Installation

pip install capydb

Note: For local development, you can store your key in a .env file or assign it to a variable directly. Avoid hardcoding credentials in production.

Quick Start

Sign Up and Get Credentials

- Sign up at CapyDB.
- Retrieve your API Key and Project ID from the developer console.
- Store these securely (e.g., in environment variables).
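
As a minimal sketch of the "store these securely" step, the helper below reads both credentials from the environment and fails early if either is missing. The variable names CAPY_API_KEY and CAPY_PROJECT_ID are the ones used in the Initialize Client snippet; `load_credentials` itself is a hypothetical helper, not part of the SDK.

```python
import os


def load_credentials(env=os.environ):
    """Return (api_key, project_id) from env, or raise if either is missing."""
    names = ("CAPY_API_KEY", "CAPY_PROJECT_ID")
    missing = [name for name in names if not env.get(name)]
    if missing:
        raise RuntimeError(f"Missing environment variables: {', '.join(missing)}")
    return env["CAPY_API_KEY"], env["CAPY_PROJECT_ID"]
```

Calling this once at startup surfaces a misconfigured deployment immediately instead of at the first database call.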

Initialize Client

import os
from capydb import CapyDB

# For local development only: set the credentials in-process.
# In production, set these variables in your environment instead of in code.
os.environ["CAPY_API_KEY"] = "your-api-key"
os.environ["CAPY_PROJECT_ID"] = "your-project-id"

# Initialize the client
client = CapyDB()

# Access a database and collection
db = client.db("my_database")
collection = db.collection("my_collection")

# Alternative syntax using attribute access
collection = client.my_database.my_collection

Insert Documents (No Embedding Required!)

from capydb import CapyDB, EmbText

# Initialize the client
client = CapyDB()
collection = client.my_database.my_collection

# Define a document with an EmbText field
document = {
    "name": "Alice",
    "age": 7,
    "background": EmbText(
        "Through the Looking-Glass follows Alice as she steps into a fantastical world..."
    )
}

# Insert the document
result = collection.insert_one(document)
print(f"Inserted document with ID: {result.inserted_id}")

What Happens Under the Hood?

- Text fields wrapped as EmbText are automatically chunked and embedded.
- The resulting vectors are indexed for semantic queries.
- All processing happens asynchronously in the background.
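
Because processing is asynchronous, a query issued right after an insert may not see the new document yet. One way to cope is to poll until a condition holds. The helper below is a generic sketch, not an SDK feature; `wait_until`, its parameters, and the default timings are assumptions made for illustration.

```python
import time


def wait_until(check, timeout=10.0, interval=0.5):
    """Call check() until it returns a truthy value or the timeout elapses."""
    deadline = time.monotonic() + timeout
    while True:
        result = check()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError("condition not met before timeout")
        time.sleep(interval)
```

With a client set up as in this README, `check` could be something like `lambda: collection.query("Alice").matches`, which returns a non-empty list once the inserted text has been indexed.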

Query Documents (Semantic Search)

from capydb import CapyDB

# Initialize the client
client = CapyDB()
collection = client.my_database.my_collection

# Simple text query
user_query = "What is the capital of France?"
filter_dict = {"category": "geography"} # Optional
projection = {"mode": "include", "fields": ["title", "content"]} # Optional

# Perform semantic search
response = collection.query(user_query, filter_dict, projection)
print("Query matches:", response.matches)

# Access the first match
if response.matches:
    match = response.matches[0]
    print(f"Matched chunk: {match.chunk}")
    print(f"Field path: {match.path}")
    print(f"Similarity score: {match.score}")
    print(f"Document ID: {match.document._id}")

Example Response:

{
  "matches": [
    {
      "chunk": "Through the Looking-Glass follows Alice...",
      "path": "background",
      "score": 0.703643203,
      "document": {
        "_id": ObjectId("671bf91580bffb6387b4f3d2")
      }
    }
  ]
}
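
A common next step is to keep only the stronger matches. The sketch below works on plain dictionaries shaped like the example response, so it runs without a database. The 0.5 threshold, the second sample match, and the `top_matches` helper are illustrative assumptions, and the document IDs are written as plain strings rather than ObjectId values.

```python
def top_matches(response, min_score=0.5):
    """Return matches scoring at least min_score, best first."""
    kept = [m for m in response["matches"] if m["score"] >= min_score]
    return sorted(kept, key=lambda m: m["score"], reverse=True)


# Sample data shaped like the example response (second entry is made up).
response = {
    "matches": [
        {
            "chunk": "Through the Looking-Glass follows Alice...",
            "path": "background",
            "score": 0.703643203,
            "document": {"_id": "671bf91580bffb6387b4f3d2"},
        },
        {
            "chunk": "An unrelated chunk",
            "path": "background",
            "score": 0.21,
            "document": {"_id": "671bf91580bffb6387b4f3d3"},
        },
    ]
}

for match in top_matches(response):
    print(match["document"]["_id"], match["path"], round(match["score"], 3))
```

The SDK examples in this README read results through attributes (`response.matches`, `match.score`), so the same filter would use attribute access there instead of dictionary keys.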

EmbJSON Data Types

CapyDB extends JSON with AI-friendly data types like EmbText, making text embeddings and indexing automatic.
No need for a separate vector DB or embedding service — CapyDB handles chunking, embedding, and indexing asynchronously.

EmbText

EmbText is a specialized data type for storing and embedding text in CapyDB. It enables semantic search capabilities by automatically chunking, embedding, and indexing text.

When stored in the database, the text is processed asynchronously in the background:

- The text is chunked based on the specified parameters
- Each chunk is embedded using the specified embedding model
- The embeddings are indexed for efficient semantic search

Basic Usage

Below is the simplest way to use EmbText:

from capydb import EmbText

# Storing a single text field that you want to embed
document = {
    "field_name": EmbText(
        "Alice is a data scientist with expertise in AI and machine learning. "
        "She has led several projects in natural language processing."
    )
}

This snippet creates an EmbText object containing the text. By default, it uses the text-embedding-3-small model and sensible defaults for chunking and overlap.

Customized Usage

If you have specific requirements (e.g., a different embedding model or particular chunking strategy), customize EmbText by specifying additional parameters:

from capydb import EmbText, EmbModels

document = {
    "field_name": EmbText(
        text="Alice is a data scientist with expertise in AI and machine learning. She has led several projects in natural language processing.",
        emb_model=EmbModels.TEXT_EMBEDDING_3_LARGE,  # Change the default model
        max_chunk_size=200,                          # Configure chunk sizes
        chunk_overlap=20,                            # Overlap between chunks
        is_separator_regex=False,                    # Are separators plain strings or regex?
        separators=[
            "\n\n",
            "\n",
        ],
        keep_separator=False,                        # Keep or remove the separator in chunks
    )
}

Parameter Reference

Parameter	Description
text	The core content for `EmbText`. This text is automatically chunked and embedded for semantic search.
emb_model	Which embedding model to use. Defaults to `text-embedding-3-small`. You can choose from other supported models, such as `text-embedding-3-large`.
max_chunk_size	Maximum character length of each chunk. Larger chunks reduce the total chunk count but may reduce search precision, because each embedding then has to represent more text.
chunk_overlap	Overlapping character count between consecutive chunks, useful for preserving context at chunk boundaries.
is_separator_regex	Whether to treat each separator in `separators` as a regular expression. Defaults to `False`.
separators	A list of separator strings (or regex patterns) used to split the text. For instance, `["\n\n", "\n"]` can split paragraphs or single lines.
keep_separator	If `True`, separators remain in the chunked text. If `False`, they are stripped out.
chunks	Auto-generated by the database after the text is processed. It is not set by the user, and is available only after embedding completes.
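
To make the separator parameters concrete, here is a local sketch of how `separators`, `is_separator_regex`, and `keep_separator` could interact. It is not CapyDB's server-side implementation: `split_on_separators` is a hypothetical helper, and attaching a kept separator to the end of the preceding piece is an assumption.

```python
import re


def split_on_separators(text, separators, is_separator_regex=False, keep_separator=False):
    """Split text on any separator; optionally keep each separator at the end
    of the piece that precedes it. Regex separators must not contain
    capturing groups."""
    patterns = [s if is_separator_regex else re.escape(s) for s in separators]
    # Try longer patterns first so "\n\n" is not consumed as two "\n" matches.
    patterns.sort(key=len, reverse=True)
    combined = "|".join(f"(?:{p})" for p in patterns)
    if not keep_separator:
        return [piece for piece in re.split(combined, text) if piece]
    # A capturing group makes re.split return the separators as well,
    # alternating text, separator, text, separator, ...
    parts = re.split(f"({combined})", text)
    pieces = [
        parts[i] + (parts[i + 1] if i + 1 < len(parts) else "")
        for i in range(0, len(parts), 2)
    ]
    return [piece for piece in pieces if piece]
```

With plain-string separators every entry is escaped before matching, which is why `is_separator_regex` defaults to `False`: characters such as `.` or `|` in a separator are then matched literally.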

How It Works

Whenever you insert a document containing EmbText into CapyDB, three main steps happen asynchronously:

- Chunking: The text is divided into chunks based on max_chunk_size, chunk_overlap, and any specified separators. This ensures the text is broken down into optimally sized segments.
- Embedding: Each chunk is transformed into a vector representation using the specified emb_model. This step captures the semantic essence of the text.
- Indexing: The embeddings are indexed for efficient semantic search.

Because these steps occur in the background, you get immediate responses to your write operations, but actual query availability may lag slightly behind the write.
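
The chunking step can be pictured as a sliding window over the text. The sketch below shows only how `max_chunk_size` and `chunk_overlap` relate; it ignores separators and is an assumption about the general technique, not CapyDB's actual chunker. `chunk_text` is a hypothetical helper.

```python
def chunk_text(text, max_chunk_size=200, chunk_overlap=20):
    """Split text into windows of at most max_chunk_size characters, where each
    window repeats the last chunk_overlap characters of the previous one."""
    if max_chunk_size <= 0:
        raise ValueError("max_chunk_size must be positive")
    if not 0 <= chunk_overlap < max_chunk_size:
        raise ValueError("chunk_overlap must be in [0, max_chunk_size)")
    step = max_chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + max_chunk_size])
        if start + max_chunk_size >= len(text):
            break  # this window already reached the end of the text
    return chunks
```

Each new window starts `max_chunk_size - chunk_overlap` characters after the previous one, so a larger overlap preserves more context across boundaries at the cost of producing more chunks.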

Accessing Generated Chunks

The chunks attribute is auto-added by the database after the text finishes embedding and indexing. For instance:

# Assume `document` was read back from the database after processing finished
emb_text = document["field_name"]

print(emb_text.text)
# "Alice is a data scientist with expertise in AI and machine learning. She has led several projects in natural language processing."

print(emb_text.chunks)
# [
#   "Alice is a data scientist",
#   "with expertise in AI",
#   "and machine learning.",
#   "She has led several projects",
#   "in natural language processing."
# ]

Usage in Nested Fields

EmbText can be placed anywhere in your document, including nested objects:

document = {
  "profile": {
    "name": "Bob",
    "bio": EmbText(
      "Bob has over a decade of experience in AI, focusing on neural networks and deep learning."
    )
  }
}
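
To see which fields of a nested document would be embedded, you can walk the structure and collect their paths. The sketch below uses a stand-in class so it runs without the SDK. The `Embeddable` class, the `find_embeddable_paths` helper, and the dot-separated path format are all assumptions; the query example in this README only shows a top-level path ("background").

```python
class Embeddable:
    """Hypothetical stand-in for an EmbText-like value; not the SDK class."""

    def __init__(self, text):
        self.text = text


def find_embeddable_paths(value, prefix=""):
    """Yield dot-separated paths to every Embeddable inside nested dicts and lists."""
    if isinstance(value, Embeddable):
        yield prefix
    elif isinstance(value, dict):
        for key, child in value.items():
            path = f"{prefix}.{key}" if prefix else str(key)
            yield from find_embeddable_paths(child, path)
    elif isinstance(value, list):
        for index, child in enumerate(value):
            path = f"{prefix}.{index}" if prefix else str(index)
            yield from find_embeddable_paths(child, path)


nested = {"profile": {"name": "Bob", "bio": Embeddable("Bob has over a decade of experience in AI.")}}
print(list(find_embeddable_paths(nested)))
```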

EmbImage

EmbImage is a specialized data type for storing and processing images in CapyDB. It enables multimodal capabilities by storing images that can be:

- Processed by vision models to extract textual descriptions
- Embedded for vector search (using the extracted descriptions)
- Stored alongside other document data

When stored in the database, the image is processed asynchronously in the background:

- If a vision model is specified, the image is analyzed to generate textual descriptions
- If an embedding model is specified, these descriptions are embedded for semantic search
- The results are stored in the 'chunks' property

Basic Usage

Below is the simplest way to use EmbImage:

from capydb import EmbImage
import base64

# Read an image file and convert to base64
with open("path/to/image.jpg", "rb") as f:
    image_data = base64.b64encode(f.read()).decode("utf-8")

# Storing a single image field
document = {
    "title": "Product Image",
    "image": EmbImage(image_data)
}

This snippet creates an EmbImage object containing your base64-encoded image data. By default, no specific models are set and all other parameters remain optional.

Customized Usage

If you have specific requirements (e.g., using a particular embedding or vision model), customize EmbImage by specifying additional parameters:

from capydb import EmbImage, EmbModels, VisionModels
import base64

# Read an image file and convert to base64
with open("path/to/image.jpg", "rb") as f:
    image_data = base64.b64encode(f.read()).decode("utf-8")

document = {
    "title": "Product Image",
    "description": "Our latest product",
    "image": EmbImage(
        data=image_data,                                  # Base64-encoded image
        vision_model=VisionModels.GPT_4O,                 # Vision model for analysis
        emb_model=EmbModels.TEXT_EMBEDDING_3_SMALL,       # For embedding descriptions
        max_chunk_size=200,                               # Configure chunk sizes
        chunk_overlap=20,                                 # Overlap between chunks
        is_separator_regex=False,                         # Are separators plain strings or regex?
        separators=["\n\n", "\n"],                        # Separators for chunking
        keep_separator=False                              # Keep or remove separators
    )
}

Parameter Reference

Parameter	Description
data	The base64 encoded image data. This image is processed and embedded for semantic search.
vision_model	Which vision model to use for processing the image. Defaults to `None`. Supported models include `GPT_4O_MINI`, `GPT_4O`, `GPT_4O_TURBO`, and `GPT_O1`.
emb_model	Which embedding model to use for text chunks. Defaults to `None`. Supported models include `text-embedding-3-small`, `text-embedding-3-large`, and `text-embedding-ada-002`.
max_chunk_size	Maximum character length for each text chunk. Used when processing vision model output.
chunk_overlap	Overlapping character count between consecutive chunks, useful for preserving context at chunk boundaries.
is_separator_regex	Whether to treat each separator in `separators` as a regular expression. Defaults to `False`.
separators	A list of separator strings (or regex patterns) used to split the textual description produced by the vision model into chunks.
keep_separator	If `True`, separators remain in the processed data. If `False`, they are removed.
chunks	Auto-generated by the database after processing the image. It is not set by the user, and is available only after embedding completes.

How It Works

Whenever you insert a document containing EmbImage into CapyDB, the following steps occur asynchronously:

- Data Validation and Decoding: The base64 image data is validated (ensuring it's properly encoded) and decoded as needed.
- Vision Model Processing (if specified): If a vision model is specified, the image is analyzed to generate textual descriptions.
- Embedding (if specified): If an embedding model is specified, the textual descriptions are transformed into vector representations.
- Indexing: The resulting embeddings are indexed for efficient semantic search.

These steps happen in the background, so while write operations are fast, query availability may have a slight delay.
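
Since validation happens in the background, it can help to check the base64 string locally before inserting. The sketch below uses only the standard library; `decode_image` is a hypothetical helper and says nothing about how CapyDB validates data on its side.

```python
import base64
import binascii


def decode_image(data):
    """Decode a base64 string strictly, raising ValueError on malformed input."""
    try:
        # validate=True rejects characters outside the base64 alphabet
        # instead of silently discarding them.
        return base64.b64decode(data, validate=True)
    except binascii.Error as exc:
        raise ValueError(f"invalid base64 image data: {exc}") from exc
```

Running this on `image_data` before building the EmbImage catches truncated or corrupted input early, when the error is still easy to trace back to the file.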

Querying Images

Once the embedding and indexing steps are complete, your EmbImage fields become searchable. To perform semantic queries on image data, use the standard query operations:

from capydb import CapyDB

# Initialize the client
client = CapyDB()
collection = client.my_database.my_collection

# Query for images with similar content
results = collection.query("product with blue background")

# Access the first match
if results.matches:
    match = results.matches[0]
    print(f"Matched chunk: {match.chunk}")
    print(f"Field path: {match.path}")
    print(f"Similarity score: {match.score}")
    print(f"Document ID: {match.document._id}")

License

Contact

Questions? Email us
Website: capydb.com

Happy coding with CapyDB!

Release history

0.7.0

May 3, 2025

0.6.2

Apr 26, 2025

0.6.1

Apr 19, 2025

0.6.0

Apr 19, 2025

0.5.0

Apr 16, 2025

0.3.4

Apr 15, 2025

0.3.3

Apr 15, 2025

This version

0.3.2

Apr 12, 2025

0.3.1

Apr 9, 2025

0.3.0

Mar 30, 2025

0.2.8

Mar 26, 2025

Download files

Source Distribution

capydb-0.3.2.tar.gz (15.7 kB)

Uploaded Apr 12, 2025 Source

Built Distribution

capydb-0.3.2-py3-none-any.whl (15.1 kB)

Uploaded Apr 12, 2025 Python 3

File details

Details for the file capydb-0.3.2.tar.gz.

File metadata

Download URL: capydb-0.3.2.tar.gz
Upload date: Apr 12, 2025
Size: 15.7 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: poetry/2.1.1 CPython/3.12.9 Darwin/24.4.0

File hashes

Hashes for capydb-0.3.2.tar.gz
Algorithm	Hash digest
SHA256	`11044aff2b46910aa65847537c7a3fcce63a5910af4888b4b865d79a1b7abbd2`
MD5	`f257bc86e52cefdf91fdeca48ca766f7`
BLAKE2b-256	`462c8c5fdf0c977c55ab7998fd13939fc1a1fd885e455b1882fd027749e1e5f9`

File details

Details for the file capydb-0.3.2-py3-none-any.whl.

File metadata

Download URL: capydb-0.3.2-py3-none-any.whl
Upload date: Apr 12, 2025
Size: 15.1 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: poetry/2.1.1 CPython/3.12.9 Darwin/24.4.0

File hashes

Hashes for capydb-0.3.2-py3-none-any.whl
Algorithm	Hash digest
SHA256	`8c87e34c761dc063f5c5929f449b3ce9a8f359f713bc4a6b09f4c3d028fc7e37`
MD5	`6608ffd2e30845bb181dd23f4b6875e4`
BLAKE2b-256	`6761f435c88d5c3b4c7672264cd87cbb1abfdade91ce1ed280dab4feb2b83d25`
