Skip to main content

High-performance embedded graph database with vector search

Project description

RayDB for Python

RayDB is a high-performance embedded graph database with built-in vector search. This package provides the Python bindings to the Rust core.

Features

  • ACID transactions with commit/rollback
  • Node and edge CRUD operations with properties
  • Labels, edge types, and property keys
  • Fluent traversal and pathfinding (BFS, Dijkstra, A*)
  • Vector embeddings with IVF and IVF-PQ indexes
  • Single-file storage format

Install

From PyPI

pip install raydb

From source

# Install maturin (Rust extension build tool)
python -m pip install -U maturin

# Build and install in development mode
cd ray-rs
maturin develop --features python

# Or build a wheel
maturin build --features python --release
pip install target/wheels/raydb-*.whl

Quick start (fluent API)

The fluent API provides a high-level, type-safe interface:

from raydb import ray, node, edge, prop, optional

# Define your schema
User = node("user",
    key=lambda id: f"user:{id}",
    props={
        "name": prop.string("name"),
        "email": prop.string("email"),
        "age": optional(prop.int("age")),
    }
)

Knows = edge("knows", {
    "since": prop.int("since"),
})

# Open database
with ray("./social.raydb", nodes=[User], edges=[Knows]) as db:
    # Insert nodes
    alice = db.insert(User).values(key="alice", name="Alice", email="alice@example.com").returning()
    bob = db.insert(User).values(key="bob", name="Bob", email="bob@example.com").returning()

    # Create edges
    db.link(alice, Knows, bob, since=2024)

    # Traverse
    friends = db.from_(alice).out(Knows).nodes().to_list()

    # Pathfinding
    path = db.shortest_path(alice).via(Knows).to(bob).dijkstra()

Quick start (low-level API)

For direct control, use the low-level Database class:

from raydb import Database, PropValue

with Database("my_graph.raydb") as db:
    db.begin()

    alice = db.create_node("user:alice")
    bob = db.create_node("user:bob")

    name_key = db.get_or_create_propkey("name")
    db.set_node_prop(alice, name_key, PropValue.string("Alice"))
    db.set_node_prop(bob, name_key, PropValue.string("Bob"))

    knows = db.get_or_create_etype("knows")
    db.add_edge(alice, knows, bob)

    db.commit()

    print("nodes:", db.count_nodes())
    print("edges:", db.count_edges())

Fluent traversal

from raydb import TraverseOptions

friends = db.from_(alice).out(knows).to_list()

results = db.from_(alice).traverse(
    knows,
    TraverseOptions(max_depth=3, min_depth=1, direction="out", unique=True),
).to_list()

Concurrent Access

RayDB supports concurrent read operations from multiple threads. Read operations don't block each other:

import threading
from concurrent.futures import ThreadPoolExecutor

# Multiple threads can read concurrently
def read_user(key):
    return db.get_node_by_key(key)

with ThreadPoolExecutor(max_workers=4) as executor:
    futures = [executor.submit(read_user, f"user:{i}") for i in range(100)]
    results = [f.result() for f in futures]

# Or with asyncio (reads run concurrently)
import asyncio

async def read_users():
    loop = asyncio.get_event_loop()
    tasks = [
        loop.run_in_executor(None, db.get_node_by_key, f"user:{i}")
        for i in range(100)
    ]
    return await asyncio.gather(*tasks)

Concurrency model:

  • Reads are concurrent: Multiple get_node_by_key(), get_neighbors(), traversals, etc. can run in parallel
  • Writes are exclusive: Write operations (create_node(), add_edge(), etc.) require exclusive access
  • Thread safety: The Database object is safe to share across threads

Note: Python's GIL is released during Rust operations, allowing true parallelism for I/O-bound database access.

Vector search

from raydb import IvfIndex, IvfConfig, SearchOptions

index = IvfIndex(dimensions=128, config=IvfConfig(n_clusters=100))

training_data = [0.1] * (128 * 1000)
index.add_training_vectors(training_data, num_vectors=1000)
index.train()

index.insert(vector_id=1, vector=[0.1] * 128)

results = index.search(
    manifest_json='{"vectors": {...}}',
    query=[0.1] * 128,
    k=10,
    options=SearchOptions(n_probe=20),
)

for result in results:
    print(result.node_id, result.distance)

Documentation

https://ray-kwaf.vercel.app/docs

License

MIT License - see the main project LICENSE file for details.

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

raydb-0.2.2.tar.gz (1.6 MB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

raydb-0.2.2-cp312-cp312-macosx_11_0_arm64.whl (1.4 MB view details)

Uploaded CPython 3.12macOS 11.0+ ARM64

File details

Details for the file raydb-0.2.2.tar.gz.

File metadata

  • Download URL: raydb-0.2.2.tar.gz
  • Upload date:
  • Size: 1.6 MB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: maturin/1.9.6

File hashes

Hashes for raydb-0.2.2.tar.gz
Algorithm Hash digest
SHA256 52bcf8c984b9fa1e6a60ed8dead127152c848bb947b6ca1c0a26da3345de435c
MD5 945a3852d58e9e151d4d70a02f24d5e8
BLAKE2b-256 edb85497a93e777aacf854bfdb1f0c18834215c301779d9473b9c11aaab1910d

See more details on using hashes here.

File details

Details for the file raydb-0.2.2-cp312-cp312-macosx_11_0_arm64.whl.

File metadata

File hashes

Hashes for raydb-0.2.2-cp312-cp312-macosx_11_0_arm64.whl
Algorithm Hash digest
SHA256 17cdf5de982ee77298f48dd96ee6701d47c4cfc05f1cbfc78a4ca6b6a1c1ed4c
MD5 c6b3cb45b70279175aa3279de189084f
BLAKE2b-256 860d7193c718fb8bf6c067995c659cd62e02cab61ae0a306abe7cd8256caee3f

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page