Skip to main content

High-performance in-memory graph API with RoaringBitmaps, Python bindings via PyO3

Project description

RustyChickpeas

Coverage Status

RustyChickpeas Logo

An in-memory graph API written in Rust, using RoaringBitmaps as a fundamental data structure.

Overview

RustyChickpeas provides a high-performance, in-memory graph database API that:

  • Uses RoaringBitmaps: Efficient set operations for node/relationship IDs
  • Rust-First: High-performance core implementation
  • Python-Friendly: PyO3 bindings for seamless Python integration
  • In-Memory: Optimized for fast graph operations

Project Status

Implementation Complete - Core functionality implemented with comprehensive benchmarks and Python bindings.

  • ✅ Immutable GraphSnapshot with CSR adjacency and columnar properties
  • ✅ GraphSnapshotBuilder for efficient bulk loading (requires u32 node IDs)
  • ✅ Parallel finalization using rayon for improved performance
  • ✅ Python bindings via PyO3
  • ✅ Bulk loading from Parquet files

See rustychickpeas-core/benches/README.md for benchmark information.

Key Features

  • Immutable GraphSnapshot - Read-optimized graph with CSR adjacency and columnar properties
  • GraphSnapshotBuilder - Efficient bulk loading (requires u32 node IDs directly)
  • RustyChickpeas Manager - Version management for multiple graph snapshots
  • Parallel Finalization - Uses rayon to parallelize index building during finalization
  • ✅ Label and property support with inverted indexes
  • ✅ Efficient traversal using CSR (Compressed Sparse Row) format
  • ✅ Python bindings via PyO3
  • ✅ Bulk loading from Parquet files

Installation

pip install rustychickpeas

Or from source:

git clone https://github.com/freeeve/rustychickpeas.git
cd rustychickpeas/rustychickpeas-python
pip install maturin
maturin develop --release

Platform Support

RustyChickpeas is tested and supported on:

  • Linux x86_64 - Full support
  • macOS x86_64 (Intel) - Full support
  • macOS arm64 (Apple Silicon) - Full support
  • Windows x86_64 - Full support

Limitations:

  • ⚠️ Linux aarch64 (ARM servers, e.g., AWS Graviton) - Not currently tested in CI. The Rust core should compile and work, but Python bindings require native runners or complex cross-compilation setup. Contributions welcome!

Python Version Support

RustyChickpeas requires Python >= 3.10 and is tested against Python 3.10, 3.11, 3.12, 3.13, and 3.14.

Supported Python Versions by Platform

Platform Python Versions
Linux x86_64 3.10, 3.11, 3.12, 3.13, 3.14
macOS x86_64 (Intel) 3.11, 3.12, 3.13, 3.14
macOS arm64 (Apple Silicon) 3.10, 3.11, 3.12, 3.13, 3.14
Windows x86_64 3.10, 3.11, 3.12, 3.13, 3.14

Note: Python 3.10 is not available on macOS x86_64 runners due to dependency issues with the GitHub Actions environment. Python 3.10 works fine on macOS arm64 (native runners) and all other platforms.

PyO3 Compatibility

  • Python 3.10-3.12: Fully supported by PyO3 0.20.3

  • Python 3.13-3.14: Uses stable ABI compatibility mode (PYO3_USE_ABI3_FORWARD_COMPATIBILITY=1) since PyO3 0.20.3 doesn't officially support these versions yet. This works correctly but uses the stable ABI for forward compatibility.

    Note for local development: When using cargo check or cargo build with Python 3.13/3.14, you may need to set the environment variable:

    export PYO3_USE_ABI3_FORWARD_COMPATIBILITY=1
    cargo check --package rustychickpeas-python
    

    maturin builds automatically handle this via the py-limited-api = "auto" setting in pyproject.toml.

Quick Start

Python

Basic Usage (with Manager)

import rustychickpeas as rcp

# Create a manager for version management
manager = rcp.RustyChickpeas()

# Create a builder with a version
builder = manager.create_builder(version="v1.0")

# Add nodes with labels (node_id is optional - auto-generates if not provided)
builder.add_node(["Person"], node_id=1)
builder.add_node(["Person"], node_id=2)
builder.add_node(["Company"], node_id=3)

# Add relationships (node IDs are required)
builder.add_rel(1, 2, "KNOWS")
builder.add_rel(1, 3, "WORKS_FOR")

# Set properties (automatic type detection)
builder.set_prop(1, "name", "Alice")
builder.set_prop(1, "age", 30)
builder.set_prop(2, "name", "Bob")
builder.set_prop(3, "name", "Acme Corp")

# Finalize and add to manager (uses the version set when creating the builder)
builder.finalize_into(manager)

# Retrieve snapshot by version
graph = manager.graph_snapshot("v1.0")

# Get graph statistics
print(f"Graph has {graph.node_count()} nodes and {graph.relationship_count()} relationships")

# Query nodes
node = graph.node(1)  # Get Node object (node ID 1)
print(f"Node labels: {node.labels()}")  # ["Person"]
print(f"Node property 'name': {node.get_property('name')}")  # "Alice"

# Query relationships - get neighbor Node objects
neighbors = graph.neighbors(1, rcp.Direction.Outgoing)
print(f"Node 1 has {len(neighbors)} outgoing neighbors")
for neighbor in neighbors:
    print(f"  Neighbor ID: {neighbor.id()}, labels: {neighbor.labels()}")

# Or get just the neighbor IDs
neighbor_ids = graph.neighbor_ids(1, rcp.Direction.Outgoing)
print(f"Neighbor IDs: {neighbor_ids}")  # [2, 3]

# Get relationships as Relationship objects (includes type, start/end nodes)
relationships = graph.relationships(1, rcp.Direction.Outgoing)
for rel in relationships:
    print(f"  {rel.reltype()}: {rel.start_node().id()} -> {rel.end_node().id()}")

Standalone Usage (without Manager)

import rustychickpeas as rcp

# Create a standalone builder
builder = rcp.GraphSnapshotBuilder(version="v1.0")

# Add nodes and relationships
builder.add_node(["Person"], node_id=1)
builder.add_node(["Person"], node_id=2)
builder.add_rel(1, 2, "KNOWS")

# Finalize into a snapshot
graph = builder.finalize()

# Query the graph
node = graph.node(1)
neighbors = graph.neighbors(1, rcp.Direction.Outgoing)

Loading from Parquet Files

import rustychickpeas as rcp

# Direct loading (creates a standalone snapshot)
graph = rcp.GraphSnapshot.read_from_parquet(
    nodes_path="nodes.parquet",
    relationships_path="relationships.parquet",
    node_id_column="id",
    label_columns=["label"],
    start_node_column="from",
    end_node_column="to",
    rel_type_column="type"
)

# Or load into a builder for more control
manager = rcp.RustyChickpeas()
builder = manager.create_builder(version="v1.0")
builder.load_nodes_from_parquet("nodes.parquet", node_id_column="id", label_columns=["label"])
builder.load_relationships_from_parquet("relationships.parquet", 
                                       start_node_column="from", 
                                       end_node_column="to", 
                                       rel_type_column="type")
builder.finalize_into(manager)
graph = manager.graph_snapshot("v1.0")

Rust

use rustychickpeas_core::RustyChickpeas;

// Create a manager (handles multiple snapshots by version)
let manager = RustyChickpeas::new();

// Create a builder from the manager with version
let mut builder = manager.create_builder(Some("v1.0"), None, None);

// Add nodes and relationships (node_id is optional - auto-generates if None)
builder.add_node(Some(1), &["Person"]);
builder.add_node(Some(2), &["Person"]);
// Relationships require explicit node IDs
builder.add_rel(1, 2, "KNOWS");

// Finalize the builder
let snapshot = builder.finalize(None);

// Add to manager
manager.add_snapshot(snapshot);

// Retrieve the snapshot by version
let snapshot = manager.graph_snapshot("v1.0").unwrap();

// Query the snapshot
let neighbors = snapshot.out_neighbors(1);
println!("Node 1 neighbors: {:?}", neighbors);

Version Management

RustyChickpeas supports version management at the snapshot level using the RustyChickpeas manager. Each snapshot can have a version string (e.g., "v1.0", "v2.0") that identifies it.

Python

import rustychickpeas as rcp

# Create a manager
manager = rcp.RustyChickpeas()

# Create and build version 1.0
builder1 = manager.create_builder(version="v1.0")
# ... add nodes/relationships ...
builder1.finalize_into(manager)

# Create and build version 2.0
builder2 = manager.create_builder(version="v2.0")
# ... add nodes/relationships ...
builder2.finalize_into(manager)

# Retrieve snapshots by version
v1_snapshot = manager.graph_snapshot("v1.0")
v2_snapshot = manager.graph_snapshot("v2.0")

# List all versions
versions = manager.versions()  # ["v1.0", "v2.0"]

How Version Management Works

  1. Setting Version: Pass version parameter to create_builder() when creating the builder, or use GraphSnapshotBuilder.set_version(version_string) to set it later.

  2. Storage: After calling builder.finalize(), add the snapshot to the manager using manager.add_snapshot(snapshot). The snapshot will be stored under its version (if set) or "latest" if no version is set.

  3. Retrieval: Use manager.graph_snapshot(version_string) to retrieve a snapshot by version.

  4. Version Storage: Versions are stored as strings and can be any identifier you choose (e.g., "v1.0", "2024-01-01", "production").

Capacity and Auto-Growing

The capacity_nodes and capacity_rels parameters are optional performance hints for pre-allocation:

  • Defaults: If not specified, starts with 2^20 (1,048,576) nodes/relationships capacity
  • Auto-Growing: The builder automatically grows as needed (doubling capacity each time)
  • Maximum Limits:
    • Nodes: Up to 2^32 - 1 (4.3 billion) - enforced by u32 NodeId
    • Relationships: Up to 2^64 - 1 (18.4 quintillion) - limited by memory
  • When to Specify Capacity: Only specify if you know the approximate size upfront to avoid reallocations. For most use cases, the default auto-growing behavior is sufficient.

Example:

# Uses default (2^20 = 1,048,576), auto-grows as needed
builder = manager.create_builder(version="v1.0")

# Large graph - specify capacity hint to avoid reallocations
builder = manager.create_builder(version="v1.0", capacity_nodes=10_000_000, capacity_rels=50_000_000)

API Naming Conventions

RustyChickpeas follows Pythonic naming conventions with clear, descriptive method names:

# Get graph statistics
nodes = graph.node_count()
relationships = graph.relationship_count()

Performance

See rustychickpeas-core/benches/README.md for benchmark information.

For Python-specific performance tests, see rustychickpeas-python/tests/benchmark_large_parquet.py.

Highlights:

  • Node existence: ~7ns (constant time)
  • Property lookup: ~44ns (constant time)
  • Label queries: ~56ns for 100 nodes, ~348ns for 100K nodes
  • BFS traversal: ~130ns per node
  • Bitmap operations: Sub-microsecond for millions of elements

Limits and Scalability

Hard Limits

  • Nodes: Up to 2^32 - 1 (4,294,967,295 nodes) - Limited by u32 NodeId
  • Node IDs: Must be u32 (0 to 2^32 - 1) - Users should map their own IDs to u32 if needed
  • Relationships: Up to 2^64 - 1 (18,446,744,073,709,551,615) - Limited by u64 counter, but practically constrained by memory
  • Unique Strings (labels, relationship types, property keys): Up to 2^32 - 1 (4,294,967,295) - Limited by u32 InternedStringId
  • Properties per Node: No hard limit, constrained by available memory
  • Property Values: No hard limit, constrained by available memory

Note: While relationships can theoretically reach 2^64 - 1, practical limits are determined by available system memory. Each relationship requires storage in CSR arrays and indexes.

Tested Scales

1 Million Nodes + 1 Million Relationships - Tested and verified:

  • Memory usage: ~2.7GB (with string interning) to ~3.4GB (without)
  • Bulk load rate: ~4M entities/sec from Parquet files
  • Direct builder rate: 21-31M nodes/sec, 12-19M rels/sec
  • Query performance: Sub-millisecond for most operations
  • See rustychickpeas-python/tests/benchmark_large_parquet.py for large-scale benchmarks

Practical Considerations

Memory Usage:

  • Base overhead: ~3.5 bytes per node/relationship (structure + indexes)
  • Properties: Additional memory depends on property count and size
  • String interning: Reduces memory signficantly for graphs with high string duplication
  • Property value interning: Optional feature saves 32-50% memory when property values have high duplication

Performance Characteristics:

  • Direct Builder Operations: 21-31M nodes/sec, 12-19M rels/sec (exceeds 10M/sec target)
  • Bulk Loading: ~4M entities/sec from Parquet files (sequential processing, I/O bound)
  • Finalization: Parallelized using rayon for label/type indexes, property columns, and inverted indexes
  • Queries: Constant-time for indexed operations (label/type lookups)
  • Traversal: Linear with graph connectivity (BFS, shortest path)
  • Memory: Linear growth with graph size

Recommended Usage:

  • Small graphs (< 100K nodes): Excellent performance, minimal memory footprint
  • Medium graphs (100K - 10M nodes): Good performance, manageable memory requirements
  • Large graphs (10M+ nodes): Feasible but memory becomes the primary constraint; finalization benefits from parallelization
  • Very large graphs (100M+ nodes): Possible with sufficient RAM (100GB+); parallel finalization helps reduce finalization time

Memory Estimation:

Base memory ≈ (nodes + relationships) × 3.5 bytes
+ Properties (varies by property count/size)
+ String interning overhead (minimal)
- String interning savings (~21.5% if enabled)

Example: 1M nodes + 1M relationships with properties:

  • Without interning: ~3.4GB
  • With basic interning: ~2.7GB
  • With property value interning (50% reuse): ~1.4GB

Testing

Running Tests

Rust Tests:

cargo test --workspace

Python Tests:

cd rustychickpeas-python
pytest tests/

Test Coverage

Test coverage is set up for both Rust and Python:

Run Coverage:

./scripts/coverage.sh  # Linux/macOS
.\scripts\coverage.ps1  # Windows

Coverage Reports:

  • Rust: coverage/rust/tarpaulin-report.html
  • Python: coverage/python/htmlcov/index.html

See docs/COVERAGE.md for detailed coverage documentation.

License

Licensed under MIT license (LICENSE or http://opensource.org/licenses/MIT).

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distributions

No source distribution files available for this release.See tutorial on generating distribution archives.

Built Distributions

If you're not sure about the file name format, learn more about wheel file names.

rustychickpeas-0.6.1-cp312-cp312-win_amd64.whl (6.2 MB view details)

Uploaded CPython 3.12Windows x86-64

rustychickpeas-0.6.1-cp312-cp312-manylinux_2_34_x86_64.whl (7.3 MB view details)

Uploaded CPython 3.12manylinux: glibc 2.34+ x86-64

rustychickpeas-0.6.1-cp312-cp312-macosx_11_0_arm64.whl (6.4 MB view details)

Uploaded CPython 3.12macOS 11.0+ ARM64

rustychickpeas-0.6.1-cp311-cp311-win_amd64.whl (6.1 MB view details)

Uploaded CPython 3.11Windows x86-64

rustychickpeas-0.6.1-cp311-cp311-manylinux_2_34_x86_64.whl (7.3 MB view details)

Uploaded CPython 3.11manylinux: glibc 2.34+ x86-64

rustychickpeas-0.6.1-cp311-cp311-macosx_11_0_arm64.whl (6.4 MB view details)

Uploaded CPython 3.11macOS 11.0+ ARM64

rustychickpeas-0.6.1-cp310-cp310-win_amd64.whl (6.1 MB view details)

Uploaded CPython 3.10Windows x86-64

rustychickpeas-0.6.1-cp310-cp310-manylinux_2_34_x86_64.whl (7.3 MB view details)

Uploaded CPython 3.10manylinux: glibc 2.34+ x86-64

rustychickpeas-0.6.1-cp310-cp310-macosx_11_0_arm64.whl (6.4 MB view details)

Uploaded CPython 3.10macOS 11.0+ ARM64

File details

Details for the file rustychickpeas-0.6.1-cp312-cp312-win_amd64.whl.

File metadata

File hashes

Hashes for rustychickpeas-0.6.1-cp312-cp312-win_amd64.whl
Algorithm Hash digest
SHA256 73906d1d5f5f7ae9c8e5dd8630753686ae9d6449eb5e12cedcf42f590f249374
MD5 142c84b4606d696b4f446a140c1c7ae7
BLAKE2b-256 f3acd7212fc23694d53dee15105a9d8d2d92535bcbcc2373faf1e14151c6da2e

See more details on using hashes here.

Provenance

The following attestation bundles were made for rustychickpeas-0.6.1-cp312-cp312-win_amd64.whl:

Publisher: publish.yml on freeeve/rustychickpeas

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file rustychickpeas-0.6.1-cp312-cp312-manylinux_2_34_x86_64.whl.

File metadata

File hashes

Hashes for rustychickpeas-0.6.1-cp312-cp312-manylinux_2_34_x86_64.whl
Algorithm Hash digest
SHA256 774a5aa07f4719d773535e86bfbeb36e66e21b403c54baafb5655ab338ed7f5f
MD5 ddd1253d28f22abb6872f46babae217f
BLAKE2b-256 2b717201c1db87098a629c9a8ca236bb1bd2a0855ef28abfa4e1e7dd962325fb

See more details on using hashes here.

Provenance

The following attestation bundles were made for rustychickpeas-0.6.1-cp312-cp312-manylinux_2_34_x86_64.whl:

Publisher: publish.yml on freeeve/rustychickpeas

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file rustychickpeas-0.6.1-cp312-cp312-macosx_11_0_arm64.whl.

File metadata

File hashes

Hashes for rustychickpeas-0.6.1-cp312-cp312-macosx_11_0_arm64.whl
Algorithm Hash digest
SHA256 fa455583f1757783b44b260674d194498a77731f34f9343d5c3f52239a7e67d0
MD5 1a0fcc658fd87736facf6f1803ce0ad2
BLAKE2b-256 01a57ba3e27da8e09325e92a1b8c1bdd41e671ec83859f60a5422c16911b049f

See more details on using hashes here.

Provenance

The following attestation bundles were made for rustychickpeas-0.6.1-cp312-cp312-macosx_11_0_arm64.whl:

Publisher: publish.yml on freeeve/rustychickpeas

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file rustychickpeas-0.6.1-cp311-cp311-win_amd64.whl.

File metadata

File hashes

Hashes for rustychickpeas-0.6.1-cp311-cp311-win_amd64.whl
Algorithm Hash digest
SHA256 987489370df573b949733c31f4ce736322ce6e7c596cfdf5622238638f15705a
MD5 63df1d45e0d76c15d9cdbef3bd32cce1
BLAKE2b-256 96072a79f8ec92b2a895e64f4fef305f677098268c3da62b1fa42fb77c7d4c55

See more details on using hashes here.

Provenance

The following attestation bundles were made for rustychickpeas-0.6.1-cp311-cp311-win_amd64.whl:

Publisher: publish.yml on freeeve/rustychickpeas

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file rustychickpeas-0.6.1-cp311-cp311-manylinux_2_34_x86_64.whl.

File metadata

File hashes

Hashes for rustychickpeas-0.6.1-cp311-cp311-manylinux_2_34_x86_64.whl
Algorithm Hash digest
SHA256 e5e054d045e9102645244ccb779c0d98f24bac376b00987f29dd182635bfe314
MD5 32a50fac64f1b651f6aa8c2e74d25997
BLAKE2b-256 22561c4edf539a7c0f2f02144d5114175ab246c780de96e815ebe52018f89c1f

See more details on using hashes here.

Provenance

The following attestation bundles were made for rustychickpeas-0.6.1-cp311-cp311-manylinux_2_34_x86_64.whl:

Publisher: publish.yml on freeeve/rustychickpeas

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file rustychickpeas-0.6.1-cp311-cp311-macosx_11_0_arm64.whl.

File metadata

File hashes

Hashes for rustychickpeas-0.6.1-cp311-cp311-macosx_11_0_arm64.whl
Algorithm Hash digest
SHA256 0b0b4c2b96a5a5e9bf3192842f332f57f36a7b3a4b8ad064d6c6e573fb8e8b1c
MD5 dff03201bbb2298b20bba8e880490a73
BLAKE2b-256 d83006f94db02d03f4f6020ced2112ee13edf5fb5f77ce6a6cd9a8afba0265a9

See more details on using hashes here.

Provenance

The following attestation bundles were made for rustychickpeas-0.6.1-cp311-cp311-macosx_11_0_arm64.whl:

Publisher: publish.yml on freeeve/rustychickpeas

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file rustychickpeas-0.6.1-cp310-cp310-win_amd64.whl.

File metadata

File hashes

Hashes for rustychickpeas-0.6.1-cp310-cp310-win_amd64.whl
Algorithm Hash digest
SHA256 5cc6be6eb8574950a1b65ce3c4ed2ba599d121327b41f2111d86d8a99f4d6d72
MD5 c18c1d97f543ce12cd37e968ac56fc5f
BLAKE2b-256 d9da1d62c3586cb251edd4d98f322a9144c3fe739c553c25ee10155a66113709

See more details on using hashes here.

Provenance

The following attestation bundles were made for rustychickpeas-0.6.1-cp310-cp310-win_amd64.whl:

Publisher: publish.yml on freeeve/rustychickpeas

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file rustychickpeas-0.6.1-cp310-cp310-manylinux_2_34_x86_64.whl.

File metadata

File hashes

Hashes for rustychickpeas-0.6.1-cp310-cp310-manylinux_2_34_x86_64.whl
Algorithm Hash digest
SHA256 bc02ce033355bf7d30fc96b07a63efec52a1de744b6f6f23154d610c187a51d7
MD5 6dac514487b6c57118636b232cfb39d4
BLAKE2b-256 d52f44290335e7e616dedfd43e995075b2cb2c1cd29f26f57564aac069d5a8f0

See more details on using hashes here.

Provenance

The following attestation bundles were made for rustychickpeas-0.6.1-cp310-cp310-manylinux_2_34_x86_64.whl:

Publisher: publish.yml on freeeve/rustychickpeas

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file rustychickpeas-0.6.1-cp310-cp310-macosx_11_0_arm64.whl.

File metadata

File hashes

Hashes for rustychickpeas-0.6.1-cp310-cp310-macosx_11_0_arm64.whl
Algorithm Hash digest
SHA256 d0ae592187398a9d723562e4c6c58433803d54191b701616a9de11d4442e8bfd
MD5 2b32182f9b74861502824dec3b0cce52
BLAKE2b-256 09312fb9396aa01805e588594b5e0ef27dcaf3a58021ec5cc4d07bfa63e50e28

See more details on using hashes here.

Provenance

The following attestation bundles were made for rustychickpeas-0.6.1-cp310-cp310-macosx_11_0_arm64.whl:

Publisher: publish.yml on freeeve/rustychickpeas

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page