Skip to main content

High-performance in-memory graph API with RoaringBitmaps, Python bindings via PyO3

Project description

RustyChickpeas

Coverage Status

RustyChickpeas Logo

An in-memory graph API written in Rust, using RoaringBitmaps as a fundamental data structure.

Overview

RustyChickpeas provides a high-performance, in-memory graph database API that:

  • Uses RoaringBitmaps: Efficient set operations for node/relationship IDs
  • Rust-First: High-performance core implementation
  • Python-Friendly: PyO3 bindings for seamless Python integration
  • In-Memory: Optimized for fast graph operations

Project Status

Implementation Complete - Core functionality implemented with comprehensive benchmarks and Python bindings.

  • ✅ Immutable GraphSnapshot with CSR adjacency and columnar properties
  • ✅ GraphSnapshotBuilder for efficient bulk loading (requires u32 node IDs)
  • ✅ Parallel finalization using rayon for improved performance
  • ✅ Python bindings via PyO3
  • ✅ Bulk loading from Parquet files

See rustychickpeas-core/benches/README.md for benchmark information.

Key Features

  • Immutable GraphSnapshot - Read-optimized graph with CSR adjacency and columnar properties
  • GraphSnapshotBuilder - Efficient bulk loading (requires u32 node IDs directly)
  • RustyChickpeas Manager - Version management for multiple graph snapshots
  • Parallel Finalization - Uses rayon to parallelize index building during finalization
  • ✅ Label and property support with inverted indexes
  • ✅ Efficient traversal using CSR (Compressed Sparse Row) format
  • ✅ Python bindings via PyO3
  • ✅ Bulk loading from Parquet files

Installation

pip install rustychickpeas

Or from source:

git clone https://github.com/freeeve/rustychickpeas.git
cd rustychickpeas/rustychickpeas-python
pip install maturin
maturin develop --release

Platform Support

RustyChickpeas is tested and supported on:

  • Linux x86_64 - Full support
  • macOS x86_64 (Intel) - Full support
  • macOS arm64 (Apple Silicon) - Full support
  • Windows x86_64 - Full support

Limitations:

  • ⚠️ Linux aarch64 (ARM servers, e.g., AWS Graviton) - Not currently tested in CI. The Rust core should compile and work, but Python bindings require native runners or complex cross-compilation setup. Contributions welcome!

Python Version Support

RustyChickpeas requires Python >= 3.10 and is tested against Python 3.10, 3.11, 3.12, 3.13, and 3.14.

Supported Python Versions by Platform

Platform Python Versions
Linux x86_64 3.10, 3.11, 3.12, 3.13, 3.14
macOS x86_64 (Intel) 3.11, 3.12, 3.13, 3.14
macOS arm64 (Apple Silicon) 3.10, 3.11, 3.12, 3.13, 3.14
Windows x86_64 3.10, 3.11, 3.12, 3.13, 3.14

Note: Python 3.10 is not available on macOS x86_64 runners due to dependency issues with the GitHub Actions environment. Python 3.10 works fine on macOS arm64 (native runners) and all other platforms.

PyO3 Compatibility

  • Python 3.10-3.12: Fully supported by PyO3 0.20.3

  • Python 3.13-3.14: Uses stable ABI compatibility mode (PYO3_USE_ABI3_FORWARD_COMPATIBILITY=1) since PyO3 0.20.3 doesn't officially support these versions yet. This works correctly but uses the stable ABI for forward compatibility.

    Note for local development: When using cargo check or cargo build with Python 3.13/3.14, you may need to set the environment variable:

    export PYO3_USE_ABI3_FORWARD_COMPATIBILITY=1
    cargo check --package rustychickpeas-python
    

    maturin builds automatically handle this via the py-limited-api = "auto" setting in pyproject.toml.

Quick Start

Python

Basic Usage (with Manager)

import rustychickpeas as rcp

# Create a manager for version management
manager = rcp.RustyChickpeas()

# Create a builder with a version
builder = manager.create_builder(version="v1.0")

# Add nodes with labels (node_id is optional - auto-generates if not provided)
builder.add_node(["Person"], node_id=1)
builder.add_node(["Person"], node_id=2)
builder.add_node(["Company"], node_id=3)

# Add relationships (node IDs are required)
builder.add_rel(1, 2, "KNOWS")
builder.add_rel(1, 3, "WORKS_FOR")

# Set properties (automatic type detection)
builder.set_prop(1, "name", "Alice")
builder.set_prop(1, "age", 30)
builder.set_prop(2, "name", "Bob")
builder.set_prop(3, "name", "Acme Corp")

# Finalize and add to manager (uses the version set when creating the builder)
builder.finalize_into(manager)

# Retrieve snapshot by version
graph = manager.graph_snapshot("v1.0")

# Get graph statistics
print(f"Graph has {graph.node_count()} nodes and {graph.relationship_count()} relationships")

# Query nodes
node = graph.node(1)  # Get Node object (node ID 1)
print(f"Node labels: {node.labels()}")  # ["Person"]
print(f"Node property 'name': {node.get_property('name')}")  # "Alice"

# Query relationships - get neighbor Node objects
neighbors = graph.neighbors(1, rcp.Direction.Outgoing)
print(f"Node 1 has {len(neighbors)} outgoing neighbors")
for neighbor in neighbors:
    print(f"  Neighbor ID: {neighbor.id()}, labels: {neighbor.labels()}")

# Or get just the neighbor IDs
neighbor_ids = graph.neighbor_ids(1, rcp.Direction.Outgoing)
print(f"Neighbor IDs: {neighbor_ids}")  # [2, 3]

# Get relationships as Relationship objects (includes type, start/end nodes)
relationships = graph.relationships(1, rcp.Direction.Outgoing)
for rel in relationships:
    print(f"  {rel.reltype()}: {rel.start_node().id()} -> {rel.end_node().id()}")

Standalone Usage (without Manager)

import rustychickpeas as rcp

# Create a standalone builder
builder = rcp.GraphSnapshotBuilder(version="v1.0")

# Add nodes and relationships
builder.add_node(["Person"], node_id=1)
builder.add_node(["Person"], node_id=2)
builder.add_rel(1, 2, "KNOWS")

# Finalize into a snapshot
graph = builder.finalize()

# Query the graph
node = graph.node(1)
neighbors = graph.neighbors(1, rcp.Direction.Outgoing)

Loading from Parquet Files

import rustychickpeas as rcp

# Direct loading (creates a standalone snapshot)
graph = rcp.GraphSnapshot.read_from_parquet(
    nodes_path="nodes.parquet",
    relationships_path="relationships.parquet",
    node_id_column="id",
    label_columns=["label"],
    start_node_column="from",
    end_node_column="to",
    rel_type_column="type"
)

# Or load into a builder for more control
manager = rcp.RustyChickpeas()
builder = manager.create_builder(version="v1.0")
builder.load_nodes_from_parquet("nodes.parquet", node_id_column="id", label_columns=["label"])
builder.load_relationships_from_parquet("relationships.parquet", 
                                       start_node_column="from", 
                                       end_node_column="to", 
                                       rel_type_column="type")
builder.finalize_into(manager)
graph = manager.graph_snapshot("v1.0")

Rust

use rustychickpeas_core::RustyChickpeas;

// Create a manager (handles multiple snapshots by version)
let manager = RustyChickpeas::new();

// Create a builder from the manager with version
let mut builder = manager.create_builder(Some("v1.0"), None, None);

// Add nodes and relationships (node_id is optional - auto-generates if None)
builder.add_node(Some(1), &["Person"]);
builder.add_node(Some(2), &["Person"]);
// Relationships require explicit node IDs
builder.add_rel(1, 2, "KNOWS");

// Finalize the builder
let snapshot = builder.finalize(None);

// Add to manager
manager.add_snapshot(snapshot);

// Retrieve the snapshot by version
let snapshot = manager.graph_snapshot("v1.0").unwrap();

// Query the snapshot
let neighbors = snapshot.out_neighbors(1);
println!("Node 1 neighbors: {:?}", neighbors);

Version Management

RustyChickpeas supports version management at the snapshot level using the RustyChickpeas manager. Each snapshot can have a version string (e.g., "v1.0", "v2.0") that identifies it.

Python

import rustychickpeas as rcp

# Create a manager
manager = rcp.RustyChickpeas()

# Create and build version 1.0
builder1 = manager.create_builder(version="v1.0")
# ... add nodes/relationships ...
builder1.finalize_into(manager)

# Create and build version 2.0
builder2 = manager.create_builder(version="v2.0")
# ... add nodes/relationships ...
builder2.finalize_into(manager)

# Retrieve snapshots by version
v1_snapshot = manager.graph_snapshot("v1.0")
v2_snapshot = manager.graph_snapshot("v2.0")

# List all versions
versions = manager.versions()  # ["v1.0", "v2.0"]

How Version Management Works

  1. Setting Version: Pass version parameter to create_builder() when creating the builder, or use GraphSnapshotBuilder.set_version(version_string) to set it later.

  2. Storage: After calling builder.finalize(), add the snapshot to the manager using manager.add_snapshot(snapshot). The snapshot will be stored under its version (if set) or "latest" if no version is set.

  3. Retrieval: Use manager.graph_snapshot(version_string) to retrieve a snapshot by version.

  4. Version Storage: Versions are stored as strings and can be any identifier you choose (e.g., "v1.0", "2024-01-01", "production").

Capacity and Auto-Growing

The capacity_nodes and capacity_rels parameters are optional performance hints for pre-allocation:

  • Defaults: If not specified, starts with 2^20 (1,048,576) nodes/relationships capacity
  • Auto-Growing: The builder automatically grows as needed (doubling capacity each time)
  • Maximum Limits:
    • Nodes: Up to 2^32 - 1 (4.3 billion) - enforced by u32 NodeId
    • Relationships: Up to 2^64 - 1 (18.4 quintillion) - limited by memory
  • When to Specify Capacity: Only specify if you know the approximate size upfront to avoid reallocations. For most use cases, the default auto-growing behavior is sufficient.

Example:

# Uses default (2^20 = 1,048,576), auto-grows as needed
builder = manager.create_builder(version="v1.0")

# Large graph - specify capacity hint to avoid reallocations
builder = manager.create_builder(version="v1.0", capacity_nodes=10_000_000, capacity_rels=50_000_000)

API Naming Conventions

RustyChickpeas follows Pythonic naming conventions with clear, descriptive method names:

# Get graph statistics
nodes = graph.node_count()
relationships = graph.relationship_count()

Performance

See rustychickpeas-core/benches/README.md for benchmark information.

For Python-specific performance tests, see rustychickpeas-python/tests/benchmark_large_parquet.py.

Highlights:

  • Node existence: ~7ns (constant time)
  • Property lookup: ~44ns (constant time)
  • Label queries: ~56ns for 100 nodes, ~348ns for 100K nodes
  • BFS traversal: ~130ns per node
  • Bitmap operations: Sub-microsecond for millions of elements

Limits and Scalability

Hard Limits

  • Nodes: Up to 2^32 - 1 (4,294,967,295 nodes) - Limited by u32 NodeId
  • Node IDs: Must be u32 (0 to 2^32 - 1) - Users should map their own IDs to u32 if needed
  • Relationships: Up to 2^64 - 1 (18,446,744,073,709,551,615) - Limited by u64 counter, but practically constrained by memory
  • Unique Strings (labels, relationship types, property keys): Up to 2^32 - 1 (4,294,967,295) - Limited by u32 InternedStringId
  • Properties per Node: No hard limit, constrained by available memory
  • Property Values: No hard limit, constrained by available memory

Note: While relationships can theoretically reach 2^64 - 1, practical limits are determined by available system memory. Each relationship requires storage in CSR arrays and indexes.

Tested Scales

1 Million Nodes + 1 Million Relationships - Tested and verified:

  • Memory usage: ~2.7GB (with string interning) to ~3.4GB (without)
  • Bulk load rate: ~4M entities/sec from Parquet files
  • Direct builder rate: 21-31M nodes/sec, 12-19M rels/sec
  • Query performance: Sub-millisecond for most operations
  • See rustychickpeas-python/tests/benchmark_large_parquet.py for large-scale benchmarks

Practical Considerations

Memory Usage:

  • Base overhead: ~3.5 bytes per node/relationship (structure + indexes)
  • Properties: Additional memory depends on property count and size
  • String interning: Reduces memory signficantly for graphs with high string duplication
  • Property value interning: Optional feature saves 32-50% memory when property values have high duplication

Performance Characteristics:

  • Direct Builder Operations: 21-31M nodes/sec, 12-19M rels/sec (exceeds 10M/sec target)
  • Bulk Loading: ~4M entities/sec from Parquet files (sequential processing, I/O bound)
  • Finalization: Parallelized using rayon for label/type indexes, property columns, and inverted indexes
  • Queries: Constant-time for indexed operations (label/type lookups)
  • Traversal: Linear with graph connectivity (BFS, shortest path)
  • Memory: Linear growth with graph size

Recommended Usage:

  • Small graphs (< 100K nodes): Excellent performance, minimal memory footprint
  • Medium graphs (100K - 10M nodes): Good performance, manageable memory requirements
  • Large graphs (10M+ nodes): Feasible but memory becomes the primary constraint; finalization benefits from parallelization
  • Very large graphs (100M+ nodes): Possible with sufficient RAM (100GB+); parallel finalization helps reduce finalization time

Memory Estimation:

Base memory ≈ (nodes + relationships) × 3.5 bytes
+ Properties (varies by property count/size)
+ String interning overhead (minimal)
- String interning savings (~21.5% if enabled)

Example: 1M nodes + 1M relationships with properties:

  • Without interning: ~3.4GB
  • With basic interning: ~2.7GB
  • With property value interning (50% reuse): ~1.4GB

Testing

Running Tests

Rust Tests:

cargo test --workspace

Python Tests:

cd rustychickpeas-python
pytest tests/

Test Coverage

Test coverage is set up for both Rust and Python:

Run Coverage:

./scripts/coverage.sh  # Linux/macOS
.\scripts\coverage.ps1  # Windows

Coverage Reports:

  • Rust: coverage/rust/tarpaulin-report.html
  • Python: coverage/python/htmlcov/index.html

See docs/COVERAGE.md for detailed coverage documentation.

License

Licensed under MIT license (LICENSE or http://opensource.org/licenses/MIT).

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distributions

No source distribution files available for this release.See tutorial on generating distribution archives.

Built Distributions

If you're not sure about the file name format, learn more about wheel file names.

rustychickpeas-0.6.0-cp311-cp311-win_amd64.whl (6.1 MB view details)

Uploaded CPython 3.11Windows x86-64

rustychickpeas-0.6.0-cp311-cp311-manylinux_2_34_x86_64.whl (7.3 MB view details)

Uploaded CPython 3.11manylinux: glibc 2.34+ x86-64

rustychickpeas-0.6.0-cp311-cp311-macosx_11_0_arm64.whl (6.4 MB view details)

Uploaded CPython 3.11macOS 11.0+ ARM64

rustychickpeas-0.6.0-cp310-cp310-win_amd64.whl (6.1 MB view details)

Uploaded CPython 3.10Windows x86-64

rustychickpeas-0.6.0-cp310-cp310-manylinux_2_34_x86_64.whl (7.3 MB view details)

Uploaded CPython 3.10manylinux: glibc 2.34+ x86-64

rustychickpeas-0.6.0-cp310-cp310-macosx_11_0_arm64.whl (6.4 MB view details)

Uploaded CPython 3.10macOS 11.0+ ARM64

File details

Details for the file rustychickpeas-0.6.0-cp311-cp311-win_amd64.whl.

File metadata

File hashes

Hashes for rustychickpeas-0.6.0-cp311-cp311-win_amd64.whl
Algorithm Hash digest
SHA256 a3dce361b8d378e702483b7429f111002dce80500c6e6149901fab9834199ae6
MD5 8aa10e09f3d643b3690f789e518e84ab
BLAKE2b-256 aad10325b6c67d7c292677b19dbcab73ead0d91c505f7ff0726255772e3dc1a7

See more details on using hashes here.

Provenance

The following attestation bundles were made for rustychickpeas-0.6.0-cp311-cp311-win_amd64.whl:

Publisher: publish.yml on freeeve/rustychickpeas

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file rustychickpeas-0.6.0-cp311-cp311-manylinux_2_34_x86_64.whl.

File metadata

File hashes

Hashes for rustychickpeas-0.6.0-cp311-cp311-manylinux_2_34_x86_64.whl
Algorithm Hash digest
SHA256 efe3bc085a2184424a3824248900e56d5816bf2061a18aae32548a99dbf10449
MD5 e9b9f2d7fc21f12ddfe0066e283fbf7a
BLAKE2b-256 b0911d50d5c7583efeeafba8eb1b97a7345479fb20cd1dd9084de9f10e5872c1

See more details on using hashes here.

Provenance

The following attestation bundles were made for rustychickpeas-0.6.0-cp311-cp311-manylinux_2_34_x86_64.whl:

Publisher: publish.yml on freeeve/rustychickpeas

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file rustychickpeas-0.6.0-cp311-cp311-macosx_11_0_arm64.whl.

File metadata

File hashes

Hashes for rustychickpeas-0.6.0-cp311-cp311-macosx_11_0_arm64.whl
Algorithm Hash digest
SHA256 2e0ac0152d74a2d8b3ef0e34dddd7e09bb68be81b71f03c982ce909f4e024a10
MD5 c52127885cab44eaf6e0c561d3693359
BLAKE2b-256 3c01bef202930781d268f7a22d66c78312351055ac60809eb49d395718301660

See more details on using hashes here.

Provenance

The following attestation bundles were made for rustychickpeas-0.6.0-cp311-cp311-macosx_11_0_arm64.whl:

Publisher: publish.yml on freeeve/rustychickpeas

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file rustychickpeas-0.6.0-cp310-cp310-win_amd64.whl.

File metadata

File hashes

Hashes for rustychickpeas-0.6.0-cp310-cp310-win_amd64.whl
Algorithm Hash digest
SHA256 2bdd7e80ed876ec9ab6362c8c34d6926bf2728c452bef5c63e751ed7fcdc133f
MD5 2f186f8c4d86ff4cd33e70a7f792267b
BLAKE2b-256 9931742ee19828da0e25d391ee206d3207e0eafbc8b5a61bf0c20817f7262cdb

See more details on using hashes here.

Provenance

The following attestation bundles were made for rustychickpeas-0.6.0-cp310-cp310-win_amd64.whl:

Publisher: publish.yml on freeeve/rustychickpeas

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file rustychickpeas-0.6.0-cp310-cp310-manylinux_2_34_x86_64.whl.

File metadata

File hashes

Hashes for rustychickpeas-0.6.0-cp310-cp310-manylinux_2_34_x86_64.whl
Algorithm Hash digest
SHA256 bee52164582c03696387fda88dd4e51328c42337ff2fc6f151104eb1442f8cec
MD5 9a6a970d6f3e426b4ae6f86c70d8232e
BLAKE2b-256 5262d453f1aafdc21e83659c7764d19a5f499798bae92f75fbd2fe58158d7347

See more details on using hashes here.

Provenance

The following attestation bundles were made for rustychickpeas-0.6.0-cp310-cp310-manylinux_2_34_x86_64.whl:

Publisher: publish.yml on freeeve/rustychickpeas

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file rustychickpeas-0.6.0-cp310-cp310-macosx_11_0_arm64.whl.

File metadata

File hashes

Hashes for rustychickpeas-0.6.0-cp310-cp310-macosx_11_0_arm64.whl
Algorithm Hash digest
SHA256 1f67a7929d46a832bea92c032dfdba7dea159479aa9d771db03d47633d5e7cd6
MD5 ef1c249a7d1f140c4a29749401471548
BLAKE2b-256 f57a7b994e3c70fc3059626f126cc68f573da3fc2aa47471f4082e0c486081a3

See more details on using hashes here.

Provenance

The following attestation bundles were made for rustychickpeas-0.6.0-cp310-cp310-macosx_11_0_arm64.whl:

Publisher: publish.yml on freeeve/rustychickpeas

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page