High-performance embedded database for semantic search, graph queries, and structured data

REM Database

Resources-Entities-Moments (REM): High-performance embedded database for semantic search, graph queries, and structured data.

Project Location: /Users/sirsh/code/percolation/percolate-rocks

Project Goals

Build a production-ready database combining:

Rust performance - HNSW vector search (about 200x faster than a naive Python scan), native SQL execution (5-10x faster than Python)
Python ergonomics - Pydantic models drive schemas, natural language queries
Zero impedance - Pydantic json_schema_extra → automatic embeddings, indexing, validation

Quick Start

Installation

From PyPI (latest release: v0.3.2):

pip install percolate-rocks

From source:

cd /Users/sirsh/code/percolation/percolate-rocks
maturin develop --release

See .release-notes/ for release history.

Building

This project supports two build modes:

Python extension (default):

# Build and install Python package
maturin develop

# Build only, without installing into the current environment (faster)
maturin develop --skip-install

# Note: cargo check/build will fail - use maturin for Python extensions

Standalone Rust library (no Python):

# Build as pure Rust library (no Python bindings)
cargo check --lib --no-default-features
cargo build --lib --no-default-features --release

# Run tests without Python
cargo test --lib --no-default-features

# Use in other Rust projects
# Add to Cargo.toml:
# percolate-rocks = { version = "0.1", default-features = false }

Testing cross-compilation locally

To ensure your local builds will work in CI/GitHub Actions, use Docker to replicate the CI environment:

# Test ARM64 Linux cross-compilation (what CI does)
docker run --rm -v "$(pwd):/workspace" -w /workspace rust:latest bash -c "
  apt-get update && apt-get install -y gcc-aarch64-linux-gnu pkg-config libclang-dev
  rustup target add aarch64-unknown-linux-gnu
  cargo build --target aarch64-unknown-linux-gnu --release
"

Why local builds might work but CI fails:

Environment	Rust Version	Target	OpenSSL	Notes
Your Mac	Latest (1.87+)	Native (aarch64-apple-darwin)	System OpenSSL (Homebrew)	Local system libs
GitHub Actions	Workflow-specified	Cross-compile (aarch64-unknown-linux-gnu)	Vendored (native-tls-vendored feature)	Must compile from source

Key differences:

Rust version: Mac typically has latest via rustup, CI uses workflow-pinned version
Cross-compilation: Mac → Linux ARM64 requires vendored dependencies (no system libs available)
Native TLS: reqwest needs native-tls-vendored feature for cross-compilation

If the Docker build passes, the CI build should pass too. Treat this as your local validation gate.

Basic Workflow

Define your schema using Pydantic (in models.py):

from pydantic import BaseModel, Field, ConfigDict

class Article(BaseModel):
    """Article resource for semantic search."""
    title: str = Field(description="Article title")
    content: str = Field(description="Full article content")
    category: str = Field(description="Content category")

    model_config = ConfigDict(
        json_schema_extra={
            "embedding_fields": ["content"],      # Auto-embed on insert
            "indexed_fields": ["category"],       # Fast WHERE queries
            "key_field": "title"                  # Deterministic UUID
        }
    )

Use the CLI to work with your data:

# 1. Generate encryption key (for encryption at rest)
rem key-gen --password "strong_master_password"
# Generates Ed25519 key pair and stores encrypted at ~/.p8/keys/

# 2. Initialize database (defaults to ~/.p8/db/)
rem init

# Or specify custom path
rem init --path ./data

# With encryption at rest (optional)
rem init --path ./data --password "strong_master_password"

# 3. Register schema (JSON/YAML preferred, Python also supported)
rem schema add schema.json  # Preferred: pure JSON Schema
rem schema add schema.yaml  # Preferred: YAML format
rem schema add models.py::Article  # Also supported: Pydantic model

# Or create from template
rem schema add --name my_docs --template resources  # Clone resources schema

# 4. Batch upsert articles (single embedding API call)
cat articles.jsonl | rem insert articles --batch

# 5. Semantic search (HNSW index)
rem search "fast programming languages" --schema=articles --top-k=5

# 6. SQL queries (indexed)
rem query "SELECT * FROM articles WHERE category = 'programming'"
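The batch insert in step 4 reads one JSON object per line from stdin. A minimal sketch of producing that input from Python, assuming each record carries the `title`, `content`, and `category` fields of the `Article` model above. The helper name `to_jsonl` and the sample records are illustrative and not part of the package.

```python
import json

def to_jsonl(records):
    """Serialize an iterable of dicts as JSON Lines: one compact object per line."""
    return "".join(json.dumps(record) + "\n" for record in records)

# Illustrative sample data matching the Article fields.
articles = [
    {"title": "Rust Basics", "content": "Ownership and borrowing.", "category": "programming"},
    {"title": "Python Guide", "content": "Async programming with asyncio.", "category": "programming"},
]

jsonl_text = to_jsonl(articles)
```

Writing `jsonl_text` to `articles.jsonl` gives a file that can be piped into `rem insert articles --batch`.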

CLI Commands

Setup and Schema Management

Command	Description	Example
`rem key-gen`	Generate encryption key pair (Ed25519)	`rem key-gen --password "strong_password"` (saves to `~/.p8/keys/`)
`rem init`	Initialize database (default: `~/.p8/db/`)	`rem init` or `rem init --path ./data` or `rem init --password "..."` (encryption at rest)
`rem schema add <file>`	Register schema (JSON/YAML preferred)	`rem schema add schema.json` or `rem schema add models.py::Article`
`rem schema add --name <name> --template <template>`	Create schema from built-in template	`rem schema add --name my_docs --template resources`
`rem schema list`	List registered schemas	`rem schema list`
`rem schema show <name>`	Show schema definition	`rem schema show articles`
`rem schema templates`	List available templates	`rem schema templates`

Schema template workflow:

# List available templates
rem schema templates
# Output:
# Available schema templates:
# - resources: Chunked documents with embeddings (URI-based)
# - entities: Generic structured data (name-based)
# - agentlets: AI agent definitions (with tools/resources)
# - moments: Temporal classifications (time-range queries)

# Create new schema from template
rem schema add --name my_documents --template resources

# This creates and registers:
# - Schema name: my_documents
# - Clones all fields from resources template
# - Updates fully_qualified_name: "user.my_documents"
# - Updates short_name: "my_documents"
# - Preserves embedding/indexing configuration

# Customize the generated schema (optional)
rem schema show my_documents > my_documents.json
# Edit my_documents.json
rem schema add my_documents.json  # Re-register with changes

# Or save to file without registering
rem schema add --name my_docs --template resources --output my_docs.yaml
# Edit my_docs.yaml
rem schema add my_docs.yaml  # Register when ready
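The cloning step described above (copy every field, rename, keep the embedding and indexing configuration) can be pictured with a small sketch. The template dictionary here is a hypothetical, abbreviated stand-in: its layout, the `system.resources` name, and the `clone_template` helper are assumptions for illustration, not the package's actual format or API.

```python
import copy

# Hypothetical, abbreviated stand-in for the built-in "resources" template;
# the real template's layout may differ.
RESOURCES_TEMPLATE = {
    "short_name": "resources",
    "fully_qualified_name": "system.resources",  # assumed name for the template itself
    "properties": {
        "name": {"type": "string"},
        "content": {"type": "string"},
        "uri": {"type": "string"},
        "chunk_ordinal": {"type": "integer"},
    },
    "embedding_fields": ["content"],
    "indexed_fields": ["content_type"],
    "key_field": "uri",
}

def clone_template(template, name, namespace="user"):
    """Deep-copy a template and rename it, leaving fields and config untouched."""
    schema = copy.deepcopy(template)
    schema["short_name"] = name
    schema["fully_qualified_name"] = f"{namespace}.{name}"
    return schema

my_documents = clone_template(RESOURCES_TEMPLATE, "my_documents")
```

The deep copy matters: editing the clone's fields afterwards must not change the template that other schemas are created from.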

Built-in templates:

Template	Use Case	Key Fields	Configuration
`resources`	Documents, articles, PDFs	`name`, `content`, `uri`, `chunk_ordinal`	Embeds `content`, indexes `content_type`, key: `uri`
`entities`	Generic structured data	`name`, `key`, `properties`	Indexes `name`, key: `name`
`agentlets`	AI agent definitions	`description`, `tools`, `resources`	Embeds `description`, includes MCP config
`moments`	Temporal events	`name`, `start_time`, `end_time`, `classifications`	Indexes `start_time`, `end_time`

Example: Creating custom document schema

# Start with resources template
rem schema add --name technical_docs --template resources --output technical_docs.yaml

# Edit technical_docs.yaml to add custom fields:
# - difficulty_level: enum["beginner", "intermediate", "advanced"]
# - language: string
# - code_examples: array[object]

# Register customized schema
rem schema add technical_docs.yaml

# Insert documents
cat docs.jsonl | rem insert technical_docs --batch

Data Operations

Command	Description	Example
`rem insert <table> <json>`	Insert entity	`rem insert articles '{"title": "..."}'`
`rem insert <table> --batch`	Batch insert from stdin	`cat data.jsonl | rem insert articles --batch`
`rem ingest <file>`	Upload and chunk file	`rem ingest tutorial.pdf --schema=articles`
`rem get <uuid>`	Get entity by ID	`rem get 550e8400-...`
`rem lookup <key>`	Global key lookup	`rem lookup "Python Guide"`

Search and Queries

Command	Description	Example
`rem search <query>`	Semantic search	`rem search "async programming" --schema=articles`
`rem query "<SQL>"`	SQL query	`rem query "SELECT * FROM articles WHERE category = 'tutorial'"`
`rem ask "<question>"`	Natural language query (executes)	`rem ask "show recent programming articles"`
`rem ask "<question>" --plan`	Show query plan without executing	`rem ask "show recent articles" --plan`
`rem traverse <uuid>`	Graph traversal	`rem traverse <id> --depth=2 --direction=out`

Natural language query examples:

# Execute query immediately
rem ask "show recent programming articles"
# Output: Query results as JSON

# Show query plan without executing (LLM response only)
rem ask "show recent programming articles" --plan
# Output:
# {
#   "confidence": 0.95,
#   "query": "SELECT * FROM articles WHERE category = 'programming' ORDER BY created_at DESC LIMIT 10",
#   "reasoning": "User wants recent articles filtered by programming category",
#   "requires_search": false
# }

# Complex query with semantic search
rem ask "find articles about Rust performance optimization" --plan
# Output:
# {
#   "confidence": 0.85,
#   "query": "SEARCH articles 'Rust performance optimization' LIMIT 10",
#   "reasoning": "Semantic search needed for conceptual similarity",
#   "requires_search": true
# }
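Because `--plan` returns the plan as JSON without executing it, a script can inspect the plan before deciding to run the query. A minimal sketch, assuming the output has exactly the four fields shown above. The confidence threshold and the `review_plan` helper are hypothetical, not documented behavior.

```python
import json

# Example plan copied from the `rem ask ... --plan` output shown above.
plan_json = """
{
  "confidence": 0.85,
  "query": "SEARCH articles 'Rust performance optimization' LIMIT 10",
  "reasoning": "Semantic search needed for conceptual similarity",
  "requires_search": true
}
"""

MIN_CONFIDENCE = 0.8  # hypothetical threshold, not a documented default

def review_plan(raw, min_confidence=MIN_CONFIDENCE):
    """Parse a plan and report whether it looks safe to execute unattended."""
    plan = json.loads(raw)
    return {
        "query": plan["query"],
        "kind": "semantic search" if plan["requires_search"] else "sql",
        "auto_execute": plan["confidence"] >= min_confidence,
    }

decision = review_plan(plan_json)
```

Plans that fall below the threshold could be shown to the user for confirmation instead of being run directly.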

Export and Analytics

Command	Description	Example
`rem export <table>`	Export to Parquet	`rem export articles --output ./data.parquet`
`rem export --all`	Export all schemas	`rem export --all --output ./exports/`

REM Dreaming (Background Intelligence)

Command	Description	Example
`rem dream`	Run dreaming with default lookback (24h)	`rem dream`
`rem dream --lookback-hours <N>`	Custom lookback window	`rem dream --lookback-hours 168` (weekly)
`rem dream --dry-run`	Show what would be generated	`rem dream --dry-run --verbose`
`rem dream --llm <model>`	Specify LLM model	`rem dream --llm gpt-4-turbo`
`rem dream --start <date> --end <date>`	Specific date range	`rem dream --start "2025-10-20" --end "2025-10-25"`

REM Dreaming uses LLMs to analyze your activity in the background and generate:

Moments: Temporal classifications of what you were working on (with emotions, topics, outcomes)
Summaries: Period recaps and key insights
Graph edges: Automatic connections between related resources and sessions
Ontological maps: Topic relationships and themes

See docs/rem-dreaming.md for detailed documentation.

Core System Schemas

REM Database includes three core schemas for tracking user activity:

Sessions

Purpose: Track conversation sessions with AI agents.

Key fields:

id (UUID) - Session identifier
case_id (UUID) - Optional link to project/case
user_id (string) - User identifier
metadata (object) - Session context

Schema: schema/core/sessions.json

Messages

Purpose: Individual messages within sessions (user queries, AI responses, tool calls).

Key fields:

session_id (UUID) - Parent session
role (enum) - user | assistant | system | tool
content (string) - Message content (embedded for search)
tool_calls (array) - Tool invocations
trace_id, span_id (string) - Observability

Schema: schema/core/messages.json

Moments

Purpose: Temporal classifications generated by REM Dreaming.

Key fields:

name (string) - Moment title
summary (string) - Activity description
start_time, end_time (datetime) - Time bounds
moment_type (enum) - work_session | learning | planning | communication | reflection | creation
tags (array) - Topic tags (e.g., ["rust", "database", "performance"])
emotion_tags (array) - Emotion/tone tags (e.g., ["focused", "productive"])
people (array) - People mentioned
resource_ids, session_ids (arrays) - Related entities

Schema: schema/core/moments.json

These schemas are registered automatically on rem init.

Peer Replication Testing

REM supports primary/replica replication via WAL and gRPC streaming.

Terminal 1: Primary Node

# Start primary with WAL enabled
export P8_REPLICATION_MODE=primary
export P8_REPLICATION_PORT=50051
export P8_WAL_ENABLED=true
export P8_DB_PATH=./data/primary  # Override default ~/.p8/db/

rem init
# Register schema (JSON/YAML preferred)
rem schema add schema.json

# Start replication server
rem serve --host 0.0.0.0 --port 50051

# Insert data (will be replicated)
rem insert articles '{"title": "Doc 1", "content": "Test replication", "category": "test"}'

# Check WAL status
rem replication wal-status
# Output:
# WAL sequence: 1
# Entries: 1
# Size: 512 bytes

Terminal 2: Replica 1

# Start replica pointing to primary
export P8_REPLICATION_MODE=replica
export P8_PRIMARY_HOST=localhost:50051
export P8_DB_PATH=./data/replica1  # Override default ~/.p8/db/

rem init

# Connect and sync from primary
rem replicate --primary=localhost:50051 --follow

# Check replication status
rem replication status
# Output:
# Mode: replica
# Primary: localhost:50051
# WAL position: 1
# Lag: 2ms
# Status: synced

# Query replica (read-only)
rem query "SELECT * FROM articles"
# Output: Same data as primary

Terminal 3: Replica 2

export P8_REPLICATION_MODE=replica
export P8_PRIMARY_HOST=localhost:50051
export P8_DB_PATH=./data/replica2  # Override default ~/.p8/db/

rem init
rem replicate --primary=localhost:50051 --follow

# Verify sync
rem query "SELECT COUNT(*) FROM articles"
# Output: 1

Testing Failover

Terminal 1: Simulate Primary Failure

^C  # Stop primary server

Terminal 2: Replica Behavior During Outage

# Replica continues serving reads
rem query "SELECT * FROM articles"
# Output: Cached data still available

# Check status
rem replication status
# Output:
# Status: disconnected
# Last sync: 45s ago
# Buffered writes: 0 (read-only)

Terminal 1: Primary Restart

# Restart primary and insert new data
rem serve --host 0.0.0.0 --port 50051
rem insert articles '{"title": "Doc 2", "content": "After restart", "category": "test"}'

Terminal 2: Automatic Catchup

# Replica auto-reconnects and syncs
rem replication status
# Output:
# Status: synced
# Catchup: completed (1 entry, 50ms)
# Lag: 3ms

# Verify new data
rem query "SELECT title FROM articles ORDER BY created_at DESC LIMIT 1"
# Output: Doc 2
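The catch-up behavior in the walkthrough above follows from the replica remembering its last applied WAL position and requesting only newer entries. The toy model below illustrates that idea in memory. It is a sketch under that assumption, with hypothetical class names, and says nothing about the actual gRPC protocol or on-disk WAL format.

```python
class Primary:
    """Toy primary: every write is appended to an in-memory WAL with a sequence number."""

    def __init__(self):
        self.wal = []  # list of (sequence, entry) tuples; sequences start at 1

    def write(self, entry):
        seq = len(self.wal) + 1
        self.wal.append((seq, entry))
        return seq

    def entries_after(self, position):
        """Return WAL entries with sequence > position (what a replica still needs)."""
        return self.wal[position:]


class Replica:
    """Toy read-only replica that remembers the last WAL position it applied."""

    def __init__(self):
        self.position = 0
        self.data = []

    def catch_up(self, primary):
        """Apply all missing entries in order and return how many were applied."""
        applied = 0
        for seq, entry in primary.entries_after(self.position):
            self.data.append(entry)
            self.position = seq
            applied += 1
        return applied


primary = Primary()
replica = Replica()

primary.write({"title": "Doc 1"})
first_sync = replica.catch_up(primary)    # initial sync

primary.write({"title": "Doc 2"})         # written while the replica is disconnected
second_sync = replica.catch_up(primary)   # catch-up after reconnect
```

Each sync applies exactly one entry here, mirroring the "Catchup: completed (1 entry, ...)" status line, and a further `catch_up` call with no new writes applies nothing.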

Key Implementation Conventions

REM Principle

Resources-Entities-Moments is a unified data model, not separate storage:

Resources: Chunked documents with embeddings → semantic search (HNSW)
Entities: Structured data → SQL queries (indexed fields)
Moments: Temporal classifications → time-range queries

All stored as entities in RocksDB. Conceptual distinction only.

Pydantic-Driven Everything

Configuration flows from json_schema_extra:

NB: Metadata can be set at the model level in model_config, but individual fields can also carry properties such as key_field and embedding_provider in their own json_schema_extra. The field-level form is preferred.

model_config = ConfigDict(
    json_schema_extra={
        "embedding_fields": ["content"],      # → Auto-embed on insert
        "indexed_fields": ["category"],       # → RocksDB index CF
        "key_field": "title"                  # → Deterministic UUID
    }
)

NB: Schemas can also be defined on the Rust side, either as equivalent model types or as raw JSON Schema, but everything is driven by Pydantic-aware semantics of the JSON Schema format.

Deterministic UUIDs (Idempotent Inserts)

NB: Precedence is uri -> key_field (if set in json_schema_extra) -> key -> name, with a random UUID as the fallback.

Priority	Field	UUID Generation
1	`uri`	`blake3(entity_type + uri + chunk_ordinal)`
2	`json_schema_extra.key_field`	`blake3(entity_type + value)`
3	`key`	`blake3(entity_type + key)`
4	`name`	`blake3(entity_type + name)`
5	(fallback)	`UUID::v4()` (random)

Same key → same UUID → upsert semantics.
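The precedence table can be sketched in a few lines of Python. This is an illustration only: it swaps in BLAKE2b from the standard library for BLAKE3, and the plain string concatenation, the default `chunk_ordinal` of 0, and the truncation of the digest to 16 bytes are all assumptions. IDs produced here will not match those generated by the database.

```python
import hashlib
import uuid

def deterministic_id(entity_type, record, key_field=None):
    """Pick the key by precedence (uri, key_field, key, name) and hash it into a UUID."""
    if record.get("uri") is not None:
        material = f"{entity_type}{record['uri']}{record.get('chunk_ordinal', 0)}"
    elif key_field and record.get(key_field) is not None:
        material = f"{entity_type}{record[key_field]}"
    elif record.get("key") is not None:
        material = f"{entity_type}{record['key']}"
    elif record.get("name") is not None:
        material = f"{entity_type}{record['name']}"
    else:
        return uuid.uuid4()  # no usable key: random ID, so no upsert semantics
    # Stand-in hash: BLAKE2b (not BLAKE3), sized to the 16 bytes a UUID needs.
    digest = hashlib.blake2b(material.encode("utf-8"), digest_size=16).digest()
    return uuid.UUID(bytes=digest)

# Same key value, different content: both map to the same ID (an upsert).
first = deterministic_id("articles", {"title": "Python Guide", "content": "v1"}, key_field="title")
second = deterministic_id("articles", {"title": "Python Guide", "content": "v2"}, key_field="title")
other = deterministic_id("articles", {"title": "Rust Basics"}, key_field="title")
```

Including `entity_type` in the hashed material keeps identical keys in different schemas from colliding.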

System Fields (Always Auto-Added)

Never define these in Pydantic models - always added by database:

id (UUID) - Deterministic or random
entity_type (string) - Schema/table name
created_at, modified_at, deleted_at (ISO 8601) - Timestamps
edges (array[string]) - Graph relationships
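A schema can be checked for accidental collisions with these names before it is registered. A minimal sketch: the `reserved_field_conflicts` helper is hypothetical and not part of the package, and whether the database performs an equivalent check is not stated here.

```python
# System field names taken from the list above.
SYSTEM_FIELDS = {"id", "entity_type", "created_at", "modified_at", "deleted_at", "edges"}

def reserved_field_conflicts(field_names):
    """Return the user-declared field names that collide with system fields, sorted."""
    return sorted(SYSTEM_FIELDS.intersection(field_names))

ok = reserved_field_conflicts(["title", "content", "category"])
bad = reserved_field_conflicts(["title", "id", "created_at"])
```

With a Pydantic v2 model, the declared names could be passed in as `Article.model_fields.keys()`.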

Embedding Fields (Conditionally Added)

Not system fields - only added when configured:

embedding (array[float32]) - Added if embedding_fields in json_schema_extra
embedding_alt (array[float32]) - Added if P8_ALT_EMBEDDING environment variable set

# Configuration that triggers embedding generation:
model_config = ConfigDict(
    json_schema_extra={
        "embedding_fields": ["content"],      # → Adds "embedding" field
        "embedding_provider": "default"       # → Uses P8_DEFAULT_EMBEDDING
    }
)

Encryption at Rest

Optional encryption at rest using Ed25519 key pairs and ChaCha20-Poly1305 AEAD:

Generate key pair (one-time setup):

rem key-gen --password "strong_master_password"
# Stores encrypted key at ~/.p8/keys/private_key_encrypted
# Stores public key at ~/.p8/keys/public_key

Initialize database with encryption:

rem init --password "strong_master_password"
# All entity data encrypted before storage
# Transparent encryption/decryption on get/put

Sharing across tenants (future):
- Encrypt data with recipient's public key (X25519 ECDH)
- End-to-end encryption - even database admin cannot read shared data

Device-to-device sync (future):
- WAL entries encrypted before gRPC transmission
- Defense in depth: mTLS (transport) + encrypted WAL (application layer)

Key security properties:

Private key never leaves device unencrypted
Password-derived key using Argon2 KDF
ChaCha20-Poly1305 AEAD for data encryption
Public key stored unencrypted for sharing capabilities

See docs/encryption-architecture.md for complete design.

Column Families (Performance)

CF	Purpose	Speedup vs Scan
`key_index`	Reverse key lookup	O(log n) vs O(n)
`edges` + `edges_reverse`	Bidirectional graph	20x faster
`embeddings` (binary)	Vector storage	3x compression
`indexes`	Indexed fields	10-50x faster
`keys`	Encrypted tenant keys	-
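The benefit of keeping both `edges` and `edges_reverse` is that incoming edges can be looked up directly instead of scanning every outgoing edge. The sketch below is an in-memory analogue with a depth-limited breadth-first traversal. The `EdgeIndex` class is hypothetical and does not reflect the RocksDB key layout.

```python
from collections import defaultdict, deque

class EdgeIndex:
    """Toy in-memory analogue of the edges / edges_reverse pair."""

    def __init__(self):
        self.forward = defaultdict(set)   # source -> targets (like `edges`)
        self.reverse = defaultdict(set)   # target -> sources (like `edges_reverse`)

    def add(self, src, dst):
        # Every edge is written twice so both directions are a direct lookup.
        self.forward[src].add(dst)
        self.reverse[dst].add(src)

    def traverse(self, start, depth, direction="out"):
        """Breadth-first traversal up to `depth` hops; returns {node: hops}."""
        index = self.forward if direction == "out" else self.reverse
        seen = {start: 0}
        queue = deque([start])
        while queue:
            node = queue.popleft()
            if seen[node] == depth:
                continue  # do not expand past the depth limit
            for neighbour in index.get(node, ()):
                if neighbour not in seen:
                    seen[neighbour] = seen[node] + 1
                    queue.append(neighbour)
        del seen[start]
        return seen

graph = EdgeIndex()
graph.add("a", "b")
graph.add("b", "c")
graph.add("c", "d")

outgoing = graph.traverse("a", depth=2, direction="out")
incoming = graph.traverse("c", depth=1, direction="in")
```

The `depth` and `direction` parameters mirror the `--depth` and `--direction` options of `rem traverse`.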

HNSW Vector Index

Rust HNSW index provides 200x speedup over naive Python scan:

Python naive: ~1000ms for 1M documents
Rust HNSW: ~5ms for 1M documents

This is the primary reason for Rust implementation.
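For context, the naive baseline being compared against looks roughly like the sketch below: every stored vector is scored against the query, so the cost grows linearly with the number of documents. This is the brute-force scan, not HNSW, and the function names and sample vectors are illustrative.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def naive_search(query, vectors, top_k=5):
    """Score every stored vector against the query: O(n) work per search."""
    scored = [(cosine_similarity(query, vec), doc_id) for doc_id, vec in vectors.items()]
    scored.sort(reverse=True)
    return [doc_id for _, doc_id in scored[:top_k]]

vectors = {
    "doc-rust": [0.9, 0.1, 0.0],
    "doc-python": [0.1, 0.9, 0.0],
    "doc-cooking": [0.0, 0.1, 0.9],
}
results = naive_search([1.0, 0.0, 0.0], vectors, top_k=2)
```

An HNSW index avoids the full scan by walking a layered proximity graph, trading exactness for a much smaller number of comparisons per query.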

Performance Targets

Operation	Target	Why Rust?
Insert (no embedding)	< 1ms	RocksDB + zero-copy
Insert (with embedding)	< 50ms	Network-bound (OpenAI)
Get by ID	< 0.1ms	Single RocksDB get
Vector search (1M docs)	< 5ms	HNSW (vs 1000ms naive)
SQL query (indexed)	< 10ms	Native execution (vs 50ms Python)
Graph traversal (3 hops)	< 5ms	Bidirectional CF (vs 100ms scan)
Batch insert (1000 docs)	< 500ms	Batched embeddings
Parquet export (100k rows)	< 2s	Parallel encoding

NB: We generally work in batches: batch upserts and batch embeddings. NEVER make individual requests when batches are possible.
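The batching rule can be illustrated with a small helper that groups texts and calls an embedding function once per group. This is a sketch: `embed_all`, the batch size of 100, and the stand-in embedder are hypothetical, and a real provider call would replace `fake_embed_batch`.

```python
from itertools import islice

def batched(items, batch_size):
    """Yield consecutive lists of at most `batch_size` items."""
    iterator = iter(items)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch

def embed_all(texts, embed_batch, batch_size=100):
    """Call `embed_batch` once per batch instead of once per text."""
    vectors = []
    for batch in batched(texts, batch_size):
        vectors.extend(embed_batch(batch))
    return vectors

# Stand-in embedder that records how many texts each request carried.
calls = []

def fake_embed_batch(batch):
    calls.append(len(batch))
    return [[float(len(text))] for text in batch]

vectors = embed_all([f"doc {i}" for i in range(250)], fake_embed_batch, batch_size=100)
```

Here 250 texts result in three requests rather than 250, which is the difference that matters when each request is network-bound.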

Environment Configuration

# Core
export P8_HOME=~/.p8
export P8_DB_PATH=$P8_HOME/db

# Embeddings
export P8_DEFAULT_EMBEDDING=local:all-MiniLM-L6-v2
export P8_OPENAI_API_KEY=sk-...  # For OpenAI embeddings

# LLM (natural language queries)
export P8_DEFAULT_LLM=gpt-4.1
export P8_OPENAI_API_KEY=sk-...

# RocksDB tuning
export P8_ROCKSDB_WRITE_BUFFER_SIZE=67108864  # 64MB
export P8_ROCKSDB_MAX_BACKGROUND_JOBS=4
export P8_ROCKSDB_COMPRESSION=lz4

# Replication
export P8_REPLICATION_MODE=primary  # or replica
export P8_PRIMARY_HOST=localhost:50051  # For replicas
export P8_WAL_ENABLED=true

See CLAUDE.md for full list.
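Scripts that need to locate the database can follow the same defaults as the core variables above. A minimal sketch, assuming `P8_DB_PATH` falls back to `$P8_HOME/db` and `P8_HOME` to `~/.p8`. The `resolve_paths` helper is hypothetical, and the package's own resolution logic may differ.

```python
import os
from pathlib import Path

def resolve_paths(env=None):
    """Resolve P8_HOME and P8_DB_PATH, falling back to the defaults shown above."""
    env = os.environ if env is None else env
    home = Path(env.get("P8_HOME", "~/.p8")).expanduser()
    if "P8_DB_PATH" in env:
        db_path = Path(env["P8_DB_PATH"]).expanduser()
    else:
        db_path = home / "db"
    return home, db_path

# Explicit mappings are used here so the example does not depend on the real environment.
home, db = resolve_paths({"P8_HOME": "/tmp/p8"})
_, override_db = resolve_paths({"P8_HOME": "/tmp/p8", "P8_DB_PATH": "./data/primary"})
```

Calling `resolve_paths()` with no argument reads the process environment instead.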

Project Structure

percolate-rocks/          # Clean implementation
├── Cargo.toml            # Rust dependencies
├── pyproject.toml        # Python package (maturin)
├── README.md             # This file
├── CLAUDE.md             # Implementation guide
│
├── src/                  # Rust implementation (~3000 lines target)
│   ├── lib.rs            # PyO3 module definition (30 lines)
│   │
│   ├── types/            # Core data types (120 lines)
│   │   ├── mod.rs        # Re-exports
│   │   ├── entity.rs     # Entity, Edge structs
│   │   ├── error.rs      # Error types (thiserror)
│   │   └── result.rs     # Type aliases
│   │
│   ├── storage/          # RocksDB wrapper (400 lines)
│   │   ├── mod.rs        # Re-exports
│   │   ├── db.rs         # Storage struct + open
│   │   ├── keys.rs       # Key encoding functions
│   │   ├── batch.rs      # Batch writer
│   │   ├── iterator.rs   # Prefix iterator
│   │   └── column_families.rs  # CF constants + setup
│   │
│   ├── index/            # Indexing layer (310 lines)
│   │   ├── mod.rs        # Re-exports
│   │   ├── hnsw.rs       # HNSW vector index
│   │   ├── fields.rs     # Indexed fields
│   │   └── keys.rs       # Key index (reverse lookup)
│   │
│   ├── query/            # Query execution (260 lines)
│   │   ├── mod.rs        # Re-exports
│   │   ├── parser.rs     # SQL parser
│   │   ├── executor.rs   # Query executor
│   │   ├── predicates.rs # Predicate evaluation
│   │   └── planner.rs    # Query planner
│   │
│   ├── embeddings/       # Embedding providers (200 lines)
│   │   ├── mod.rs        # Re-exports
│   │   ├── provider.rs   # Provider trait + factory
│   │   ├── local.rs      # Local models (fastembed)
│   │   ├── openai.rs     # OpenAI API client
│   │   └── batch.rs      # Batch embedding operations
│   │
│   ├── schema/           # Schema validation (160 lines)
│   │   ├── mod.rs        # Re-exports
│   │   ├── registry.rs   # Schema registry
│   │   ├── validator.rs  # JSON Schema validation
│   │   └── pydantic.rs   # Pydantic json_schema_extra parser
│   │
│   ├── graph/            # Graph operations (130 lines)
│   │   ├── mod.rs        # Re-exports
│   │   ├── edges.rs      # Edge CRUD
│   │   └── traversal.rs  # BFS/DFS traversal
│   │
│   ├── replication/      # Replication engine (400 lines)
│   │   ├── mod.rs        # Re-exports
│   │   ├── wal.rs        # Write-ahead log
│   │   ├── primary.rs    # Primary node (gRPC server)
│   │   ├── replica.rs    # Replica node (gRPC client)
│   │   ├── protocol.rs   # gRPC protocol definitions
│   │   └── sync.rs       # Sync state machine
│   │
│   ├── export/           # Export formats (200 lines)
│   │   ├── mod.rs        # Re-exports
│   │   ├── parquet.rs    # Parquet writer
│   │   ├── csv.rs        # CSV writer
│   │   └── jsonl.rs      # JSONL writer
│   │
│   ├── ingest/           # Document ingestion (180 lines)
│   │   ├── mod.rs        # Re-exports
│   │   ├── chunker.rs    # Document chunking
│   │   ├── pdf.rs        # PDF parser
│   │   └── text.rs       # Text chunking
│   │
│   ├── llm/              # LLM query builder (150 lines)
│   │   ├── mod.rs        # Re-exports
│   │   ├── query_builder.rs  # Natural language → SQL
│   │   └── planner.rs    # Query plan generation
│   │
│   └── bindings/         # PyO3 Python bindings (300 lines)
│       ├── mod.rs        # Re-exports
│       ├── database.rs   # Database wrapper (main API)
│       ├── types.rs      # Type conversions (Python ↔ Rust)
│       ├── errors.rs     # Error conversions
│       └── async_ops.rs  # Async operation wrappers
│
├── python/               # Python package (~800 lines target)
│   └── rem_db/
│       ├── __init__.py   # Public API (thin wrapper over Rust)
│       ├── cli.py        # Typer CLI (delegates to Rust)
│       ├── models.py     # Built-in Pydantic schemas
│       └── async_api.py  # Async wrapper utilities
│
└── tests/
    ├── rust/             # Rust integration tests
    │   ├── test_crud.rs
    │   ├── test_search.rs
    │   ├── test_graph.rs
    │   ├── test_replication.rs
    │   └── test_export.rs
    │
    └── python/           # Python integration tests
        ├── test_api.py
        ├── test_cli.py
        ├── test_async.py
        └── test_end_to_end.py

Key Design Notes:

Rust Core (~3000 lines in ~40 files): All performance-critical operations in Rust
- Average 75 lines per file
- Max 150 lines per file
- Single responsibility per module
Python Bindings (bindings/): Thin PyO3 layer
- Database wrapper exposes high-level API
- Type conversions between Python dict/list ↔ Rust structs
- Error conversions for Python exceptions
- Async operation wrappers (tokio → asyncio)
- No business logic - pure translation layer
Python Package (python/): Minimal orchestration
- CLI delegates to Rust immediately
- Public API is thin wrapper (db._rust_insert())
- Pydantic models define schemas, Rust validates/stores
- Async utilities for Python async/await ergonomics
Replication Module: Primary/replica peer replication
- WAL (write-ahead log) for durability
- gRPC streaming for real-time sync
- Automatic catchup after disconnection
- Read-only replica mode
Export Module: Analytics-friendly formats
- Parquet with ZSTD compression
- CSV for spreadsheets
- JSONL for streaming/batch processing
LLM Module: Natural language query interface
- Convert questions → SQL/SEARCH queries
- Query plan generation (--plan flag)
- Confidence scoring
Test Organization: Separation of unit and integration tests

Rust Tests:
- Unit tests: Inline with implementation using #[cfg(test)] modules
```
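// src/storage/keys.rs
#[cfg(test)]
mod tests {
    use super::*;
    use uuid::Uuid; // assumes the `uuid` crate with the `v4` feature

    #[test]
    fn test_encode_entity_key() {
        // Any UUID works here; the test only checks the key prefix.
        let uuid = Uuid::new_v4();
        let key = encode_entity_key(uuid);
        assert!(key.starts_with(b"entity:"));
    }
}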
```
- Integration tests: In tests/rust/ directory
  - Test full workflows across modules
  - Require actual RocksDB instance
  - May be slower (acceptable up to 10s per test)
Python Tests:
- Unit tests: NOT APPLICABLE (Python layer is thin wrapper)
- Integration tests: In tests/python/ directory
  - Test PyO3 bindings (Python ↔ Rust type conversions)
  - Test CLI commands end-to-end
  - Test async/await ergonomics
  - Require Rust library to be built
Running Tests:
```
# Rust unit tests (fast, inline with code)
cargo test --lib

# Rust integration tests (slower, requires RocksDB)
cargo test --test '*'

# Python integration tests (requires maturin build)
maturin develop
pytest tests/python/

# All tests
cargo test && pytest tests/python/
```
Coverage Targets:
- Rust: 80%+ coverage (critical path)
- Python: 90%+ coverage (thin wrapper, easy to test)

Development

Pre-Build Checks

# Check compilation (fast, no binary output)
cargo check

# Format check (without modifying files)
cargo fmt --check

# Linting with clippy
cargo clippy --all-targets --all-features

# Security audit (requires: cargo install cargo-audit)
cargo audit

# Check for outdated dependencies (requires: cargo install cargo-outdated)
cargo outdated

Building

# Development build (unoptimized, fast compile)
cargo build

# Release build (optimized, slower compile)
cargo build --release

# Python extension development install (editable)
maturin develop

# Python extension release wheel
maturin build --release

Testing

# Rust tests (unit, integration, and doc tests)
cargo test

# Rust tests with test output shown instead of captured
cargo test -- --nocapture

# Python integration tests (requires maturin develop first)
pytest

# Python tests with verbose output
pytest -v

# Run specific test
cargo test test_name

Code Quality

# Auto-format code
cargo fmt

# Run clippy linter
cargo clippy --all-targets

# Fix clippy warnings automatically (where possible)
cargo clippy --fix

# Check for unused dependencies
cargo machete  # requires: cargo install cargo-machete

Benchmarks

# Run all benchmarks
cargo bench

# Run specific benchmark
cargo bench vector_search

Development Workflow

# 1. Make changes to Rust code
# 2. Check compilation
cargo check

# 3. Run tests
cargo test

# 4. Build Python extension
maturin develop

# 5. Test Python integration
pytest

References

Specification: See db-specification-v0.md in -ref folder
Python spike: ../rem-db (100% features, production-ready)
Old Rust spike: ../percolate-rocks-ref (~40% features)
Implementation guide: CLAUDE.md

Release history

This version

0.3.2

Oct 26, 2025

0.3.1

Oct 26, 2025

0.3.0

Oct 26, 2025

0.2.4

Oct 26, 2025

0.2.3

Oct 25, 2025

0.2.0

Oct 25, 2025

0.1.0

Oct 25, 2025

Download files

Source Distributions

No source distribution files available for this release.

Built Distributions

percolate_rocks-0.3.2-cp38-abi3-manylinux_2_39_x86_64.whl (14.3 MB)

Uploaded Oct 26, 2025; CPython 3.8+; manylinux: glibc 2.39+ x86-64

percolate_rocks-0.3.2-cp38-abi3-macosx_11_0_arm64.whl (10.2 MB)

Uploaded Oct 26, 2025; CPython 3.8+; macOS 11.0+ ARM64

File details

Details for the file percolate_rocks-0.3.2-cp38-abi3-manylinux_2_39_x86_64.whl.

File metadata

Download URL: percolate_rocks-0.3.2-cp38-abi3-manylinux_2_39_x86_64.whl
Upload date: Oct 26, 2025
Size: 14.3 MB
Tags: CPython 3.8+, manylinux: glibc 2.39+ x86-64
Uploaded using Trusted Publishing? No
Uploaded via: maturin/1.9.6

File hashes

Hashes for percolate_rocks-0.3.2-cp38-abi3-manylinux_2_39_x86_64.whl
Algorithm	Hash digest
SHA256	`1b8b9da2a1b684cf276e47d2e246c506adfaba2467d4c3f91ac13da20d59557e`
MD5	`e5e023e3c0230ba95a14751220de5d2d`
BLAKE2b-256	`6cf14a1dc34470e2afafa37520774d49f2a6f159611b283d917d75f2382c3001`

File details

Details for the file percolate_rocks-0.3.2-cp38-abi3-macosx_11_0_arm64.whl.

File metadata

Download URL: percolate_rocks-0.3.2-cp38-abi3-macosx_11_0_arm64.whl
Upload date: Oct 26, 2025
Size: 10.2 MB
Tags: CPython 3.8+, macOS 11.0+ ARM64
Uploaded using Trusted Publishing? No
Uploaded via: maturin/1.9.6

File hashes

Hashes for percolate_rocks-0.3.2-cp38-abi3-macosx_11_0_arm64.whl
Algorithm	Hash digest
SHA256	`88da89da34ea2e2157171729f83a8a6a6edcc2871fc353d15ec2915bd7f3090e`
MD5	`b2a202117d1e6f68e8a7ded435fb5fba`
BLAKE2b-256	`3db4193dac10c6d90665c97cc14445223c674cafa79aff65416e1ab584342f44`
