RDF-StarBase

A blazingly fast RDF-Star database with native provenance tracking

Python 3.10+ · License: MIT

RDF-StarBase is a native RDF-Star platform for storing, querying, and visualizing assertions about data — not just the data itself. Every triple carries full provenance: who said it, when, how confident they were, and which process generated it.

Key Features

  • Blazingly Fast — Built on Polars with Rust-speed DataFrame operations
  • Native RDF-Star — First-class support for quoted triples and statement metadata
  • Full Provenance — Every assertion tracked with source, timestamp, confidence, process
  • Competing Claims — See ALL assertions, not just the "winning" one
  • SPARQL-Star — Query with standard SPARQL syntax + provenance extensions
  • Assertion Registry — Track data sources, APIs, and mappings as first-class entities
  • REST API — FastAPI-powered web interface with interactive docs
  • Graph Visualization — React + D3.js frontend for exploring knowledge graphs
  • Parquet Persistence — Efficient columnar storage for analytics workloads

Why RDF-StarBase?

Traditional databases store values.
Traditional catalogs store descriptions.
RDF-StarBase stores assertions about reality.

When your CRM says customer.age = 34 and your Data Lake says customer.age = 36, most systems silently overwrite one value with the other. RDF-StarBase keeps both, letting you:

  • See competing claims side-by-side
  • Filter by source, confidence, or recency
  • Maintain full audit trails
  • Let downstream systems choose which to trust

Installation

pip install rdf-starbase

Or install from source:

git clone https://github.com/ontus/rdf-starbase.git
cd rdf-starbase
pip install -e ".[dev]"

Quick Start

from rdf_starbase import TripleStore, ProvenanceContext

# Create a store
store = TripleStore()

# Add triples with provenance
prov = ProvenanceContext(
    source="CRM_System",
    confidence=0.85,
    process="api_sync"
)

store.add_triple(
    "http://example.org/customer/123",
    "http://xmlns.com/foaf/0.1/name",
    "Alice Johnson",
    prov
)

# Query with provenance filtering
results = store.get_triples(
    subject="http://example.org/customer/123",
    min_confidence=0.8
)

# Detect competing claims
claims = store.get_competing_claims(
    subject="http://example.org/customer/123",
    predicate="http://example.org/age"
)

🔍 SPARQL-Star Queries

from rdf_starbase import execute_sparql

# Standard SPARQL
results = execute_sparql(store, """
    PREFIX foaf: <http://xmlns.com/foaf/0.1/>
    SELECT ?name WHERE {
        <http://example.org/customer/123> foaf:name ?name
    }
""")

# With provenance extensions
results = execute_sparql(store, """
    SELECT ?s ?p ?o WHERE {
        ?s ?p ?o .
        FILTER_CONFIDENCE(>= 0.9)
        FILTER_SOURCE("CRM_System")
    }
""")

# ASK queries
exists = execute_sparql(store, """
    ASK WHERE {
        <http://example.org/customer/123> <http://xmlns.com/foaf/0.1/name> ?name
    }
""")  # Returns: True

Advanced Query Features

# OPTIONAL - include data when available
results = execute_sparql(store, """
    PREFIX foaf: <http://xmlns.com/foaf/0.1/>
    SELECT ?person ?name ?email WHERE {
        ?person foaf:name ?name .
        OPTIONAL { ?person foaf:mbox ?email }
    }
""")

# UNION - combine multiple patterns
results = execute_sparql(store, """
    SELECT ?entity ?label WHERE {
        { ?entity rdfs:label ?label }
        UNION
        { ?entity foaf:name ?label }
    }
""")

# BIND - computed values
results = execute_sparql(store, """
    SELECT ?product ?price ?taxed WHERE {
        ?product ex:price ?price .
        BIND(?price * 1.1 AS ?taxed)
    }
""")

# Aggregates with GROUP BY (?source and ?confidence are bound from each
# triple's provenance by the engine's provenance extensions)
results = execute_sparql(store, """
    SELECT ?source (COUNT(*) AS ?count) (AVG(?confidence) AS ?avg_conf) WHERE {
        ?s ?p ?o .
    }
    GROUP BY ?source
    HAVING (COUNT(*) > 10)
""")

# CONSTRUCT - generate new triples
results = execute_sparql(store, """
    CONSTRUCT {
        ?person foaf:knows ?other .
    }
    WHERE {
        ?person ex:worksAt ?company .
        ?other ex:worksAt ?company .
        FILTER(?person != ?other)
    }
""")

# INSERT DATA - add new triples
execute_sparql(store, """
    INSERT DATA {
        <http://example.org/alice> foaf:name "Alice" .
        <http://example.org/alice> foaf:age 30 .
    }
""")

# DELETE DATA - remove specific triples
execute_sparql(store, """
    DELETE DATA {
        <http://example.org/alice> foaf:age 30 .
    }
""")

# DELETE WHERE - remove matching patterns
execute_sparql(store, """
    DELETE WHERE {
        <http://example.org/alice> foaf:knows ?anyone .
    }
""")

# DELETE/INSERT WHERE - update values atomically
execute_sparql(store, """
    DELETE { ?s ex:status "active" }
    INSERT { ?s ex:status "archived" }
    WHERE { ?s ex:status "active" }
""")

# Property paths - navigate graph relationships
results = execute_sparql(store, """
    SELECT ?ancestor WHERE {
        <http://example.org/alice> foaf:knows+ ?ancestor .  # One or more hops
    }
""")

results = execute_sparql(store, """
    SELECT ?connected WHERE {
        <http://example.org/alice> (foaf:knows|foaf:worksWith)* ?connected .  # Zero or more via knows OR worksWith
    }
""")

results = execute_sparql(store, """
    SELECT ?knower WHERE {
        ?knower ^foaf:knows <http://example.org/bob> .  # Inverse: who knows Bob?
    }
""")

# Time-travel queries - query historical state
results = execute_sparql(store, """
    SELECT ?s ?name WHERE {
        ?s foaf:name ?name .
    }
    AS OF "2025-01-15T00:00:00Z"
""")

# ASK with time-travel
existed = execute_sparql(store, """
    ASK WHERE {
        <http://example.org/alice> foaf:name ?name .
    }
    AS OF "2024-06-01"
""")  # Returns: True if Alice existed on that date

📊 Named Graph Management

RDF-StarBase supports named graphs with the full set of SPARQL 1.1 Update graph-management operations:

from rdf_starbase import execute_sparql

# CREATE GRAPH - create a new named graph
execute_sparql(store, """
    CREATE GRAPH <http://example.org/graphs/customers>
""")

# LOAD - load RDF data from a file into a graph
execute_sparql(store, """
    LOAD <file:///data/customers.ttl> 
    INTO GRAPH <http://example.org/graphs/customers>
""")

# Or load from HTTP
execute_sparql(store, """
    LOAD <https://example.org/data/products.ttl>
    INTO GRAPH <http://example.org/graphs/products>
""")

# COPY - copy all triples from one graph to another
execute_sparql(store, """
    COPY GRAPH <http://example.org/graphs/customers>
    TO GRAPH <http://example.org/graphs/customers_backup>
""")

# MOVE - move triples (copy then clear source)
execute_sparql(store, """
    MOVE GRAPH <http://example.org/graphs/staging>
    TO GRAPH <http://example.org/graphs/production>
""")

# ADD - add triples to another graph (merge)
execute_sparql(store, """
    ADD GRAPH <http://example.org/graphs/updates>
    TO GRAPH <http://example.org/graphs/main>
""")

# CLEAR - remove all triples from a graph (graph still exists)
execute_sparql(store, """
    CLEAR GRAPH <http://example.org/graphs/temp>
""")

# DROP - delete a graph and all its triples
execute_sparql(store, """
    DROP GRAPH <http://example.org/graphs/old_data>
""")

# Special graph targets
execute_sparql(store, "CLEAR DEFAULT")  # Clear default graph
execute_sparql(store, "DROP NAMED")     # Drop all named graphs
execute_sparql(store, "CLEAR ALL")      # Clear everything

# SILENT mode - don't fail if graph doesn't exist
execute_sparql(store, """
    DROP SILENT GRAPH <http://example.org/graphs/maybe_exists>
""")

# List all named graphs
graphs = store.list_graphs()
print(graphs)  # ['http://example.org/graphs/customers', 'http://example.org/graphs/products']

Querying Named Graphs

# FROM clause - restrict query to specific graph
results = execute_sparql(store, """
    SELECT ?customer ?name
    FROM <http://example.org/graphs/customers>
    WHERE {
        ?customer foaf:name ?name
    }
""")

# FROM with multiple graphs (union of datasets)
results = execute_sparql(store, """
    SELECT ?entity ?label
    FROM <http://example.org/graphs/customers>
    FROM <http://example.org/graphs/products>
    WHERE {
        ?entity rdfs:label ?label
    }
""")

# GRAPH pattern - query specific named graph in WHERE clause
results = execute_sparql(store, """
    SELECT ?customer ?name WHERE {
        GRAPH <http://example.org/graphs/customers> {
            ?customer foaf:name ?name
        }
    }
""")

# GRAPH with variable - discover which graph contains data
results = execute_sparql(store, """
    SELECT ?graph ?entity ?name WHERE {
        GRAPH ?graph {
            ?entity foaf:name ?name
        }
    }
""")

# Combined patterns - default graph + specific named graph
results = execute_sparql(store, """
    SELECT ?person ?friend ?friendName WHERE {
        ?person foaf:knows ?friend .
        GRAPH <http://example.org/graphs/profiles> {
            ?friend foaf:name ?friendName
        }
    }
""")

# FROM NAMED - specify available named graphs for GRAPH patterns
results = execute_sparql(store, """
    SELECT ?g ?s ?name
    FROM NAMED <http://example.org/graphs/customers>
    FROM NAMED <http://example.org/graphs/employees>
    WHERE {
        GRAPH ?g { ?s foaf:name ?name }
    }
""")

⭐ RDF-Star: Quoted Triples

RDF-Star allows you to make statements about statements:

# The assertion "Alice knows Bob" is claimed by Wikipedia
store.add_quoted_triple(
    subject="<<http://example.org/alice http://xmlns.com/foaf/0.1/knows http://example.org/bob>>",
    predicate="http://example.org/assertedBy",
    obj="http://dbpedia.org/resource/Wikipedia",
    provenance=prov
)

Query with SPARQL-Star:

SELECT ?who WHERE {
    << ?person foaf:knows ?other >> ex:assertedBy ?who
}

Competing Claims Detection

# Multiple systems report different ages for the same customer
customer = "http://example.org/customer/123"
crm_prov = ProvenanceContext(source="CRM", confidence=0.85)
lake_prov = ProvenanceContext(source="DataLake", confidence=0.92)

store.add_triple(customer, "http://example.org/age", 34, crm_prov)
store.add_triple(customer, "http://example.org/age", 36, lake_prov)

# See all competing values
claims = store.get_competing_claims(customer, "http://example.org/age")
print(claims)
# shape: (2, 4)
# ┌────────┬──────────┬────────────┬─────────────────────┐
# │ object │ source   │ confidence │ timestamp           │
# ├────────┼──────────┼────────────┼─────────────────────┤
# │ 36     │ DataLake │ 0.92       │ 2026-01-16 03:00:00 │
# │ 34     │ CRM      │ 0.85       │ 2026-01-16 02:00:00 │
# └────────┴──────────┴────────────┴─────────────────────┘

Persistence

# Save to Parquet (columnar, fast, compressible)
store.save("knowledge_graph.parquet")

# Load back
loaded_store = TripleStore.load("knowledge_graph.parquet")

Architecture

┌─────────────────────────────────────────────────────────────────────┐
│                         RDF-StarBase                                │
├─────────────────────────────────────────────────────────────────────┤
│    React + D3.js Frontend    │     REST API (FastAPI)               │
├──────────────────────────────┼──────────────────────────────────────┤
│  SPARQL-Star Parser  │  Query Executor  │  Assertion Registry       │
├─────────────────────────────────────────────────────────────────────┤
│                    Triple Store (Polars DataFrames)                 │
├─────────────────────────────────────────────────────────────────────┤
│  Parquet I/O  │  Provenance Tracking  │  Competing Claims Detection │
└─────────────────────────────────────────────────────────────────────┘

Core Stack:

  • Polars — Rust-powered DataFrames for blazing performance
  • FastAPI — Modern async REST API framework
  • pyparsing — SPARQL-Star parser
  • Pydantic — Data model validation
  • D3.js — Graph visualization
  • PyArrow — Parquet persistence

Performance

RDF-StarBase leverages Polars' Rust backend for:

  • Vectorized operations on millions of triples
  • Lazy evaluation for query optimization
  • Zero-copy reads from Parquet
  • Parallel execution across cores

Web API

Start the server:

# Using uvicorn directly
uvicorn rdf_starbase.web:app --reload

# Or with the module
python -m rdf_starbase.web

Then open http://localhost:8000/docs for the interactive API documentation.

REST Endpoints

Endpoint                   Method     Description
/triples                   GET        Query triples with filters
/triples                   POST       Add a new triple with provenance
/triples/{subject}/claims  GET        Get competing claims
/sparql                    POST       Execute a SPARQL-Star query
/sources                   GET/POST   Manage data sources
/graph/nodes               GET        Visualization node data
/graph/edges               GET        Graph edges
/stats                     GET        Database statistics

🤖 AI Grounding API

A specialized API layer designed for AI/LLM consumption, separate from the UI visualization endpoints:

Endpoint            Method   Description
/ai/query           POST     Structured fact retrieval with provenance for RAG
/ai/verify          POST     Verify whether a claim is supported by the knowledge base
/ai/context/{iri}   GET      Get all facts about an entity, with citations
/ai/materialize     POST     Trigger reasoning and persist inferences
/ai/inferences      GET      List materialized inferences
/ai/health          GET      AI API health check

Why a Separate AI API?

Aspect            UI API (/graph/*)                    AI Grounding API (/ai/*)
Consumer          D3.js visualization                  LLM tool calls / agents
Response format   Nodes + edges for rendering          Facts + provenance + citations
Query pattern     Browsing, neighborhood exploration   Precise fact lookup, verification
Filtering         Limit by count, visual simplicity    Confidence threshold, freshness

Example: Grounding an AI Response

import httpx

# 1. Query relevant facts for RAG
response = httpx.post("http://localhost:8000/ai/query", json={
    "subject": "http://example.org/customer/123",
    "min_confidence": "high",  # high (>=0.9), medium (>=0.7), low (>=0.5), any
    "max_age_days": 30,        # Only recent facts
})
facts = response.json()["facts"]

# 2. Verify a claim before stating it
verify = httpx.post("http://localhost:8000/ai/verify", json={
    "subject": "http://example.org/customer/123",
    "predicate": "http://xmlns.com/foaf/0.1/age",
    "expected_object": "34",
})
result = verify.json()
if result["claim_supported"]:
    print(f"Claim verified with {result['confidence']:.0%} confidence")
elif result["has_conflicts"]:
    print("Warning: Competing claims exist!")
    print(result["recommendation"])

# 3. Get full entity context (URL-encode the IRI when placing it in the path)
from urllib.parse import quote

iri = quote("http://example.org/customer/123", safe="")
context = httpx.get(f"http://localhost:8000/ai/context/{iri}")
entity_facts = context.json()["facts"]
related = context.json()["related_entities"]

Inference Materialization

Materialize RDFS/OWL inferences with provenance tracking:

# Run reasoning engine and persist inferred triples
response = httpx.post("http://localhost:8000/ai/materialize", json={
    "enable_rdfs": True,   # RDFS entailment rules
    "enable_owl": True,    # OWL 2 RL rules
    "max_iterations": 100,
})
print(f"Inferred {response.json()['triples_inferred']} triples")

# Query inferred facts (marked with source='reasoner')
inferences = httpx.get("http://localhost:8000/ai/inferences")
for fact in inferences.json()["inferences"]:
    print(f"Inferred: {fact['subject']} {fact['predicate']} {fact['object']}")

📋 Assertion Registry

Track data sources as first-class entities:

from rdf_starbase import AssertionRegistry, SourceType

registry = AssertionRegistry()

# Register a data source
source = registry.register_source(
    name="CRM_Production",
    source_type=SourceType.API,
    uri="https://api.crm.example.com/v2",
    owner="sales-team",
    tags=["production", "customer-data"],
)

# Track sync runs
run = registry.start_sync(source.id)
# ... perform sync ...
registry.complete_sync(run.id, records_processed=1000)

# Get sync history
history = registry.get_sync_history(source.id)

🧪 Development

# Install dev dependencies
pip install -e ".[dev]"

# Run tests
pytest tests/ -v

# Run with coverage
pytest tests/ --cov=src/rdf_starbase

# Format code
black src/ tests/
ruff check src/ tests/

📊 Frontend (React + D3)

cd frontend
npm install
npm run dev

Then open http://localhost:3000 (proxies API to :8000)

📚 Examples

See the examples/ directory:

  • quickstart.py — Core features demonstration
  • competing_claims.py — Handling conflicting data from multiple sources
  • sparql_queries.py — SPARQL-Star query examples
  • registry_demo.py — Assertion Registry usage

🗺️ Roadmap

✅ Completed (MVP)

  • Native RDF-Star storage
  • Provenance tracking (source, timestamp, confidence, process)
  • Competing claims detection
  • SPARQL-Star parser (SELECT, ASK, FILTER, ORDER BY, LIMIT, OFFSET)
  • SPARQL-Star executor with Polars backend
  • Provenance filter extensions
  • Parquet persistence
  • Assertion Registry (datasets, APIs, mappings)
  • REST API with FastAPI
  • React + D3 graph visualization

✅ Completed (Advanced Query Features)

  • OPTIONAL patterns (left outer joins)
  • UNION patterns (combine result sets)
  • MINUS patterns (set difference)
  • FILTER expressions (comparisons, boolean logic, regex, string functions)
  • BIND clauses (variable assignment, expressions, functions)
  • VALUES inline data
  • Aggregate functions (COUNT, SUM, AVG, MIN, MAX, GROUP_CONCAT, SAMPLE)
  • GROUP BY and HAVING
  • CONSTRUCT queries (template-based triple generation)
  • DESCRIBE queries (resource description)
  • SPARQL UPDATE (INSERT DATA, DELETE DATA, DELETE WHERE, DELETE/INSERT WHERE)
  • OWL reasoning (rdfs:subClassOf, owl:sameAs, owl:inverseOf, owl:TransitiveProperty)
  • Property path queries (/, |, ^, *, +, ?)
  • Time-travel queries (AS OF "2025-01-15T00:00:00Z")
  • AI Grounding API (/ai/query, /ai/verify, /ai/context)
  • Inference materialization (/ai/materialize, /ai/inferences)
  • Named Graph Management (CREATE, DROP, CLEAR, LOAD, COPY, MOVE, ADD)
  • FROM clause dataset specification
  • GRAPH pattern queries

🔜 Next

  • Trust scoring and decay

🚀 Future

  • Federation across instances
  • Governance workflows

📄 License

MIT License — see LICENSE for details.

🙏 Acknowledgments


RDF-StarBase: The place where enterprises store beliefs, not just data.
